php - Parsing a Wikipedia list and descriptions using regular expressions

Question

I'm not very familiar with regular expressions, and I need to find a way to parse a list of items from Wikipedia. I fetched the content using Wikipedia's api.php, which leaves me with data like this:

    ==Formal fallacies==
    A [[formal fallacy]] is an error in logic that...

    * [[Appeal to probability]] –  takes something for granted because...
    * [[Argument from fallacy]] –  assumes that if an argument ...
    * [[Base rate fallacy]] –  making a probability judgement...
    * [[Conjunction fallacy]] –  assumption that an outcome simultaneously...
    * [[Masked man fallacy]] –  ...

    ===Propositional fallacies===

    * [[Affirming a disjunct]] –  concluded that ...
    * [[Affirming the consequent]] –  the [[antecedent...
    * [[Denying the antecedent]] –  the [[consequent]] in...

So I need a way to pull out the data as follows:

* I only care about lines that start with * [[
* The text between [[ and ]] is the name
* Everything after the dash (–) is the description

score 1 · Accepted Answer

This does the job:

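    // Match "* [[Name]] – description" lines; the named groups capture both parts.
    preg_match_all('~^\h*+\*\h*\[\[(?<name>[a-z ]++)]]\h*+[-–]\h*+(?<description>.++)$~imu', $text, $results, PREG_SET_ORDER);
    // Drop the numeric keys so only 'name' and 'description' remain.
    foreach ($results as &$result) {
        foreach ($result as $key => $value) {
            if (is_numeric($key)) unset($result[$key]);
        }
    }
    unset($result); // break the reference left over from the foreach
    echo '<pre>' . print_r($results, true) . '</pre>';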
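To see what the pattern captures, here is a minimal, self-contained sketch that runs it against a shortened copy of the sample from the question. The variable names ($pattern, $matches, $items) are only illustrative, and the input is assumed to be UTF-8 so that the u modifier and the literal en dash behave as expected.

```php
<?php
// Shortened sample from the question (assumed UTF-8).
$text = <<<'WIKI'
==Formal fallacies==
A [[formal fallacy]] is an error in logic that...

* [[Appeal to probability]] –  takes something for granted because...
* [[Argument from fallacy]] –  assumes that if an argument ...

===Propositional fallacies===

* [[Affirming a disjunct]] –  concluded that ...
WIKI;

// Same pattern as above: i = case-insensitive, m = ^/$ match per line, u = UTF-8.
$pattern = '~^\h*+\*\h*\[\[(?<name>[a-z ]++)]]\h*+[-–]\h*+(?<description>.++)$~imu';
preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

// Keep only the named groups instead of unsetting the numeric keys afterwards.
$items = [];
foreach ($matches as $m) {
    $items[] = ['name' => $m['name'], 'description' => $m['description']];
}

print_r($items);
```

Note that the character class [a-z ] only accepts letters and spaces, so a link containing digits, punctuation, or a pipe (for example [[Target|label]]) would be skipped by this pattern; whether that matters depends on the actual page content.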

score 0 · Accepted Answer

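First, replace

    ^((?!\*\s\[\[).)*$

with an empty string. This removes the lines that do not contain * [[.

Then remove the leftover line breaks by replacing

    ^\n|\r$

with an empty string.

The regular expression to get the title and description is: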

    ^\s+\*\s\[\[([^\]\]]*)\]\]\s–(.*)
    Title: "$1", Description: "$2"
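The three steps read like editor find-and-replace operations. One way they might be translated to PHP is sketched below. This is an interpretation, not the answerer's code: it assumes UTF-8 input and multiline mode (the m modifier), and it relaxes the leading \s+ of the last pattern to \h* so that unindented lines, like the ones in this sample, also match. The names $step1, $step2 and $rows are hypothetical.

```php
<?php
// Shortened sample from the question (assumed UTF-8, not indented).
$text = <<<'WIKI'
==Formal fallacies==
A [[formal fallacy]] is an error in logic that...

* [[Appeal to probability]] –  takes something for granted because...
* [[Base rate fallacy]] –  making a probability judgement...
WIKI;

// Step 1: blank out every line that does not contain "* [[".
$step1 = preg_replace('~^((?!\*\s\[\[).)*$~mu', '', $text);

// Step 2: remove the line breaks of the now-empty lines (and stray \r at line ends).
$step2 = preg_replace('~^\n|\r$~m', '', $step1);

// Step 3: capture title and description.
// Assumption: \h* replaces the original \s+ so lines without indentation match.
preg_match_all('~^\h*\*\s\[\[([^\]]*)\]\]\s–(.*)~mu', $step2, $matches, PREG_SET_ORDER);

$rows = [];
foreach ($matches as $m) {
    // trim() drops the extra spaces that follow the dash in the source text.
    $rows[] = sprintf('Title: "%s", Description: "%s"', $m[1], trim($m[2]));
}

echo implode("\n", $rows), "\n";
```

Since the final pattern is anchored per line, steps 1 and 2 are not strictly required in PHP; they are kept here only to mirror the procedure described above.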
