php - PHP regular expression problem

Question

I am trying to get the sentence that contains the link from the following text:

<p> Referencement PG1 est spécialiste en référencement depuis 2004. Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, aidera de nous trouver. Fascinez le regard avec le film vidéo. Vous demeurerez persistant sur les plateformes Youtube, Dailymotion ... Les images Video apparaissant dans les index de Google appâteront les surfeurs. <img style="padding:5px;float:left" src="http://thumbs.virtual-tour.tv/referencementpage1.jpg"> Par le appel à la Vidéo, faites-vous connaître. </p>

That is, this sentence:

Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, aidera de nous trouver.

I am using this regular expression:

([A-Z][^<]*)<a[^>]*>([^<]*)</a>([^\.!\?]*)

I don't understand why it is not working. It gives me the previous sentence together with the one I need:

Referencement PG1 est spécialiste en référencement depuis 2004. Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, aidera de nous trouver.

What am I missing? Thanks for the help =D

Edit (some code):

preg_match_all('#([A-Z][^<\.!\?]*)<a[^>]*>([^<]*)</a>(.*[^\.!\?]*)#U', $spinnedText, $matches);
echo "<pre>";
print_r($matches);
echo "</pre>";
foreach ($matches[1] as $key => $value) {
    //$spinnedText = str_replace($matches[0][$key], "<a {title=\"".$this->url."\"|} {rev=\"{index|help|bookmark|friend}\"|} {dir=\"rtl\"|}{rel=\"{friend|bookmark|help|}\"|} href=\"".$this->url."\">".trim($value)."</a>", $spinnedText);
    $spinnedText = str_replace($matches[0][$key], "<a {title=\"".$this->url."\"|} {rev=\"{index|help|bookmark|friend}\"|} {dir=\"rtl\"|}{rel=\"{friend|bookmark|help|}\"|} href=\"".$this->url."\">".$matches[1][$key].$matches[2][$key].$matches[3][$key]."</a>", $spinnedText);
}

score 1 · Accepted Answer

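Your regex matches from the first sentence because that sentence starts with a capital letter, and nothing in the pattern stops at sentence-ending punctuation before the link. You need to start with \. or something like (?:^|[\.!?]), but that may be a problem for you, because in some cases the first sentence could also be the one you want. Is it possible to have several sentences with these links? The important question is what defines a sentence.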
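To make the cause concrete, here is a minimal sketch that runs the original pattern next to a variant whose first group may not contain sentence punctuation. The sample is a shortened copy of the question's text, and it assumes the anchor tag is well-formed (closing "> present); the variable names are only for illustration.

```php
<?php
// Shortened sample from the question; a well-formed <a ...> tag is assumed.
$text = '<p> Referencement PG1 est spécialiste en référencement depuis 2004. '
      . 'Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film vidéo. </p>';

// Original pattern: [^<]* is free to run across the full stop after "2004".
$original = '#([A-Z][^<]*)<a[^>]*>([^<]*)</a>([^\.!\?]*)#';

// Variant: the first group may not contain ., ! or ? either.
$restricted = '#([A-Z][^<.!?]*)<a[^>]*>([^<]*)</a>([^\.!\?]*)#';

preg_match($original, $text, $before);
preg_match($restricted, $text, $after);

echo $before[1], "\n"; // starts at "Referencement ..."
echo $after[1], "\n";  // "Une recherche sur "
```

With the restricted class, every attempt that starts in the first sentence fails at the full stop, so the engine moves on to the next capital letter.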

This will work with what you have, and it also handles a sentence that comes first after a p> as well as a sentence at the start of the string.

preg_match('/
   (?:           # match, but do not capture, any of
   ^             # the start of the string
   |p>\s*        # or an opening or closing p tag followed by any whitespace
   |[.!?]\s+ )   # or sentence punctuation followed by whitespace
   (             # capture
   [A-Z]         # a capital letter
   [^.!?<]*      # followed by text without punctuation or tags, up to
   <a[^>]*>      # an opening anchor tag
   [^<]*         # the link text
   <\/a>         # a closing anchor tag
   [^.!?]*       # followed by the rest of the sentence
   [.?!])        # closing punctuation
/x', $item, $matches);
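As a rough, runnable illustration of this approach, the sketch below wraps a sentence-anchored pattern in a helper and applies it to a shortened copy of the question's text. The helper name firstLinkedSentence is hypothetical, and the sample assumes a well-formed anchor tag.

```php
<?php
// Hypothetical helper: returns the first sentence containing a link, or null.
function firstLinkedSentence(string $html): ?string
{
    $pattern = '/
        (?:^|p>\s*|[.!?]\s+)   # string start, a p tag, or the end of the previous sentence
        (                      # capture the sentence
        [A-Z][^.!?<]*          # capital letter, then text without punctuation or tags
        <a[^>]*>[^<]*<\/a>     # the link itself
        [^.!?]*[.!?]           # rest of the sentence and its closing punctuation
        )
    /x';

    return preg_match($pattern, $html, $m) === 1 ? $m[1] : null;
}

// Shortened sample from the question; a well-formed <a ...> tag is assumed.
$item = '<p> Referencement PG1 est spécialiste en référencement depuis 2004. '
      . 'Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film vidéo. </p>';

echo firstLinkedSentence($item), "\n";
```

Because the prefix consumes the punctuation of the previous sentence, two linked sentences directly next to each other would need extra care if this were used with preg_match_all.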

score 0 · Accepted Answer

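This is called "greedy matching". It means the regex engine normally matches as many characters as the regex allows. In your example, you need to restrict the start of the regex so that it does not greedily match across different sentences.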
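A small sketch of the difference between greedy and lazy quantifiers, using a made-up string rather than the question's text:

```php
<?php
// Hypothetical input, chosen only to show the two behaviours.
$html = '<b>one</b> and <b>two</b>';

preg_match('#<b>.*</b>#', $html, $greedy);  // greedy: runs on to the last </b>
preg_match('#<b>.*?</b>#', $html, $lazy);   // lazy: stops at the first </b>

echo $greedy[0], "\n"; // <b>one</b> and <b>two</b>
echo $lazy[0], "\n";   // <b>one</b>
```

A lazy quantifier only changes where a match ends, not where it begins, which is why the start of the pattern has to be restricted as well.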

Try this:

[^.!?]*<\s*a[^>]+>([^<]*)</a>[^.?!]*[.?!]

It should match the whole sentence and nothing more.
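For reference, a minimal sketch of how this pattern might be used with preg_match_all. The # delimiters and the trim step are additions for this example, and the sample assumes a well-formed anchor tag.

```php
<?php
// Shortened sample from the question; a well-formed <a ...> tag is assumed.
$text = '<p> Referencement PG1 est spécialiste en référencement depuis 2004. '
      . 'Une recherche sur <a rev="help" dir="rtl" href="http://www.referencement-site-pro.com">Mot Clé</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film vidéo. </p>';

// The pattern from this answer, wrapped in # so the / in </a> needs no escaping.
$pattern = '#[^.!?]*<\s*a[^>]+>([^<]*)</a>[^.?!]*[.?!]#';

preg_match_all($pattern, $text, $matches);

// Each match starts with the space that follows the previous sentence, so trim it.
$sentences = array_map('trim', $matches[0]);

print_r($sentences);   // the sentence(s) containing a link
print_r($matches[1]);  // the link text of each match
```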

Hope that helps.

score 0 · Accepted Answer

Instead, I would recommend looking into a DOM parser.

Example: http://simplehtmldom.sourceforge.net/

An example from their site:

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
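The same idea can be sketched with PHP's built-in DOMDocument instead of an external library. This is only an illustration under a few assumptions: the sample is a shortened, ASCII-only version of the question's text (accents dropped to sidestep DOMDocument's charset handling), the anchor tag is well-formed, and a sentence is taken to end at ., ! or ? followed by whitespace.

```php
<?php
// Shortened, ASCII-only sample (an assumption made for this sketch).
$html = '<p>Referencement PG1 est specialiste depuis 2004. '
      . 'Une recherche sur <a href="http://www.referencement-site-pro.com">Mot Cle</a>, '
      . 'aidera de nous trouver. Fascinez le regard avec le film video.</p>';

libxml_use_internal_errors(true);  // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);

$found = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // Plain text of the element that contains the link.
    $paragraph = $link->parentNode->textContent;

    // Split after ., ! or ? when followed by whitespace.
    $sentences = preg_split('/(?<=[.!?])\s+/', $paragraph);

    foreach ($sentences as $sentence) {
        // Keep the sentences that contain the link text.
        if (strpos($sentence, $link->textContent) !== false) {
            $found[] = $sentence;
        }
    }
}

print_r($found);
```

Note that this returns the sentence as plain text, so the anchor markup itself is not part of the result.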
