php - URL matching with preg_match_all in PHP and regex

Question

I'm trying to build a crawler that collects movie URLs from an IMDb list. I can already get every link on the page into an array, and I want to keep only the ones that contain "title".

preg_match_all($pattern, "[125] => href=\"/chart/2000s?mode=popular\" [126] => href=\"/title/tt0111161/\" ", $matches);

where $pattern = '/title/'.

I get the following error:

Warning: preg_match_all() [function.preg-match-all]: Delimiter must not be alphanumeric or backslash in C:\xampp\htdocs\phpProject1\index.php on line 53

Any ideas on how to go about this? Thanks a lot.

score 1 · Accepted Answer

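Are you sure $pattern is actually '/title/' at the moment preg_match_all is called?

That warning is raised when the pattern passed to preg_match_all (its first argument) is not properly delimited, for example when it begins with a letter, a digit, or a backslash instead of a delimiter character such as / or #.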
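To illustrate what "properly delimited" means, here is a minimal sketch. The variable names and the sample array are assumptions made up for illustration, modelled on the input shown in the question. It uses # as the delimiter so that the slashes in /title/ are matched literally, and preg_grep to filter the array rather than matching against one long string.

```php
<?php
// Illustrative input modelled on the question; the contents are assumed.
$links = [
    125 => 'href="/chart/2000s?mode=popular"',
    126 => 'href="/title/tt0111161/"',
];

// With '/title/' the slashes are the delimiters, so the regex is just "title".
// Using '#' as the delimiter lets the pattern match the literal "/title/".
$pattern = '#/title/#';

// preg_grep returns the elements that match the pattern, keeping their keys.
$movieLinks = preg_grep($pattern, $links);

print_r($movieLinks);

// By contrast, an undelimited pattern such as 'title' begins with a letter,
// which is the situation the warning in the question complains about.
```

If the warning still appears with a pattern like this, it is worth dumping $pattern right before the call (for example with var_dump) to confirm it holds the value you expect.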

score 1 · Accepted Answer

Use a DOM parser instead, for example Simple HTML DOM:

// Load the Simple HTML DOM library (adjust the path to where your copy lives)
include 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.example.com/');

// Find all links containing "title" as part of their href
$links = $html->find('a[href*=title]');

// Loop through the links and do stuff
foreach ($links as $link) {
    echo $link->href . '<br>';
}

http://simplehtmldom.sourceforge.net/manual.htm
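If installing a third-party library is not an option, the same idea can be sketched with PHP's bundled DOM extension. This is a swapped-in alternative, not the Simple HTML DOM approach above; the sample HTML and variable names are assumptions for illustration, and in a real crawler $html would hold the fetched page.

```php
<?php
// Illustrative HTML; in a crawler this would be the downloaded page body.
$html = '<a href="/chart/2000s?mode=popular">Chart</a>'
      . '<a href="/title/tt0111161/">Movie</a>';

// Real-world pages are rarely valid HTML, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select the href attribute of every <a> whose href contains "/title/".
$xpath = new DOMXPath($doc);
$titleLinks = [];
foreach ($xpath->query('//a[contains(@href, "/title/")]/@href') as $attr) {
    $titleLinks[] = $attr->nodeValue;
}

print_r($titleLinks);
```

The XPath contains() test plays the same role as the a[href*=title] selector, except that it looks for "/title/" with the slashes included.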
