php - Trying to get all Facebook links from a web page

Question

I'm trying to scrape a page for Facebook links. However, all I get is a blank page, with no error message.

My code is as follows:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $matches = array();
    if (preg_match('~^https?://(?:www\.)?facebook.com/(.+)/?$~', $html, $matches)) {
        print_r($matches);

    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');

getFacebook($html);

What's wrong?

score 1 · Accepted Answer

A better (and more robust) alternative is to use DOMDocument and DOMXPath:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $dom = new DOMDocument;
    @$dom->loadHTML($html);

    $query = new DOMXPath($dom);

    $result = $query->evaluate("(//a|//A)[contains(@href, 'facebook.com')]");

    $return = array();

    foreach ($result as $element) {
        /** @var $element DOMElement */
        $return[] = $element->getAttribute('href');
    }

    return $return;

}

$html = file_get_contents('http://curvywriter.info/contact-me/');

var_dump(getFacebook($html));
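A self-contained variant of the DOM approach, run on a hypothetical inline HTML string so it works offline. This is a sketch, not the answerer's code: it swaps the @ error suppression for libxml_use_internal_errors(), and it replaces contains(@href, 'facebook.com') with a host check via parse_url(), because contains() would also accept URLs such as https://example.com/?ref=facebook.com. The function name getFacebookLinks and the sample markup are illustrative only.

```php
<?php
// Sketch: DOMDocument + DOMXPath on an inline string (no network access).
// getFacebookLinks() and the sample HTML below are hypothetical.
function getFacebookLinks(string $html): array
{
    // Collect libxml parse warnings internally instead of using @.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];

    // The HTML parser lower-cases tag and attribute names,
    // so //a[@href] also covers <A HREF="...">.
    foreach ($xpath->query('//a[@href]') as $a) {
        $href = $a->getAttribute('href');
        $host = parse_url($href, PHP_URL_HOST);
        // Keep facebook.com and its subdomains (www., m., ...);
        // relative links have no host and are skipped.
        if (is_string($host) && preg_match('~(^|\.)facebook\.com$~i', $host)) {
            $links[] = $href;
        }
    }

    // Drop duplicates and reindex the array.
    return array_values(array_unique($links));
}

$html = '<html><body>'
      . '<a href="https://www.facebook.com/examplepage">FB</a>'
      . '<A HREF="http://facebook.com/another">FB2</A>'
      . '<a href="https://example.com/?ref=facebook.com">not FB</a>'
      . '</body></html>';

print_r(getFacebookLinks($html));
```

With the real page you would pass the result of file_get_contents() instead of the sample string.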

But as for your specific problem, I did the following:

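Changed preg_match to preg_match_all so that it doesn't stop after the first match.
Removed the ^ (start) and $ (end) anchors from the pattern. The links appear in the middle of the document, not at its very beginning or end (let alone both!).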
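To illustrate why the original code printed nothing, here is a small offline sketch comparing the anchored and unanchored patterns, and preg_match against preg_match_all. The sample HTML string is made up for illustration and stands in for the fetched page.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<p>Follow me:</p>'
      . '<a href="https://www.facebook.com/examplepage">FB</a>'
      . '<a href="http://facebook.com/another">FB2</a>';

$anchored   = '~^https?://(?:www\.)?facebook\.com/~';
$unanchored = '~https?://(?:www\.)?facebook\.com/~';

// Without the m modifier, ^ anchors to the start of the whole string.
// The HTML starts with "<p>", not "http", so nothing matches and
// print_r() is never reached: hence the blank page.
var_dump(preg_match($anchored, $html));   // int(0)

// Without anchors, preg_match() succeeds but stops at the first hit...
var_dump(preg_match($unanchored, $html)); // int(1)

// ...while preg_match_all() counts every occurrence.
var_dump(preg_match_all($unanchored, $html, $m)); // int(2)
```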

So, the corrected code:

<?php
error_reporting(E_ALL);

function getFacebook($html) {

    $matches = array();
    if (preg_match_all('~https?://(?:www\.)?facebook.com/(.+)/?~', $html, $matches)) {
        print_r($matches);

    }
}

$html = file_get_contents('http://curvywriter.info/contact-me/');

getFacebook($html);
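One caveat with the corrected regex: (.+) is greedy and . matches any character except a newline, so each capture runs past the closing quote to the end of the line, and several links on the same line collapse into a single match. The unescaped dot in facebook.com also matches any character. Below is a sketch of a tighter pattern, assuming each URL ends at a quote, whitespace, or angle bracket; the function name getFacebookUrls and the sample HTML are hypothetical.

```php
<?php
// Sketch: a tighter preg_match_all() pattern.
// [^"'\s<>]+ stops at the closing quote, whitespace or tag boundary,
// whereas (.+) would run on to the end of the line.
function getFacebookUrls(string $html): array
{
    $pattern = '~https?://(?:www\.)?facebook\.com/[^"\'\s<>]+~i';
    preg_match_all($pattern, $html, $matches);
    // $matches[0] holds the full matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// All three links sit on one line, as they often do in minified HTML.
$html = '<a href="https://www.facebook.com/examplepage" target="_blank">FB</a> '
      . '<a href="http://facebook.com/another">FB2</a> '
      . '<a href="https://www.facebook.com/examplepage">again</a>';

print_r(getFacebookUrls($html));
```

A regex remains fragile against arbitrary HTML, which is why the DOMDocument version is the more robust choice.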
