php - PHP Simple HTML DOM Parser: How to get the contents of the parent div

Translated from: https://stackoverflow.com/questions/26288870 2014-10-09T22:18:33.067


I'm scraping various (news) sites using PHP Simple HTML DOM, with the goal of extracting the main content/body text of each page.

The best way I've found to do this is to locate the main header/heading (H1) and take the text contained in the same div as that header tag.
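A minimal sketch of the "find the H1, then take its div" idea. It assumes PHP's built-in DOM extension (`DOMDocument`) instead of Simple HTML DOM, so it runs without the third-party library; the function name `textOfDivAroundH1` is hypothetical. It climbs from the first `<h1>` to the nearest enclosing `<div>`, which handles the case where the heading and the body text share the same div.

```php
<?php
// Sketch using PHP's built-in DOM extension (DOMDocument), swapped in for
// Simple HTML DOM so the example runs without the third-party library.
// Returns the whitespace-normalized text of the nearest <div> that
// encloses the first <h1>, or null if there is no such div.
function textOfDivAroundH1(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    if ($h1 === null) {
        return null;
    }
    // Climb parent by parent until the first <div> ancestor.
    for ($node = $h1->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
        if ($node->nodeName === 'div') {
            // Collapse indentation and newlines into single spaces.
            return trim(preg_replace('/\s+/u', ' ', $node->textContent));
        }
    }
    return null;
}

$html = "<div>\n  <h1>Title</h1>\n  main body of text here\n</div>";
echo textOfDivAroundH1($html), "\n"; // Title main body of text here
```

Simple HTML DOM elements have a comparable `parent()` method and `tag` property, so the same upward loop should carry over to that library, though this sketch does not use it.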

How can I get the contents of the whole (parent?) div in each of the examples below?

<div>  <----- need to get contents of this whole div (containing the h1 and likely the main body of text)
  <h1></h1>
  main body of text here
</div>

The div may sit further up the tree:

<div> <----- need to get contents of this whole div
  <div>   
    <h1></h1>
  </div>

  <div>
    main body of text here
  </div>
</div>

Or the split may happen even further up the tree:

<div> <----- need to get contents of this whole div
  <div>

    <div>   
      <h1></h1>
    </div>

    <div>
      main body of text here
    </div>

  </div>
</div>

I can then compare the sizes of the candidate divs to determine which one is the main div.
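The size comparison can be sketched in the same way, again assuming `DOMDocument` rather than Simple HTML DOM. The selection rule here (take the innermost `<div>` ancestor of the first `<h1>` whose normalized text is at least `$minExtra` bytes longer than the heading alone) is a hypothetical heuristic, not something the question specifies; `mainDivText`, `normalizedText`, and the default threshold of 20 are illustrative.

```php
<?php
// Collapse runs of whitespace so indentation does not inflate sizes.
function normalizedText(DOMNode $node): string
{
    return trim(preg_replace('/\s+/u', ' ', $node->textContent));
}

// Hypothetical heuristic: walk up from the first <h1> and return the text of
// the innermost <div> ancestor whose text is at least $minExtra bytes longer
// than the heading, i.e. the first div holding more than just the heading.
function mainDivText(string $html, int $minExtra = 20): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate messy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $h1 = $doc->getElementsByTagName('h1')->item(0);
    if ($h1 === null) {
        return null;
    }
    $headingLen = strlen(normalizedText($h1));

    for ($node = $h1->parentNode; $node instanceof DOMElement; $node = $node->parentNode) {
        if ($node->nodeName !== 'div') {
            continue; // only divs are candidates
        }
        $text = normalizedText($node);
        if (strlen($text) - $headingLen >= $minExtra) {
            return $text;
        }
    }
    return null;
}

$html = <<<HTML
<div>
  <div>
    <div>
      <h1>Headline</h1>
    </div>
    <div>
      main body of text here
    </div>
  </div>
</div>
HTML;

echo mainDivText($html), "\n"; // Headline main body of text here
```

For the nested layouts this stops at the first div that holds the body text alongside the heading; in the three-level layout that is the middle wrapper, whose text is identical to the outer div's.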
