html-agility-pack - HtmlAgilityPack: div class contains a string

Question

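I'm trying to scrape only the article text from web pages. I've noticed that the article is always wrapped in a div tag. Unfortunately, the class on that div differs slightly from one web page to the next. I considered XPath, but I don't think it will work because the class names differ. Is there a way to get all the div tags and then read their class?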

Example

<div class="entry_single">
  <p>I recently traveled without my notebook for the first time in ages.</p>
</div>

<div class="entry-content-pagination">
  <p>Ward 9 Ald. Steven Dove</p>
</div>
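XPath can in fact match on part of a class name: its contains() function tests whether the class attribute includes a substring, so the differing names are not necessarily a blocker. The sketch below assumes the HtmlAgilityPack package is referenced and that the article divs share the fragment "entry", as both classes in the example above happen to. That fragment, and the helper name SelectArticleTexts, are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathContainsDemo
{
    // Selects every <div> whose class attribute contains the given fragment
    // via XPath's contains() and returns each div's trimmed inner text.
    // Hypothetical helper for illustration.
    public static List<string> SelectArticleTexts(string html, string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The fragment is spliced into the XPath string, so it must not contain a single quote.
        HtmlNodeCollection divs =
            doc.DocumentNode.SelectNodes($"//div[contains(@class, '{fragment}')]");

        var texts = new List<string>();
        // SelectNodes returns null rather than an empty collection when nothing matches.
        if (divs == null)
        {
            return texts;
        }
        foreach (HtmlNode div in divs)
        {
            texts.Add(div.InnerText.Trim());
        }
        return texts;
    }

    public static void Main()
    {
        // Markup from the question's example.
        const string html = @"
<div class=""entry_single"">
  <p>I recently traveled without my notebook for the first time in ages.</p>
</div>
<div class=""entry-content-pagination"">
  <p>Ward 9 Ald. Steven Dove</p>
</div>";

        // "entry" is the fragment both example classes share (an assumption for this sample).
        foreach (string text in SelectArticleTexts(html, "entry"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Because contains() is a plain substring test, a fragment that is too short can also match unrelated classes, so pick one specific to the article containers you actually see.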

score 0 · Accepted Answer

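LINQ makes this easier. `Descendants("div")` enumerates every div element in the document, so you can read each one's class attribute: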

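// doc is an HtmlAgilityPack HtmlDocument that has already been loaded
foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    // Returns string.Empty when the div has no class attribute
    string className = div.GetAttributeValue("class", string.Empty);
    // do something with class name
}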
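Taking the loop above one step further, LINQ's Where can keep only the divs whose class contains a given fragment and Select can pull out their text. This is a sketch assuming the HtmlAgilityPack package is referenced; the helper name TextOfDivsWithClassContaining, the fragment "entry", and the extra "sidebar" div in the sample markup are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class DivClassFilter
{
    // Returns the trimmed text of every <div> whose class attribute
    // contains the given fragment (ordinal, case-sensitive substring match).
    // Hypothetical helper for illustration.
    public static List<string> TextOfDivsWithClassContaining(string html, string fragment)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // GetAttributeValue returns string.Empty when there is no class attribute,
            // so divs without a class are filtered out (unless fragment is empty).
            .Where(div => div.GetAttributeValue("class", string.Empty).Contains(fragment))
            .Select(div => div.InnerText.Trim())
            .ToList();
    }

    public static void Main()
    {
        // First two divs come from the question; "sidebar" is a hypothetical non-article div.
        const string html = @"
<div class=""entry_single""><p>I recently traveled without my notebook for the first time in ages.</p></div>
<div class=""entry-content-pagination""><p>Ward 9 Ald. Steven Dove</p></div>
<div class=""sidebar""><p>Not an article</p></div>";

        foreach (string text in TextOfDivsWithClassContaining(html, "entry"))
        {
            Console.WriteLine(text);
        }
    }
}
```

string.Contains is a substring test, so "entry" would also match a class such as "sentry". If that matters, split the attribute on whitespace and compare individual class tokens instead.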
