c# - Get the value of an HTML element

Question

I have the HTML code of a web page in a text file. I want my program to return the value inside a tag. For example, I want to extract "Julius" from

<span class="hidden first">Julius</span>

Do I need a regular expression for this? If not, which string function can do it?

score 13 · Accepted Answer

You should be using an HTML parser such as HtmlAgilityPack. Regex is not a good choice for parsing HTML files, because HTML is neither strict nor regular in its format.

You can use the code below to retrieve the value with HtmlAgilityPack:

// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourStream);

// This XPath selects every span whose class attribute is exactly "hidden first".
// Note: SelectNodes returns null when nothing matches.
var itemList = doc.DocumentNode.SelectNodes("//span[@class='hidden first']")
                  .Select(p => p.InnerText)
                  .ToList();

// itemList now contains the text content of every matching span
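Since the question starts from HTML held as text rather than a stream, here is a minimal sketch of the same lookup that takes a string and tolerates the no-match case. The class and method names (SpanTextExtractor, GetHiddenFirstSpans) are made up for illustration, and the code assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
public static class SpanTextExtractor
{
    // Returns the text of every <span class="hidden first"> in the given HTML.
    public static List<string> GetHiddenFirstSpans(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a stream

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = doc.DocumentNode.SelectNodes("//span[@class='hidden first']");
        if (nodes == null)
        {
            return new List<string>();
        }

        // DeEntitize turns entities such as &amp; back into plain characters.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }
}
```

With the HTML in a text file, the input could come from something like File.ReadAllText("page.html"), where the path is only a placeholder. The XPath still compares the whole class attribute, so a span written as class="first hidden" would not be selected.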

score 7 · Accepted Answer

Use the Html Agility Pack to parse HTML in C#.

answered 2012-11-05T14:45:30.830

score 2 · Accepted Answer

I strongly recommend looking into something like the HTML Agility Pack.

answered 2012-11-05T14:45:40.480

score 1 · Accepted Answer

I asked the same question a few days ago and ended up using the HTML Agility Pack, but here are the regular expressions you would need.

This one ignores attributes:

<span[^>]*>(.*?)</span>

This one takes the attributes into account:

<span class="hidden first"[^>]*>(.*?)</span>
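For completeness, here is a rough sketch of how the second pattern could be applied with System.Text.RegularExpressions. The class and method names (SpanRegexDemo, ExtractHiddenFirst) are invented for this example, and the Singleline and IgnoreCase options are my own assumption rather than part of the pattern above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class SpanRegexDemo
{
    // Returns the inner text of the first <span class="hidden first"> element,
    // or null when the pattern does not match.
    public static string ExtractHiddenFirst(string html)
    {
        // Singleline lets '.' match newlines, so content that wraps lines still matches.
        // The lazy (.*?) stops at the first closing </span>.
        Match match = Regex.Match(
            html,
            "<span class=\"hidden first\"[^>]*>(.*?)</span>",
            RegexOptions.Singleline | RegexOptions.IgnoreCase);

        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Calling SpanRegexDemo.ExtractHiddenFirst on the span from the question returns Julius. The pattern breaks as soon as the markup varies (single quotes, a different attribute order, nested spans), which is why the parser-based answers are the safer route.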
