c# - Regex error: Input string was not in a correct format

Question

Hi everyone. I have a string such as "Участники 59728" and I parse it like this:

string population = Regex.Match(content, @"Участники&nbsp;<span class=""clgry"">(?<id>[^""]+?)</span>").Groups["id"].Value;
int j = 0;
if (!string.IsNullOrEmpty(population))
{
    log("[+] Группа: " + group + " Учасники: " + population + "\r\n");
    int population_int = Convert.ToInt32(population);
    if (population_int > 20000)
    {
        lock (accslocker)
        {
            StreamWriter file = new StreamWriter("opened.txt", true);
            file.Write(group + ":" + population + "\r\n");
            file.Close();
        }
        j++;
    }
}

However, when my string is ">Участники " I get the exception "Input string was not in a correct format". How can I avoid it?

score 2 · Accepted Answer

Instead of a regex, use a real HTML parser to parse the HTML (for example, HtmlAgilityPack).

string html = @"<span class=""lnk"">Участники&nbsp;<span class=""clgry"">59728</span>";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

var list = doc.DocumentNode.SelectNodes("//span[@class='lnk']/span[@class='clgry']")
              .Select(x => new
              {
                  ParentText = x.ParentNode.FirstChild.InnerText,
                  Text = x.InnerText
              })
              .ToList();
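Whichever way the text is extracted, Convert.ToInt32 throws a FormatException when it is given text that is not a valid integer. The following is a minimal sketch of a non-throwing conversion using int.TryParse. The names PopulationParser and TryParsePopulation are hypothetical, invented for illustration, and the sample inputs are made up.

```csharp
using System;
using System.Globalization;

static class PopulationParser
{
    // Hypothetical helper: returns null instead of throwing
    // when the text is not a plain run of digits.
    public static int? TryParsePopulation(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return null;

        // Trim removes surrounding Unicode whitespace.
        string cleaned = text.Trim();

        // NumberStyles.None accepts digits only (no sign, no separators).
        return int.TryParse(cleaned, NumberStyles.None,
                            CultureInfo.InvariantCulture, out int value)
            ? value
            : (int?)null;
    }
}

static class Program
{
    static void Main()
    {
        // Made-up sample inputs: one valid number and three that cannot be parsed.
        foreach (string sample in new[] { "59728", "", "n/a", "59 728" })
        {
            int? population = PopulationParser.TryParsePopulation(sample);
            Console.WriteLine(population.HasValue
                ? "parsed " + population.Value
                : "skipped \"" + sample + "\"");
        }
    }
}
```

With a helper like this, the caller can test HasValue before comparing against 20000, so an empty or non-numeric match is skipped rather than raising an exception.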

score 1 · Accepted Answer

Trying to parse HTML content with regular expressions is not a good idea. See this. Use Html Agility Pack instead.

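// GetAttributeValue returns the default ("") when a span has no class attribute,
// so spans without a class do not cause a NullReferenceException.
var spans = doc.DocumentNode.Descendants("span")
               .Where(s => s.GetAttributeValue("class", "") == "clgry")
               .Select(x => x.InnerText)
               .ToList();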
