c# - Parse and format HTML tags in RSS using a regex search

Question

I have content:encoded text from an RSS feed that looks like this:

<content:encoded><![CDATA[<P><B>Wednesday, September 26, 2012</B></P>It is Apple.<P>Shops are closed.<br />Parking is not allowed here. Go left and park.<br />All theatres are opened.<br /></P><P><B>Thursday, September 27, 2012</B></P><P>Shops are open.<br />Parking is not allowed here. Go left and park.<br  />All theatres are opened.<br /></P>]]></content:encoded>

I can extract the text from the HTML with the following method.

public static string StripHTML(this string htmlText)
{
    var reg = new Regex("<[^>]+>", RegexOptions.IgnoreCase);
    return HttpUtility.HtmlDecode(reg.Replace(htmlText, string.Empty));
}

However, I want to put the text inside <b></b> into dateArray[] and the text inside <p></p> into descriptionArray[], so that I can display them as shown in the attached image.

Thanks in advance.

score 0 · Accepted Answer

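// Requires the HTML Agility Pack: http://htmlagilitypack.codeplex.com/
// Also needs: using System.Linq;
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is assumed to hold the markup from inside the CDATA section

// Collect every text node and flag the ones whose parent is a <b> element (the dates).
var result = doc.DocumentNode.Descendants()
                .Where(n => n is HtmlAgilityPack.HtmlTextNode)
                .Select(n => new {
                    IsDate = n.ParentNode.Name == "b",
                    Text = n.InnerText,
                })
                .ToList();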
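The query above yields a flat list of text nodes, while the question asks for a dateArray[] and a descriptionArray[]. The following is a minimal sketch of one way to get there, assuming each date should be paired with all the text that follows it until the next date. The class name RssContentSplitter and the method Split are hypothetical names introduced for illustration, and the code assumes the HtmlAgilityPack NuGet package is installed and a compiler that supports C# 7 tuples.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper, not part of the original answer.
public static class RssContentSplitter
{
    // Returns two parallel arrays: Dates[i] is the text of the i-th <b> element,
    // Descriptions[i] is the text that follows it, one line per text node.
    public static (string[] Dates, string[] Descriptions) Split(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var dates = new List<string>();
        var descriptions = new List<List<string>>();

        foreach (var node in doc.DocumentNode.Descendants().OfType<HtmlTextNode>())
        {
            // Decode entities such as &amp; and skip whitespace-only nodes.
            var text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length == 0) continue;

            if (node.ParentNode.Name == "b")
            {
                // A date starts a new group.
                dates.Add(text);
                descriptions.Add(new List<string>());
            }
            else if (descriptions.Count > 0)
            {
                // Any other text belongs to the most recent date.
                // Text that appears before the first date is ignored.
                descriptions[descriptions.Count - 1].Add(text);
            }
        }

        return (dates.ToArray(),
                descriptions.Select(lines => string.Join(Environment.NewLine, lines)).ToArray());
    }
}
```

It could be called as var (dateArray, descriptionArray) = RssContentSplitter.Split(html); where html is the content of the content:encoded element. Grouping by position rather than by <p> nesting is a deliberate choice here, because in the sample feed some text ("It is Apple.") sits outside any <p> element.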
