c# - Get all <li> elements from inside a certain <div> with C#

Question

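I have a web page that consists of several <div> elements.

I would like to write a program that prints all of the li elements inside a <div> that come after a certain <h4> header. Could anyone give me some help or sample code?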

<div id="content">
    <h4>Header</h4>
    <ul>
        <li><a href...></a> THIS IS WHAT I WANT TO GET</li>
    </ul>
</div>

score 2 · Accepted Answer

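If you are parsing HTML in C#, don't try to write your own parser. The HTML Agility Pack can almost certainly do what you want!

Which parts are constant?

The "id" on the DIV?
The h4?

Searching the complete HTML document and reacting to H4s alone is likely to get messy, but if you know the DIV has the ID "content", just look for that!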

// Requires: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(yourHtml);

// SelectNodes returns null when nothing matches, so guard before using LINQ.
var allDivs = doc.DocumentNode.SelectNodes("//div");
if (allDivs != null)
{
   // Keep only the divs that contain at least one <h4> descendant.
   var divs = allDivs.Where(div => div.Descendants("h4").Any());

   // You now have all of the divs with an 'h4' inside of them.

   // The rest of the element structure, if constant, needs to be examined to get
   // the rest of the content you're after.
}
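The id-based lookup suggested above can also be written as a single XPath query. This is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced and that the list is the first element after the <h4>, as in the question's markup. The names `ListItemExtractor` and `ExtractItems` are hypothetical and are not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ListItemExtractor
{
    // Hypothetical helper: returns the trimmed text of every <li> found in the
    // first element that follows an <h4> directly inside <div id="content">.
    public static List<string> ExtractItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(
            "//div[@id='content']/h4/following-sibling::*[1]//li");

        return items == null
            ? new List<string>()
            : items.Select(li => li.InnerText.Trim()).ToList();
    }
}
```

Calling `ListItemExtractor.ExtractItems(yourHtml)` gives back a list of strings you can print with `Console.WriteLine`. If the page has no matching div, the list is simply empty.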

score 0 · Accepted Answer

If all you want is the content between the <li></li> tags that are underneath the <div id="content"> tag and come right after an <h4> tag, this should suffice.

//Load your document first.
//Load() accepts a Stream, a TextReader, or a string path to the file on your computer
//If the entire document is loaded into a string, then use .LoadHtml() instead.
HtmlDocument mainDoc = new HtmlDocument();
mainDoc.Load(@"c:\foobar.html"); //verbatim string, so \f is not read as an escape

//Select all the <li> nodes that are inside of an element with the id of "content"
// and come directly after an <h4> tag.
//The leading "." keeps the XPath relative to the "content" element;
// a bare "//" would search the whole document.
HtmlNode content = mainDoc.GetElementbyId("content");
HtmlNodeCollection processMe = content?.SelectNodes(".//h4/following-sibling::*[1]//li");

//SelectNodes returns null when nothing matches, so check before iterating.
if (processMe != null)
{
    //Iterate through each <li> node and print the inner text to the console
    foreach (HtmlNode listElement in processMe)
    {
        Console.WriteLine(listElement.InnerText);
    }
}

score 0 · Accepted Answer

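If it is your own web page, why do you need to parse the HTML at all? Doesn't the technology you use to build the page give you access to all of its elements? For example, if you are using ASP.NET, you could assign ids to the UL and LI elements (with the runat="server" attribute) and they would be available in the code-behind.

Could you explain your scenario and what you are trying to do? If you are making a web request and downloading the HTML as a string, then scraping the HTML makes sense.

EDIT: I think this should work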

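HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);

foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    // GetAttributeValue returns the fallback when a div has no id attribute.
    if (div.GetAttributeValue("id", "") != "content")
        continue;

    foreach (HtmlNode ul in div.Elements("ul"))
    {
        // Skip whitespace text nodes to find the element just before the <ul>.
        HtmlNode prev = ul.PreviousSibling;
        while (prev != null && prev.NodeType != HtmlNodeType.Element)
            prev = prev.PreviousSibling;

        if (prev != null && prev.Name == "h4" && prev.InnerText.Trim() == "Header")
        {
            foreach (HtmlNode li in ul.Elements("li"))
            {
                // each li is an <li> child of the <ul>
            }
        }
    }
}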
