c# - HTML Agility Pack: is there a "like" statement, or a way to strip trailing whitespace?

Question

I am trying to download data from a website into a DataTable. The problem is that I cannot reach the right node, apparently because the text contains whitespace. Here is my code so far:

    public static DataTable downloadtable()
    {
        DataTable dt = new DataTable();
        string htmlCode = "";
        using (WebClient client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
            htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
        }
        //this is just to check the file structure from text file
        System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt");
        file.WriteLine(htmlCode);

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

        doc.LoadHtml(htmlCode);

        dt = new DataTable();

        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
        {
            //This is the problem name where I get the error
            foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.='            00-01          ']"))
            {
                foreach (var cell in row.SelectNodes("//td"))
                {
                    //this is to check for correct result, final result would be to dump it into datatable
                    Console.WriteLine(cell.InnerText);
                }
            }
        }
        return dt;
    }

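I am trying to download the hourly prices from the link in the code, but it seems to fail because of the trailing whitespace (I think). Is there a "like" statement for matching the node text? Or can I strip the trailing whitespace?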

score 1 · Accepted Answer

I think your problem is that you are trying to select td elements from inside a td node, which clearly has no td elements of its own.

<tr>
 <td class="title">         00-01           </td>
 <td class="spacer"></td>
 <td class="r">€/MWh</td>
 <td class="spacer"></td>
 <td>35.34</td>
 <td class="spacer"></td>
 <td>34.02</td>
 <td class="spacer"></td>
 <td>34.02</td>
</tr>

So when you iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), there are no td elements inside it.
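
On the whitespace part of the question: the sketch below compares an exact text match with the XPath 1.0 functions normalize-space() and contains(), which can act as a rough "like". The class name, helper name, and sample markup are hypothetical and only mimic the padded cell shown above. It also guards against SelectNodes returning null, which is what HtmlAgilityPack returns when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

public static class WhitespaceMatchDemo
{
    // Hypothetical sample that mimics the padded title cell on the page.
    public const string SampleHtml =
        "<table class='list electricity'><tr>" +
        "<td class='title'>         00-01           </td>" +
        "<td class='spacer'></td><td>35.34</td>" +
        "</tr></table>";

    // Counts the nodes matched by an XPath expression.
    // SelectNodes returns null (not an empty collection) when nothing matches.
    public static int CountMatches(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        return nodes == null ? 0 : nodes.Count;
    }

    public static void Main()
    {
        // Exact comparison: the padding in the cell text prevents a match.
        Console.WriteLine(CountMatches(SampleHtml, "//td[@class='title'][.='00-01']"));
        // normalize-space() trims and collapses whitespace before comparing.
        Console.WriteLine(CountMatches(SampleHtml, "//td[@class='title'][normalize-space(.)='00-01']"));
        // contains() is a substring test, the closest thing to a "like".
        Console.WriteLine(CountMatches(SampleHtml, "//td[@class='title'][contains(., '00-01')]"));
    }
}
```

With this sample, only the second and third expressions are expected to match. Note that contains() would also match a cell such as "00-01 (peak)", so normalize-space() is the safer choice when the label must match exactly.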

If you need all the rows starting from 00-01, you can use the following:

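HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
// Find the title cell by its whitespace-normalized text, then step up to the enclosing table.
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
    // "./tr/td" is relative to the table node, so only its own cells are visited.
    foreach (var cell in table.SelectNodes("./tr/td"))
    {
        // Skip the empty spacer cells.
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}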

If you only need the 00-01 row, you can use the following:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
    if (row.InnerText.Trim() == "00-01")
    {
        foreach (var cell in row.ParentNode.ChildNodes)
        {
            if (string.IsNullOrEmpty(cell.InnerText.Trim()))
                continue;
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}

Alternatively, you can use:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
    foreach (var cell in row.ParentNode.ChildNodes)
    {
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}
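
The question mentions that the final goal is to dump the values into a DataTable. A minimal sketch of that last step, under some assumptions: the class name, method name, and generated column names ("Col0", "Col1", ...) are hypothetical, every non-empty cell is stored as a string, and the label is assumed to contain no single quote because it is concatenated into the XPath expression.

```csharp
using System;
using System.Data;
using System.Linq;
using HtmlAgilityPack;

public static class HourRowToDataTable
{
    // Builds a DataTable with one string column per non-empty cell of the row
    // whose title cell equals `label` after whitespace normalization.
    public static DataTable ReadRow(string html, string label)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var dt = new DataTable();

        // Assumption: `label` contains no single quote.
        HtmlNode title = doc.DocumentNode.SelectSingleNode(
            "//td[@class='title'][normalize-space(.)='" + label + "']");
        if (title == null)
            return dt; // no matching row: return the empty table

        // Elements("td") yields only td element children of the row,
        // so whitespace text nodes between the cells are not visited.
        var values = title.ParentNode.Elements("td")
            .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
            .Where(text => text.Length > 0) // drop the spacer cells
            .ToList();

        for (int i = 0; i < values.Count; i++)
            dt.Columns.Add("Col" + i, typeof(string));

        dt.Rows.Add(values.Cast<object>().ToArray());
        return dt;
    }
}
```

Calling HourRowToDataTable.ReadRow(htmlCode, "00-01") on the downloaded page should then give a single-row table. Converting the price cells to decimal is left out here, since the number format of the page is not shown in the question.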
