c# - Getting data from a page using HAP (HTML Agility Pack)

Question

As a follow-up to this post, I'm trying to parse some data out of an HTML page. The HTML is as follows (the page has more on it, but this is the relevant section):

<table class="integrationteamstats">
<tbody>
<tr>
    <td class="right">
        <span class="mediumtextBlack">Queue:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0</span>
    </td>
    <td class="right">
        <span class="mediumtextBlack">Aban:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0%</span>
    </td>
    <td class="right">
        <span class="mediumtextBlack">Staffed:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0</span>
    </td>
</tr>
<tr>
    <td class="right">
        <span class="mediumtextBlack">Wait:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0:00</span>
    </td>
    <td class="right">
        <span class="mediumtextBlack">Total:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0</span>
    </td>
    <td class="right">
        <span class="mediumtextBlack">On ACD:</span>
    </td>
    <td class="left">
        <span class="mediumtextBlack">0</span>
    </td>
</tr>
</tbody>
</table>

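I need to get two pieces of information: the data in the td that follows Queue and the data in the td that follows Wait (that is, the queue count and the wait time). Obviously, the numbers update frequently.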

I now have the HTML loaded into an HtmlDocument variable, and I found something along the lines of using an HtmlNodeCollection to collect the nodes that match certain criteria. This is basically where I'm stuck:

HtmlNodeCollection tds = this.html.DocumentNode.SelectNodes("//td");

foreach (HtmlNode td in tds)
{
    /* I want to write:
     * If the last node's value was 'Queue', give me the value of this node.
     * and
     * If the last node's value was 'Wait Time', give me the value of this node.
     */
}

I can loop over them with foreach, but I don't know how to access a node's value or how to get at the next node.

score 3 · Accepted Answer

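You usually don't need a foreach here, because getting the information you want is fairly straightforward (with a foreach you have to manage state across iterations of the loop, which quickly gets awkward).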
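For comparison, this is roughly what the loop from the question looks like when that state is tracked by hand: the text of the previous cell is remembered and checked on the next iteration. It is a minimal sketch that assumes the HtmlAgilityPack NuGet package is referenced; the names `QueueStatsLoop` and `ReadWithLoop` are hypothetical, and the labels `"Queue:"` and `"Wait:"` are taken from the sample markup.

```csharp
using HtmlAgilityPack;

public static class QueueStatsLoop
{
    // Hypothetical helper: walks every <td> in document order and keeps the text
    // of the previous cell, so the cell after "Queue:" / "Wait:" can be recognised.
    public static (string Queue, string Wait) ReadWithLoop(HtmlDocument html)
    {
        string queue = null;
        string wait = null;
        string previousText = null; // state carried from one iteration to the next

        HtmlNodeCollection tds = html.DocumentNode.SelectNodes("//td");
        if (tds == null)
        {
            return (queue, wait); // by default, SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode td in tds)
        {
            // InnerText includes the indentation around the <span>, so trim it.
            string text = HtmlEntity.DeEntitize(td.InnerText).Trim();

            if (previousText == "Queue:")
            {
                queue = text;
            }
            else if (previousText == "Wait:")
            {
                wait = text;
            }

            previousText = text;
        }

        return (queue, wait);
    }
}
```

Every extra value you want to pick out means another branch keyed on the previous cell, which is the bookkeeping the positional approach avoids.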

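First, get the table. Filtering on the class attribute is generally not a good idea, because an HTML document can contain multiple elements with the same class applied. An id attribute, if there is one, would be ideal.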

That said, if this is the only table with this class, you can get the body of the table element with the following:

// Get the table.
HtmlNode tableBody = document.DocumentNode.SelectSingleNode(
    "//table[@class='integrationteamstats']/tbody");

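From there, get the individual rows. They are direct children of the tbody element, so you can get them by position. Note that HAP's ChildNodes property also returns the whitespace text nodes between the tags, so with indented markup like this ChildNodes[0] is a text node rather than the first tr; selecting only the tr elements avoids that:

HtmlNodeCollection rows = tableBody.SelectNodes("tr");
HtmlNode queueRow = rows[0];
HtmlNode waitRow = rows[1];

Next, you need the second td element in each row. There is a span tag wrapping the content, but since you want all of the text inside the td element, you can get the value with the InnerText property (trimmed, because the markup is indented):

string queueValue = queueRow.SelectNodes("td")[1].InnerText.Trim();
string waitValue = waitRow.SelectNodes("td")[1].InnerText.Trim();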

Note that there is repetition here, so if you find that you have many rows to parse this way, you may want to factor some of that logic out into a helper method.
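A minimal sketch of such a helper, assuming the HtmlAgilityPack NuGet package is referenced. `StatsTable.GetCellText` is a hypothetical name; it indexes rows and cells as elements (via `SelectNodes("tr")` and `SelectNodes("td")`) so that whitespace text nodes in indented markup do not shift the positions, and it returns `null` when a row or cell is missing instead of throwing.

```csharp
using HtmlAgilityPack;

public static class StatsTable
{
    // Hypothetical helper: returns the trimmed text of the cell at
    // (rowIndex, cellIndex), both zero-based and counting only <tr>/<td>
    // elements, or null if that row or cell does not exist.
    public static string GetCellText(HtmlNode tableBody, int rowIndex, int cellIndex)
    {
        HtmlNodeCollection rows = tableBody.SelectNodes("tr");
        if (rows == null || rowIndex < 0 || rowIndex >= rows.Count)
        {
            return null;
        }

        HtmlNodeCollection cells = rows[rowIndex].SelectNodes("td");
        if (cells == null || cellIndex < 0 || cellIndex >= cells.Count)
        {
            return null;
        }

        // Decode entities such as &nbsp; and strip the surrounding indentation.
        return HtmlEntity.DeEntitize(cells[cellIndex].InnerText).Trim();
    }
}
```

With the `tableBody` node obtained earlier, `StatsTable.GetCellText(tableBody, 0, 1)` would give the queue count and `StatsTable.GetCellText(tableBody, 1, 1)` the wait time.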

score 1 · Accepted Answer

You could also use CsQuery for this. It uses the familiar CSS selector syntax and jQuery methods, which can make it easier to use than HAP for more complex DOM navigation. For example:

using CsQuery;

// function to get the text from the cell AFTER the one containing 'text'
string getNextCellText(CQ dom, string text) {
    // find the target cell (argument quoted, as in jQuery's :contains)
    CQ target = dom.Select(".integrationteamstats td:contains('" + text + "')");

    // return the text contents of the next cell, without the indentation
    return target.Next().Text().Trim();
}

void Main() {
    // html is assumed to hold the page's markup as a string
    var dom = CQ.Create(html);
    string queue = getNextCellText(dom, "Queue");
    string wait = getNextCellText(dom, "Wait:");

    // ... do stuff
}
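The same "find the label, take the next cell" idea can also be expressed in HAP with a single XPath query, using `normalize-space()` to ignore the indentation and the `following-sibling` axis to step to the next `td`. This is a sketch under the same assumption (HtmlAgilityPack referenced); `LabelLookup.GetNextCellText` is a hypothetical name. Unlike `:contains`, it matches the label exactly, so pass `"Queue:"` including the colon. The label is concatenated into an XPath string literal, so it must not contain a single quote.

```csharp
using HtmlAgilityPack;

public static class LabelLookup
{
    // Hypothetical helper: finds the <td> in the integrationteamstats table whose
    // whitespace-normalised text equals `label`, then returns the trimmed text of
    // the next sibling <td>, or null if no such cell exists.
    public static string GetNextCellText(HtmlDocument doc, string label)
    {
        string xpath =
            "//table[@class='integrationteamstats']//td[normalize-space(.)='" + label + "']" +
            "/following-sibling::td[1]";

        HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);
        return cell == null ? null : HtmlEntity.DeEntitize(cell.InnerText).Trim();
    }
}
```

Because the lookup is keyed on the label text rather than on row and column positions, it keeps working if the cells are reordered, at the cost of depending on the exact label wording.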