c# - "HTML Agility Pack" XPath query using logical AND

Question

I am trying to find, in an HTML document, a table that has 3 columns containing text in its first two rows.

I experimented with the following query. I hoped it would return the table whose first column contains text in the first two rows:

    string xpath = @"//table//table[//tr[1]//td[1]//*[contains(text(), *)] and //tr[2]//td[1]//*[contains(text(), *)]]";
    HtmlNode temp = doc.DocumentNode.SelectSingleNode(xpath);

It doesn't work.

Here is sample HTML of the table I am trying to match:

    <table width="100%" cellpadding="0" border="0">
       <tbody>
       <tr>
          <td width="27%" valign="center"><b><font size="1" face="Helvetica">SOME TEXT<br></font></b></td>
          <td width="1%"></td>
          <td width="9%" valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td width="1%"></td>
          <td width="25%" valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td width="37%"></td>
       </tr>
       <tr>
          <td valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td></td>
          <td valign="center"><font size="1" face="Helvetica">1<br></font></td>
          <td></td>
          <td valign="center"><font size="1" face="Helvetica">SOME TEXT<br></font></td>
          <td></td>
       </tr>
       </tbody>
    </table>

As you can see, columns 1, 3 and 5 contain text in the first two rows. That is what I am trying to match.

score 1 · Accepted Answer

//table//table[//tr[1]//td[1]//*[contains(text(), *)] and //tr[2]//td[1]//*[contains(text(), *)]]

There are many problems with this XPath expression:

//table//table selects any table that is a descendant of another table. However, the provided HTML document contains no nested tables.
table[//tr[1]//td[1]//*[contains(text(), *)] ...]: the //tr inside the predicate is an absolute XPath expression. It selects all tr elements in the whole document, not only those in the subtree rooted at the current table element. Most probably you want .//tr instead of //tr.
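//td[1] selects every td element that is the first td child of its parent, not just the first descendant td; the same applies to //tr[1], which picks the first tr of each thead or tbody. If you want only the first descendant element, put the path in parentheses before applying the position, keeping it relative as above: (.//td)[1]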
//*[contains(text(), *)] selects any element whose first text-node child contains the string value of its first element child. This is not a test for the presence of text: for an element with no children both arguments become the empty string, and contains('', '') is true. What you want is to verify that a td has a descendant text node, which can be expressed as: td[.//text()]
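The following minimal sketch shows the pitfalls listed above in action. It uses top-level statements (C# 9 or later assumed) and System.Xml's XmlDocument.SelectNodes instead of HTML Agility Pack, so it runs without NuGet packages. It assumes HAP evaluates these expressions with the same XPath 1.0 rules. The snippets TwoTables, HeadAndBody and Cells and the helper Query are hypothetical. They are written as well-formed XML because XmlDocument cannot parse HTML such as a bare <br>.

```csharp
using System;
using System.Linq;
using System.Xml;

// Hypothetical, well-formed snippets (XmlDocument cannot load loose HTML).
const string TwoTables =
    "<body>" +
    "<table id='empty'><tr><td></td></tr></table>" +
    "<table id='filled'><tr><td>text</td></tr></table>" +
    "</body>";
const string HeadAndBody =
    "<table><thead><tr><td>h</td></tr></thead>" +
    "<tbody><tr><td>b1</td></tr><tr><td>b2</td></tr></tbody></table>";
const string Cells = "<tr><td id='blank'></td><td id='word'>x</td></tr>";

// Evaluates an XPath 1.0 expression and joins, for each selected node, its id
// attribute if it has one, otherwise its text content.
string Query(string xml, string xpath)
{
    var doc = new XmlDocument();
    doc.LoadXml(xml);
    return string.Join(",", doc.SelectNodes(xpath).Cast<XmlNode>().Select(
        n => n is XmlElement e && e.HasAttribute("id") ? e.GetAttribute("id") : n.InnerText));
}

// Absolute //td inside the predicate looks at the whole document, so every table passes;
// relative .//td stays inside the table being tested.
Console.WriteLine(Query(TwoTables, "//table[//td[.//text()]]"));   // empty,filled
Console.WriteLine(Query(TwoTables, "//table[.//td[.//text()]]"));  // filled

// //tr[1] picks the first tr of EACH parent (thead and tbody here);
// the parenthesized form picks only the first tr in document order.
Console.WriteLine(Query(HeadAndBody, "/table//tr[1]"));            // h,b1
Console.WriteLine(Query(HeadAndBody, "(/table//tr)[1]"));          // h

// contains("", "") is true, so an empty cell passes contains(text(), *);
// .//text() only accepts cells that really contain a text node.
Console.WriteLine(Query(Cells, "/tr/td[contains(text(), *)]"));    // blank,word
Console.WriteLine(Query(Cells, "/tr/td[.//text()]"));              // word
```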

Combining the corrections of all these issues, what you probably want is something like:

  //table
     [(.//tr)[1]/td[1][.//text()]
    and
      (.//tr)[2]/td[1][.//text()]
     ]
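You can check the corrected expression against the question's sample with a sketch like the following. It makes several assumptions. The table markup is trimmed, and <br> is written as <br/> so that System.Xml's XmlDocument can load it. A hypothetical decoy table, whose second row has an empty first cell, is placed before the sample. Describe is a hypothetical helper.

```csharp
using System;
using System.Xml;

// Hypothetical decoy (empty first cell in row 2) followed by the question's table,
// with attributes trimmed and <br> written as <br/> so the string is well-formed XML.
const string Html =
    "<html><body>" +
    "<table id='decoy'><tbody>" +
    "<tr><td><font>SOME TEXT<br/></font></td><td></td></tr>" +
    "<tr><td></td><td></td></tr>" +
    "</tbody></table>" +
    "<table id='sample'><tbody>" +
    "<tr><td><b><font>SOME TEXT<br/></font></b></td><td></td>" +
    "<td><font>SOME TEXT<br/></font></td><td></td>" +
    "<td><font>SOME TEXT<br/></font></td><td></td></tr>" +
    "<tr><td><font>SOME TEXT<br/></font></td><td></td>" +
    "<td><font>1<br/></font></td><td></td>" +
    "<td><font>SOME TEXT<br/></font></td><td></td></tr>" +
    "</tbody></table>" +
    "</body></html>";

// The expression from the question: //table//table requires a nested table.
const string original =
    "//table//table[//tr[1]//td[1]//*[contains(text(), *)] and //tr[2]//td[1]//*[contains(text(), *)]]";

// The answer's corrected expression, written on one line.
const string corrected =
    "//table[(.//tr)[1]/td[1][.//text()] and (.//tr)[2]/td[1][.//text()]]";

var doc = new XmlDocument();
doc.LoadXml(Html);

// Reports the id of a selected table, or "no match" when the query returned null.
string Describe(XmlNode node) => node is XmlElement e ? e.GetAttribute("id") : "no match";

Console.WriteLine("original:  " + Describe(doc.SelectSingleNode(original)));
Console.WriteLine("corrected: " + Describe(doc.SelectSingleNode(corrected)));
```

Run as is, it should report "no match" for the question's expression and "sample" for the corrected one, skipping the decoy. With HTML Agility Pack itself, you can pass the corrected string unchanged to doc.DocumentNode.SelectSingleNode, as in the question. HAP loads the original markup, including the unclosed <br> tags, directly.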

Alternatively, one could write an equivalent but more understandable and less error-prone expression like this:

//table
  [descendant::tr[1]/td[1][descendant::text()]
 and
   descendant::tr[2]/td[1][descendant::text()]
  ]
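You can check the descendant:: form the same way. Its second condition must read descendant::tr[2], because it should test the second row. The answer's expression looks only at column 1, while the question asks about columns 1, 3 and 5. The RowHasText helper below is a hypothetical extension that applies the same test to td[1], td[3] and td[5] of each of the first two rows. The decoy table and the trimmed sample are assumptions made for illustration, and System.Xml again stands in for HTML Agility Pack.

```csharp
using System;
using System.Linq;
using System.Xml;

const string Html =
    "<html><body>" +
    // Hypothetical decoy: column 1 has text in both rows, column 3 of row 2 is empty.
    "<table id='decoy'><tbody>" +
    "<tr><td>a</td><td></td><td>b</td><td></td><td>c</td></tr>" +
    "<tr><td>a</td><td></td><td></td><td></td><td>c</td></tr>" +
    "</tbody></table>" +
    // Trimmed version of the question's table: columns 1, 3 and 5 filled in both rows.
    "<table id='sample'><tbody>" +
    "<tr><td><b><font>SOME TEXT</font></b></td><td></td><td><font>SOME TEXT</font></td>" +
    "<td></td><td><font>SOME TEXT</font></td><td></td></tr>" +
    "<tr><td><font>SOME TEXT</font></td><td></td><td><font>1</font></td>" +
    "<td></td><td><font>SOME TEXT</font></td><td></td></tr>" +
    "</tbody></table>" +
    "</body></html>";

var doc = new XmlDocument();
doc.LoadXml(Html);

// The answer's descendant-axis form, checking column 1 of rows 1 and 2.
const string column1Only =
    "//table[descendant::tr[1]/td[1][descendant::text()]" +
    " and descendant::tr[2]/td[1][descendant::text()]]";

// Hypothetical extension: the row-th descendant tr must have text in cells 1, 3 and 5.
string RowHasText(int row) =>
    $"descendant::tr[{row}][td[1][descendant::text()] and td[3][descendant::text()] and td[5][descendant::text()]]";

string columns135 = $"//table[{RowHasText(1)} and {RowHasText(2)}]";

// Returns the id attributes of all selected tables, in document order.
string[] Ids(string xpath) =>
    doc.SelectNodes(xpath).Cast<XmlElement>().Select(e => e.GetAttribute("id")).ToArray();

Console.WriteLine(string.Join(",", Ids(column1Only))); // decoy,sample
Console.WriteLine(string.Join(",", Ids(columns135)));  // sample
```

On this input the column-1 check accepts both tables, while the three-column version keeps only the sample. In descendant::tr[{row}][...], the positional predicate is applied first and the cell test second. The expression therefore checks the row at that position, not the first row that happens to pass the test.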
