python - XPath: Find HTML element by plain text

Question

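Note: a more refined version of this question, with a proper answer, can be found here.

I want to use the Selenium Python bindings to find elements with a specific text on a web page. For example, consider the following HTML: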

<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>

I need to find <someElement> by its text, and I can do that with the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that lets me find <someOtherElement> using the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because the nested em element "disrupts" the text flow of "This can not be found". Is it possible, via XPath, to ignore nesting like the above, or similar nesting?

score 18 · Accepted Answer

You can use //*[contains(., 'This can not be found')].

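The context node . is converted to its string representation (the concatenated text of the element and all its descendants) before being compared with 'This can not be found'.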
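To illustrate the difference, here is a minimal sketch that reproduces both string conversions in plain Python with the standard library's ElementTree instead of an XPath engine. It assumes that el.text stands in for the first text node (what contains(text(), ...) effectively sees in XPath 1.0) and that joining itertext() stands in for the string value of the context node.

```python
import xml.etree.ElementTree as ET

el = ET.fromstring(
    "<someOtherElement>This can <em>not</em> be found</someOtherElement>"
)
needle = "This can not be found"

# XPath 1.0 turns a node-set into a string by taking its FIRST node only,
# so contains(text(), ...) sees just the text that precedes <em>.
first_text_node = el.text or ""

# The string value of "." concatenates every descendant text node.
string_value = "".join(el.itertext())

print(repr(first_text_node), needle in first_text_node)
print(repr(string_value), needle in string_value)
# 'This can ' False
# 'This can not be found' True
```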

Be careful though: because you are using //*, this will match ALL enclosing elements that contain this string.

In your example, it would match:

<someOtherElement>
and <body>
and <html>!
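The over-matching can be checked without a browser. The sketch below is an approximation in plain Python, not a real XPath evaluation: it walks the example document and keeps every element whose concatenated text contains the string. The helper name matching_tags is made up for this illustration.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
    <head>...</head>
    <body>
        <someElement>This can be found</someElement>
        <someOtherElement>This can <em>not</em> be found</someOtherElement>
    </body>
</html>"""


def matching_tags(root, needle):
    """Tags of all elements whose string value contains needle."""
    # root.iter() walks the root and every descendant in document order.
    return [el.tag for el in root.iter() if needle in "".join(el.itertext())]


all_matches = matching_tags(ET.fromstring(DOC), "This can not be found")
print(all_matches)
# ['html', 'body', 'someOtherElement']
```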

You can restrict this by targeting a specific element tag or a specific section of your document (a <table> or a <div> with a known id or class).
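With Selenium this could look roughly like the sketch below. The helpers text_xpath and find_by_text are hypothetical names, and driver is assumed to be an already created Selenium WebDriver. The quoting is deliberately naive and rejects text that contains a double quote.

```python
def text_xpath(text, tag="*"):
    """Build an XPath for <tag> elements whose string value contains text."""
    # Naive quoting: XPath 1.0 has no escape for quotes inside a literal.
    if '"' in text:
        raise ValueError("text must not contain a double quote")
    return f'//{tag}[contains(., "{text}")]'


def find_by_text(driver, text, tag="*"):
    """Hypothetical helper; driver is assumed to be a Selenium WebDriver."""
    # "xpath" is the value of Selenium's By.XPATH locator strategy.
    return driver.find_elements("xpath", text_xpath(text, tag))


print(text_xpath("This can not be found", tag="someOtherElement"))
# //someOtherElement[contains(., "This can not be found")]
```

Passing a tag narrows the match to that element type. Scoping to a section works the same way by prefixing the path with that section, for example a <div> selected by its id.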

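Edit, for the OP's question in the comments on how to find the most nested element that matches the text condition:

The accepted answer here suggests //*[count(ancestor::*) = max(//*/count(ancestor::*))] to select the most nested element. I think it is XPath 2.0 only.

Combined with your substring condition, I was able to test it here with this document: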

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>

and with this XPath 2.0 expression:

//*[contains(., 'This can not be found')]
   [count(ancestor::*) = max(//*/count(./*[contains(., 'This can not be found')]/ancestor::*))]

And it matches the element containing "This can not be found most nested".

There is probably a more elegant way to do that.
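Browser XPath engines, which Selenium relies on, implement XPath 1.0, so max() is not available there. One possible workaround is to do the depth comparison in Python. The sketch below shows that logic on the test document using the standard library only; deepest_matches is a made-up helper, and with Selenium you would need an equivalent way to obtain each element's depth.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <someOtherElement>This can <em>not</em> be found</someOtherElement>
</body>
</html>"""


def deepest_matches(root, needle):
    """Return the matching elements that have the most ancestors."""
    found = []  # (ancestor_count, element) pairs

    def walk(el, depth):
        if needle in "".join(el.itertext()):
            found.append((depth, el))
        for child in el:
            walk(child, depth + 1)

    walk(root, 0)
    if not found:
        return []
    max_depth = max(depth for depth, _ in found)
    return [el for depth, el in found if depth == max_depth]


result = deepest_matches(ET.fromstring(DOC), "This can not be found")
print([(el.tag, "".join(el.itertext())) for el in result])
# [('someOtherElement', 'This can not be found most nested')]
```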
