html - XPath: Find HTML element by plain text

Question

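Note: This question is a more refined version of an earlier question.

I am looking for an XPath that finds elements containing a specific plain text in an HTML document. For example, suppose I have the following HTML: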

<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>

I need to find <someElement> by its text, which I can do with the following XPath:

//*[contains(text(), 'This can be found')]

I am looking for a similar XPath that can find <someOtherElement> and <yetAnotherElement> by the plain text "This can not be found". The following does not work:

//*[contains(text(), 'This can not be found')]

I understand that this is because the nested em element "disrupts" the text flow of "This can not be found". Is it possible, via XPath, to ignore nesting like the above, or similar nesting?
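To make the "disrupted text flow" concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (an illustration only, not part of the original question). It assumes XPath 1.0 semantics, where contains(text(), ...) converts the node-set text() to a string by taking only its first node. The variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

el = ET.fromstring(
    "<yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>"
)

# Text nodes that are direct children of the element, in document order.
# ElementTree stores them as el.text plus the .tail of each child element.
direct_text = [el.text] + [child.tail for child in el]

# The full string value, which also includes the text inside <em>.
string_value = "".join(el.itertext())

print(direct_text)   # ['This can ', ' be found']
print(string_value)  # This can not be found
```

Neither direct text node contains the whole phrase, and only the first one ('This can ') is what contains(text(), ...) tests in XPath 1.0, so the predicate fails.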

score 11 · Accepted Answer

You can use

//*[contains(., 'This can not be found')]
   [not(.//*[contains(., 'This can not be found')])]

This XPath consists of two parts:

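//*[contains(., 'This can not be found')]: Here . refers to the context node, and contains() converts it to its string value, that is, the concatenated text of the element and all of its descendants. This part therefore selects every element whose string value contains 'This can not be found'. In the example above, these are <someOtherElement> and <yetAnotherElement>, but also their ancestors <nested>, <body> and <html>.
[not(.//*[contains(., 'This can not be found')])]: This removes elements that have a descendant element whose string value still contains 'This can not be found'. In the example above, it removes the unwanted elements <nested>, <body> and <html>.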
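The two predicates can be sketched in plain Python to show what each one filters out. This is an emulation under stated assumptions, not an XPath engine: the standard library's xml.etree.ElementTree supports only a limited XPath subset without contains(), so the logic is written by hand, and the helper names string_value and innermost_containing are hypothetical. The sample document is parsed as XML, which works here because the snippet is well-formed.

```python
import xml.etree.ElementTree as ET

DOC = """<html>
<head>...</head>
<body>
    <someElement>This can be found</someElement>
    <nested>
        <someOtherElement>This can <em>not</em> be found most nested</someOtherElement>
    </nested>
    <yetAnotherElement>This can <em>not</em> be found</yetAnotherElement>
</body>
</html>"""


def string_value(el):
    """XPath-style string value: all descendant text concatenated."""
    return "".join(el.itertext())


def innermost_containing(root, needle):
    """Emulate //*[contains(., needle)][not(.//*[contains(., needle)])]."""
    matches = []
    for el in root.iter():  # root and all descendants, in document order
        if needle not in string_value(el):
            continue  # first predicate fails
        descendants = list(el.iter())[1:]  # el.iter() yields el itself first
        if any(needle in string_value(d) for d in descendants):
            continue  # second predicate fails: a descendant also matches
        matches.append(el)
    return matches


root = ET.fromstring(DOC)
found = [el.tag for el in innermost_containing(root, "This can not be found")]
print(found)  # ['someOtherElement', 'yetAnotherElement']
```

With a library that implements full XPath 1.0, such as the third-party lxml package and its xpath() method, the original expression can likely be passed through unchanged instead of being emulated.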

You can try these XPaths out here.
