python - Is there a way to specify a fixed (or variable) number of elements for lxml in Python?

Question

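There has to be an easier way to do this. I need some text from a large number of HTML documents. In my tests, the most reliable way to find it is to look for a specific word in the text_content of the div elements. If I want to inspect a specific element above the one that contains my text, I have been enumerating my list of div elements, taking the index of the one that has my text, and then specifying the earlier element by offsetting from that index. But I am sure there must be a better way. I can't seem to figure it out.

In case that isn't clear: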

for pair in enumerate(list_of_elements):
    if 'the string' in pair[1].text_content():
        thelocation=pair[0]

the_other_text=list_of_elements[thelocation-9].text_content()

or

theitem.getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().getprevious().text_content()

score 3 · Accepted Answer

lxml supports XPath:

from lxml import etree
root = etree.fromstring("...your xml...")

el, = root.xpath("//div[text() = 'the string']/preceding-sibling::*[9]")
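
As a rough illustration of the XPath approach on HTML, here is a minimal sketch. The sample markup and variable names are made up for this example. It uses `contains(., ...)` rather than `text() = ...` because the question searches for a substring of the text content, and it takes the 2nd preceding sibling instead of the 9th only to keep the sample short.

```python
from lxml import html

# Hypothetical sample markup, invented for this illustration.
SAMPLE = """
<html><body>
  <div>first</div>
  <div>second</div>
  <div>third</div>
  <div>here is the string we want</div>
</body></html>
""".strip()

root = html.fromstring(SAMPLE)

# contains(., 'the string') is a substring test on the element's full text,
# similar to `'the string' in element.text_content()`.
# preceding-sibling::*[2] counts backwards from the matched div,
# so [1] is the nearest earlier sibling and [2] is the one before that.
matches = root.xpath("//div[contains(., 'the string')]/preceding-sibling::*[2]")

# xpath() returns a list; guard against the no-match case instead of unpacking.
other_text = matches[0].text_content() if matches else None
print(other_text)
```

Note that `preceding-sibling` only looks at siblings under the same parent. If the divs are spread across different parents, as a flat list of all divs would be, the `preceding::div[n]` axis may be closer to the index-based approach in the question.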

score 1 · Accepted Answer

Would this work?

from itertools import islice
ancestor = islice(theitem.iterancestors(), 4) # To get the fourth ancestor

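Edit: I'm an idiot, that doesn't work. You need to wrap it in a helper function like this:

def nthparent(element, n):
    # islice returns an iterator, which cannot be indexed and is always truthy,
    # so take its single item with next(), falling back to None.
    # n is 1-based: n=1 is the direct parent.
    return next(islice(element.iterancestors(), n - 1, n), None)

ancestor = nthparent(theitem, 4) # to get the 4th parent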
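
One caveat with slicing an iterator: the object returned by `islice` cannot be indexed and is always truthy, so the item has to be pulled out with `next()`. The sketch below shows one way to do that, together with what should be an equivalent XPath expression. The function name `nth_ancestor` and the sample document are hypothetical, chosen only for this illustration.

```python
from itertools import islice
from lxml import etree


def nth_ancestor(element, n):
    """Return the n-th ancestor (1 = direct parent), or None if there are fewer."""
    # iterancestors() yields the parent first, then the grandparent, and so on,
    # so the n-th ancestor sits at zero-based position n - 1.
    return next(islice(element.iterancestors(), n - 1, n), None)


# Hypothetical nested document: a > b > c > d > e
root = etree.fromstring("<a><b><c><d><e/></d></c></b></a>")
leaf = root.find(".//e")

parent = nth_ancestor(leaf, 1)
fourth = nth_ancestor(leaf, 4)
missing = nth_ancestor(leaf, 5)

# The ancestor axis is a reverse axis, so [4] also counts outward from the leaf.
via_xpath = leaf.xpath("ancestor::*[4]")

print(parent.tag, fourth.tag, missing, via_xpath[0].tag)
```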

score 0 · Accepted Answer

Use something like simplehtmldom and then provide an index?

answered 2010-03-02T21:43:47.283
