python - How to extract HTML between two different tags in Python?

Question

I have the following HTML:

<h2>blah</h2>
html content to extract 
(here can come tags, nested structures too, but no top-level h2)
<h2>other blah</h2>

Is it possible to extract the content in Python without using string.split("<h2>")?
(For example, with BeautifulSoup or some other library?)

score 1 · Accepted Answer

Here is some test code using HTQL from http://htql.net:

sample="""<h2>blah</h2>
        html content to extract 
        <div>test</div>
        <h2>other blah<h2>
    """

import htql
htql.query(sample, "<h2 sep excl>2")
# [('\n        html content to extract \n        <div>test</div>\n        ',)]

htql.query(sample, "<h2 sep> {a=<h2>:tx; b=<h2 sep excl>2 | a='blah'} ")
# [('blah', '\n        html content to extract \n        <div>test</div>\n        ')]
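
For readers who do not have HTQL installed, the same idea can be sketched with only Python's standard library. This is not the method used in the answer above: it swaps in html.parser.HTMLParser, and the names BetweenH2 and extract_between_h2 are hypothetical, made up for illustration. The sketch treats every <h2> as a separator (so an <h2> nested inside the content would also split it) and assumes each heading is closed with </h2>.

```python
from html.parser import HTMLParser


class BetweenH2(HTMLParser):
    """Collect each <h2> heading text and the markup that follows it."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.sections = []        # list of [heading_text, content] pairs
        self.in_heading = False   # True while inside <h2>...</h2>

    def _add(self, text):
        if not self.sections:
            return  # ignore anything before the first <h2>
        # Index 0 is the heading text, index 1 is the content after it.
        self.sections[-1][0 if self.in_heading else 1] += text

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.sections.append(["", ""])
            self.in_heading = True
        elif not self.in_heading:
            # Re-emit the tag exactly as it appeared in the source.
            self._add(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if not self.in_heading:
            self._add(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "h2":
            self.in_heading = False
        elif not self.in_heading:
            self._add(f"</{tag}>")

    def handle_data(self, data):
        self._add(data)

    def handle_entityref(self, name):
        self._add(f"&{name};")

    def handle_charref(self, name):
        self._add(f"&#{name};")

    def handle_comment(self, data):
        if not self.in_heading:
            self._add(f"<!--{data}-->")


def extract_between_h2(html):
    """Return a list of (heading_text, following_markup) tuples."""
    parser = BetweenH2()
    parser.feed(html)
    parser.close()
    return [tuple(section) for section in parser.sections]


sample = "<h2>blah</h2>\nhtml content to extract\n<div>test</div>\n<h2>other blah</h2>\n"
sections = extract_between_h2(sample)
print(sections[0])
# ('blah', '\nhtml content to extract\n<div>test</div>\n')
```

The first tuple holds the heading text and the markup up to the next <h2>, which corresponds to what the second HTQL query above returns. Markup inside a heading itself is dropped and only its text is kept.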
