python - How do I generate a table of contents for HTML text in Python?

Question

Suppose I have some HTML code like this (generated from Markdown, Textile, or something similar):

<h1>A header</h1>
<p>Foo</p>
<h2>Another header</h2>
<p>More content</p>
<h2>Different header</h2>
<h1>Another toplevel header</h1>
<!-- and so on -->

How can I generate a table of contents from it using Python?

score 6 · Accepted Answer

Use an HTML parser such as lxml or BeautifulSoup to find all the header elements.
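If neither of those libraries is installed, the same idea can be sketched with the standard library's html.parser module. This swaps in a different parser from the ones named above, and the names HeaderCollector and collect_headers are made up for illustration. It walks the markup and records each h1 to h6 element as a (level, text) pair.

```python
from html.parser import HTMLParser

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderCollector(HTMLParser):
    """Collect (level, text) pairs for every h1..h6 element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headers = []    # finished (level, text) pairs, in document order
        self._level = None   # level of the header currently open, if any
        self._parts = []     # text fragments seen inside that header

    def handle_starttag(self, tag, attrs):
        if tag in HEADER_TAGS:
            self._flush()    # tolerate a previous header that was never closed
            self._level = int(tag[1])
            self._parts = []

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS:
            self._flush()

    def handle_data(self, data):
        # Only keep text that appears while a header is open
        if self._level is not None:
            self._parts.append(data)

    def _flush(self):
        if self._level is not None:
            self.headers.append((self._level, "".join(self._parts).strip()))
            self._level = None


def collect_headers(html):
    """Return the headers found in an HTML string as (level, text) pairs."""
    parser = HeaderCollector()
    parser.feed(html)
    parser.close()    # process any buffered text
    parser._flush()   # record a header left open at end of input
    return parser.headers


sample = (
    "<h1>A header</h1><p>Foo</p>"
    "<h2>Another header</h2><p>More content</p>"
    "<h2>Different header</h2>"
    "<h1>Another toplevel header</h1>"
)
for level, text in collect_headers(sample):
    print(level, text)
```

A dedicated parser such as lxml or BeautifulSoup is likely to cope better with badly broken markup; this version only aims to show the "find every header element" step.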

score 3 · Accepted Answer

Here is an example using lxml and XPath.

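from lxml import etree

# etree.parse expects well-formed XML; for real-world HTML,
# pass etree.HTMLParser() as the second argument.
doc = etree.parse("test.xml")
# Select the header elements h1 through h5, in document order
for node in doc.xpath('//h1|//h2|//h3|//h4|//h5'):
    print(node.tag, node.text)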
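The loop above only lists the headers. To turn that list into something that looks like a table of contents, one option is to indent each entry by its header level. The sketch below assumes the headers have already been gathered as (tag, text) pairs like the ones the loop yields; render_toc is a hypothetical helper name, and the sample list is typed in by hand from the question's HTML.

```python
def render_toc(headers, indent="  "):
    """Render (tag, text) pairs such as ("h2", "Another header") as an indented outline."""
    lines = []
    for tag, text in headers:
        level = int(tag[1])            # "h2" -> 2
        title = (text or "").strip()   # text may be None for an empty element
        lines.append(indent * (level - 1) + "- " + title)
    return "\n".join(lines)


# Hand-written sample matching the headers in the question
headers = [
    ("h1", "A header"),
    ("h2", "Another header"),
    ("h2", "Different header"),
    ("h1", "Another toplevel header"),
]
print(render_toc(headers))
```

With lxml, the pairs could be built as [(node.tag, node.text) for node in doc.xpath(...)]. Note that node.text only holds the text before the first child element, so headers containing inline markup may need extra handling.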
