python - Parse Sphinx-like documentation

Question

I have a Sphinx-formatted docstring from which I would like to extract the different parts (param, return, type, rtype, etc.) for further processing. How can I achieve this?

score 9 · Accepted Answer

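You can use docutils, which is what Sphinx is built on. This other answer uses docutils.core.publish_doctree to get an XML representation of a reStructuredText document (actually just a string of text) and then extracts the field lists from that XML using xml.minidom methods. An alternative is to use xml.etree.ElementTree, which is, in my opinion, far easier to work with.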

First, however, note that whenever docutils encounters a block of reStructuredText such as

:param x: Some parameter

the resulting XML representation looks like this (it is very verbose):

<field_list>
    <field>
        <field_name>
            param x
        </field_name>
        <field_body>
            <paragraph>
                Some parameter
            </paragraph>
        </field_body>
    </field>
</field_list>
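To make that structure concrete, here is a minimal sketch that parses the fragment above with xml.etree.ElementTree alone. The fragment is pasted in as a string purely for illustration (the names xml_text and extracted are not part of the original answer). Because the fragment is pretty-printed, the text has to be stripped of surrounding whitespace.

```python
import xml.etree.ElementTree as etree

# The XML fragment shown above, pasted in as a string for illustration.
xml_text = """
<field_list>
    <field>
        <field_name>
            param x
        </field_name>
        <field_body>
            <paragraph>
                Some parameter
            </paragraph>
        </field_body>
    </field>
</field_list>
""".strip()

root = etree.fromstring(xml_text)

# Walk every <field> and pull out its name and the first paragraph of its body.
extracted = []
for field in root.iter('field'):
    name = field.findtext('field_name', default='').strip()
    body = field.findtext('field_body/paragraph', default='').strip()
    extracted.append((name, body))

print(extracted)  # [('param x', 'Some parameter')]
```

Note that findtext only returns the text of the first matching paragraph, so a field body with several paragraphs or inline markup would need more care.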

The following code takes all field_list elements in the document and puts the text from field/field_name and field/field_body/paragraph into a list as 2-tuples. You can then manipulate this however you like in post-processing.

from docutils.core import publish_doctree
import xml.etree.ElementTree as etree

source = """Some help text

:param x: some parameter
:type x: and it's type

:return: Some text
:rtype: Return type

Some trailing text. I have no idea if the above is valid Sphinx
documentation!
"""

doctree = publish_doctree(source).asdom()

# Convert to etree.ElementTree since this is easier to work with than
# xml.minidom
doctree = etree.fromstring(doctree.toxml())

# Get all field lists in the document.
field_lists = doctree.findall('field_list')

fields = [f for field_list in field_lists
    for f in field_list.findall('field')]

field_names = [name.text for field in fields
    for name in field.findall('field_name')]

# encoding='unicode' makes tostring return str rather than bytes on Python 3.
field_text = [etree.tostring(element, encoding='unicode') for field in fields
    for element in field.findall('field_body')]

print(list(zip(field_names, field_text)))

This yields the list:

[('param x', '<field_body><paragraph>some parameter</paragraph></field_body>'),
 ('type x', "<field_body><paragraph>and it's type</paragraph></field_body>"), 
 ('return', '<field_body><paragraph>Some text</paragraph></field_body>'), 
 ('rtype', '<field_body><paragraph>Return type</paragraph></field_body>')]

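So the first item in each tuple is the field list item (i.e. :return:, :param x:, etc.) and the second item is the corresponding text. Obviously this text is not the cleanest output, but the code above is easy to modify, so I leave it to the OP to get the exact output they need.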
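As one possible way to do that clean-up, here is a rough sketch that turns the tuples into a nested dictionary of plain text. The helper names body_text and group_fields are hypothetical, and the sketch assumes field names of the simple form "kind" or "kind arg" (a form such as :param int x: would need extra handling).

```python
import xml.etree.ElementTree as etree

# Pairs in the shape produced by the code above (copied from its output).
pairs = [
    ('param x', '<field_body><paragraph>some parameter</paragraph></field_body>'),
    ('type x', "<field_body><paragraph>and it's type</paragraph></field_body>"),
    ('return', '<field_body><paragraph>Some text</paragraph></field_body>'),
    ('rtype', '<field_body><paragraph>Return type</paragraph></field_body>'),
]

def body_text(body_xml):
    """Return the plain text of a serialized field_body, dropping all tags."""
    element = etree.fromstring(body_xml)
    # One line per child block (usually a paragraph); inline markup is flattened.
    blocks = ("".join(child.itertext()).strip() for child in element)
    return "\n".join(block for block in blocks if block)

def group_fields(pairs):
    """Group (field name, body XML) pairs by kind, then by optional argument."""
    grouped = {}
    for name, body_xml in pairs:
        # 'param x' -> kind 'param', arg 'x'; 'return' -> kind 'return', arg ''.
        kind, _, arg = name.partition(" ")
        grouped.setdefault(kind, {})[arg.strip()] = body_text(body_xml)
    return grouped

parsed = group_fields(pairs)
print(parsed['param']['x'])  # some parameter
print(parsed['rtype'][''])   # Return type
```

Fields without an argument, such as :return:, are stored under the empty string key. That is only one convention; a flat dictionary may suit other use cases better.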
