python - Python regex matching multiple lines (re.DOTALL)

Question

I am trying to parse a multi-line string.

Suppose it looks like this:

text = '''
Section1
stuff belonging to section1
stuff belonging to section1
stuff belonging to section1
Section2
stuff belonging to section2
stuff belonging to section2
stuff belonging to section2
'''

Using the finditer method of the re module, I would like to get dictionaries like these:

{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}

I tried the following:

import re
re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+)", re.DOTALL)
sections_it = re_sections.finditer(text)

for m in sections_it:
    print(m.groupdict())

However, this produces the following result:

{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\nSection2\nstuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2\n'}

So section_data also matches Section2.

I also tried telling the second group to match everything except the first group, but this produces no output at all:

re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL)
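
For reference, a minimal sketch of why this attempt yields nothing: outside a character class, ^ is a start-of-string anchor rather than a negation, so the second group can never match once the header has been consumed. The shortened sample text and the names anchored and anchored_matches are assumptions made for illustration.

```python
import re

# Shortened sample input (an assumption; same shape as the text above).
text = """
Section1
stuff belonging to section1
Section2
stuff belonging to section2
"""

# Outside a character class, ^ anchors to the start of the string.
# By the time the header and at least one whitespace character have been
# consumed, the position is no longer the start, so the group always fails.
anchored = re.compile(
    r"(?P<section>Section\d)\s+(?P<section_data>^(?P=section))", re.DOTALL
)

anchored_matches = [m.groupdict() for m in anchored.finditer(text)]
print(anchored_matches)  # []
```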

I know I could use the following regex, but I am looking for a version where I don't have to describe what the second group looks like:

re_sections=re.compile(r"(?P<section>Section\d)\s+(?P<section_data>[a-z12\s]+)", re.DOTALL)

Thank you very much!

score 1 · Accepted Answer

Use a lookahead to match everything up to the next section header or the end of the string:

re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)

Note that this also requires the non-greedy .+?; otherwise the first match would still consume everything up to the end of the string.
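
The difference between the greedy and non-greedy quantifier can be seen in a small side-by-side sketch. The shortened input and the names greedy, lazy, greedy_matches and lazy_matches are assumptions for illustration; only the quantifier differs between the two patterns.

```python
import re

# Shortened sample input (an assumption, not the text from the question).
text = "Section1\na\nSection2\nb\n"

# Greedy .+ runs to the end of the string first; the lookahead is already
# satisfied there by $, so the first match swallows Section2 as well.
greedy = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+)(?=(?:Section\d|$))",
    re.DOTALL,
)

# Non-greedy .+? grows one character at a time and stops at the first
# position where the lookahead succeeds.
lazy = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))",
    re.DOTALL,
)

greedy_matches = [m.group("section") for m in greedy.finditer(text)]
lazy_matches = [m.group("section") for m in lazy.finditer(text)]

print(greedy_matches)  # ['Section1']
print(lazy_matches)    # ['Section1', 'Section2']
```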

Demo:

>>> re_sections=re.compile(r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=(?:Section\d|$))", re.DOTALL)
>>> for m in re_sections.finditer(text): print(m.groupdict())
... 
{'section': 'Section1', 'section_data': 'stuff belonging to section1\nstuff belonging to section1\nstuff belonging to section1\n'}
{'section': 'Section2', 'section_data': 'stuff belonging to section2\nstuff belonging to section2\nstuff belonging to section2'}
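
The last section_data in the demo lacks its trailing newline because $ also matches just before a newline at the very end of the string. If that newline should be kept, one possible variation (a sketch, not part of the answer's pattern) is to use \Z, which matches only at the absolute end. The shortened sample text and the name sections are assumptions for illustration.

```python
import re

# Shortened sample input (an assumption; same shape as the question's text).
text = """
Section1
stuff belonging to section1
Section2
stuff belonging to section2
"""

# \Z matches only at the very end of the string, whereas $ also matches
# just before a trailing newline, so the final "\n" stays in the capture.
re_sections = re.compile(
    r"(?P<section>Section\d)\s*(?P<section_data>.+?)(?=Section\d|\Z)",
    re.DOTALL,
)

sections = [m.groupdict() for m in re_sections.finditer(text)]
for section in sections:
    print(section)
```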
