python - Multiline Python regular expression

Question

I have a file structured like this:

A: some text
B: more text
even more text
on several lines
A: and we start again
B: more text
more
multiline text

I am trying to find a regex that splits my file like this:

>>> re.findall(regex, f.read())
[('some text','more text','even more text\non several lines'),
 ('and we start again','more text', 'more\nmultiline text')]

So far, I have come up with this:

>>> re.findall('A:(.*?)\nB:(.*?)\n(.*?)', f.read(), re.DOTALL)
[(' some text', ' more text', ''), (' and we start again', ' more text', '')]

The multiline text is not captured. I think this is because the lazy quantifier is so lazy that it matches nothing, but if I remove it, the regex becomes really greedy:

>>> re.findall('A:(.*?)\nB:(.*?)\n(.*)', f.read(), re.DOTALL)
[(' some text',
' more text',
'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]

Does anyone have an idea? Thanks!

score 12 · Accepted Answer

You can tell the regex to stop matching at the next line that starts with A: (or at the end of the string):

re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE)
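Here is a minimal, self-contained sketch of how this behaves on the sample from the question. The names `text`, `pattern`, and `records` are just for illustration, and the file contents are inlined as a string instead of being read from `f`. With `re.MULTILINE`, `^` matches at the start of every line, so the lookahead `(?=^A:|\Z)` gives the last lazy group something to stop at. The captured groups still carry the space after the colon and a trailing newline, so this sketch strips them afterwards; that step is my assumption about the desired output, not part of the regex itself.

```python
import re

# Sample data from the question, inlined so the example runs on its own.
text = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text\n"
)

# DOTALL lets '.' cross newlines; MULTILINE lets '^' match at each line start.
# The lookahead stops the last lazy group at the next "A:" line or at the end.
pattern = re.compile(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', re.DOTALL | re.MULTILINE)

# Strip the leading space and trailing newline that each group still contains.
records = [
    tuple(part.strip() for part in groups)
    for groups in pattern.findall(text)
]

print(records)
```

If you would rather keep the whitespace exactly as it appears in the file, drop the `strip()` call and use the result of `findall` directly.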
