python - Regular expression to parse a text document

Question

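I am trying to parse a text document that contains blocks delimited by !if and !endif. I need the text without the !if and !endif lines and without anything between them.

For example: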

text
!if
text1
!endif
text2

I would like my output to be text + text2 + ...

I tried something like re.findall(r'((^(!if.*!endif))+', text), but it doesn't seem to work.

score 4 · Accepted Answer

Your regular expression would be:

^!if$.*?^!endif$\s+

This says:

^      - Match the beginning of a line (because of the re.M flag)
!if    - Match !if
$      - Match the end of a line (because of the re.M flag)
.*?    - Match any number of characters (non-greedy) (includes line breaks, because of the re.S flag)
^      - Match the beginning of a line (because of the re.M flag)
!endif - Match !endif
$      - Match the end of a line (because of the re.M flag)
\s+    - Match one or more whitespace characters
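
As a rough illustration of the breakdown above, the same pattern can be written with re.X (verbose mode) so that each piece carries its own comment. The names BLOCK and sample are made up for this sketch; the sample string follows the example in the question.

```python
import re

# Same pattern as above, spread out with re.X so each part can be commented.
# BLOCK and sample are illustrative names, not part of the original answer.
BLOCK = re.compile(
    r"""
    ^!if$      # a line that consists solely of !if (re.M)
    .*?        # as few characters as possible, newlines included (re.S)
    ^!endif$   # a line that consists solely of !endif (re.M)
    \s+        # the whitespace (such as the newline) that follows
    """,
    re.M | re.S | re.X,
)

sample = "text\n!if\ntext1\n!endif\ntext2"
match = BLOCK.search(sample)
print(repr(match.group(0)))  # '!if\ntext1\n!endif\n'
```

The matched span includes the newline after !endif, which is why removing it leaves no blank line behind.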

So you should be able to use it as follows, replacing every occurrence of the regex above with an empty string (nothing):

import re
s = "text\n!if\ntext1\n!endif\ntext2"
# re.M makes ^ and $ match at line boundaries; re.S lets . match newlines
s = re.sub(r"^!if$.*?^!endif$\s+", "", s, flags=re.S | re.M)
print(s)

This outputs:

text
text2

Note that this explicitly requires !if and !endif to each be on a line of their own. If that is not needed, you can remove the $ and ^ anchors from the middle of the regex:

^!if.*?!endif$\s+
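
A small sketch of how the two patterns differ, using made-up inputs. It also shows one caveat that follows from the trailing \s+: a block that ends the input with no whitespace after !endif is not matched. Swapping in \s* is one possible workaround, offered here as an assumption rather than as part of the answer above.

```python
import re

FLAGS = re.M | re.S
strict = r"^!if$.*?^!endif$\s+"    # markers must sit on their own lines
relaxed = r"^!if.*?!endif$\s+"     # middle anchors removed
relaxed_eof = r"^!if.*?!endif$\s*" # assumption: \s* also accepts end of input

inline = "text\n!if text1 !endif\ntext2"
print(repr(re.sub(strict, "", inline, flags=FLAGS)))   # unchanged
print(repr(re.sub(relaxed, "", inline, flags=FLAGS)))  # 'text\ntext2'

at_end = "text\n!if\ntext1\n!endif"  # nothing follows !endif
print(repr(re.sub(relaxed, "", at_end, flags=FLAGS)))      # unchanged
print(repr(re.sub(relaxed_eof, "", at_end, flags=FLAGS)))  # 'text\n'
```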

score 0 · Accepted Answer

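I can help with sed:

sed '/^!if$/,/^!endif$/ d'

The algorithm sed uses is as follows:

0. Set the variable match=False
1. Read the next line
2. Check whether the line equals '!if'. If so, set match=True
3. If match==True, check whether the current line equals '!endif'; if so, set match=False. Either way, delete the current line and jump to 5 without printing it
4. Print the current line
5. If not EOF, jump to 1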
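
The steps above can be sketched in Python as a small line filter. This is only an approximation of sed's range-delete behaviour; drop_if_blocks and its start/end parameters are hypothetical names, with defaults that follow the !if / !endif markers from the question.

```python
def drop_if_blocks(lines, start="!if", end="!endif"):
    """Yield the lines that fall outside start..end blocks (markers dropped too)."""
    inside = False  # plays the role of the match variable
    for line in lines:
        stripped = line.rstrip("\n")
        if not inside and stripped == start:
            inside = True
        if inside:
            # Leave the block once the end marker has been seen,
            # but still skip the marker line itself.
            if stripped == end:
                inside = False
            continue
        yield line


sample_lines = ["text\n", "!if\n", "text1\n", "!endif\n", "text2\n"]
print("".join(drop_if_blocks(sample_lines)), end="")
```

Like the sed range, this sketch drops everything up to the end of the input if an !if is never closed.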
