python - Finding occurrences of a set of regex patterns in the correct order using Python

Question

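I am parsing a series of text files for several patterns, because I want to extract the matches to another file.

In other words, I want to "delete" everything from the file except the matches.

For example, if the patterns to match are pattern1, pattern2, and pattern3, I need the following input: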

bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1

to give the following output:

pattern1
pattern2
pattern1
pattern3
pattern1

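I can successfully get the list of matches for any single pattern using re.findall, but since the matches for the different patterns are interleaved in the file, I can't think of a way to preserve their order.

Thanks for reading.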

score 5 · Accepted Answer

Combine everything into a single pattern. For your sample input, the following pattern works:

^pattern[0-9]+

If your real patterns are more complex, try an alternation:

^(aaaaa|bbbbb|ccccc|ddddd)
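As a minimal sketch of the combined-pattern idea, the snippet below runs the first pattern against the sample input from the question. It assumes the re.MULTILINE flag is passed, since without it ^ only matches at the very start of the string; the variable names text, combined, and matches are illustrative only.

```python
import re

# Sample input copied from the question
text = """bla bla
pattern1
pattern2
bla bla bla
pattern1
pattern3
bla bla bla
pattern1
"""

# One combined pattern; re.MULTILINE makes ^ match at the start of every line
combined = re.compile(r"^pattern[0-9]+", re.MULTILINE)

# findall scans the text left to right, so matches come back in file order
matches = combined.findall(text)
print("\n".join(matches))
```

Because a single pattern is scanned over the text once, the original order of the matches is preserved without any extra bookkeeping.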

score 2 · Accepted Answer

Here is the answer in "copy this and run it" form.

import re

# Add more patterns to this list whenever you want
list_of_regex = [r"aaaa", r"bbbb", r"cccc"]

# Combine the patterns into a single alternation anchored at the line start
pattern_string = r"^(" + "|".join(list_of_regex) + r")"

# Read the whole input file (FILE_TO_READ is a placeholder; set it to your path)
with open(FILE_TO_READ) as fr:
    search_string = fr.read()

# re.MULTILINE makes ^ match at the start of every line, not only the first
matches = re.findall(pattern_string, search_string, re.MULTILINE)

# Write one match per line (FILE_TO_WRITE is a placeholder; set it to your path)
with open(FILE_TO_WRITE, "w") as fw:
    fw.writelines(match + "\n" for match in matches)
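If some of the individual patterns contain their own capturing groups, re.findall returns tuples of groups rather than the matched text. A hedged variant, assuming you only want the full match each time, is to use re.finditer and take group(0). The helper name extract_in_order is hypothetical and not part of the answers above.

```python
import re

def extract_in_order(text, patterns):
    """Return every match of any pattern, in the order it appears in text."""
    # Wrap each pattern in a non-capturing group so the alternation binds correctly
    combined = re.compile(
        "^(?:" + "|".join(f"(?:{p})" for p in patterns) + ")",
        re.MULTILINE,
    )
    # finditer scans left to right; group(0) is always the whole match
    return [m.group(0) for m in combined.finditer(text)]

sample = (
    "bla bla\npattern1\npattern2\nbla bla bla\n"
    "pattern1\npattern3\nbla bla bla\npattern1\n"
)
result = extract_in_order(sample, [r"pattern1", r"pattern2", r"pattern3"])
print(result)
```

The returned list can then be written out one match per line, in the same way as in the script above.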
