python - Delete a string and all lines before it from a file

Question

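I have a file containing thousands of lines of data. I am reading the file in and editing it.

The following tag appears roughly 900 or more lines into the file (the position varies from file to file).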

<Report name="test" xmlns:cm="http://www.example.org/cm">

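In some files I need to delete that line and everything before it. So I need code that searches for the tag and deletes it along with everything above it. The tag will not always be 900 lines down; its position varies. The tag itself, however, is always the same.

I already have code that reads the lines in and writes them to a file. I only need the logic for finding that line and removing it and everything before it.

I tried reading the file line by line and writing to a new file once I hit that string, but the logic is not right: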

readFile = open(firstFile)
lines = readFile.readlines()
readFile.close()
w = open('test','w')
for item in lines:
    if (item == '<Report name="test" xmlns:cm="http://www.example.org/cm">'):
        w.writelines(item)
w.close()

Also, the exact string is not the same in every file. The value "test" will differ. I may need to check for the tag name "<Report name" instead.

score 3 · Accepted Answer

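You can use a flag such as tag_found to track when lines should be written to the output. Set the flag to False initially and change it to True once the matching tag is found. While the flag is True, each line is copied to the output file; the tag line itself is never written.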

TAG = '<Report name="test" xmlns:cm="http://www.domain.org/cm">'

tag_found = False
with open('tag_input.txt') as in_file:
    with open('tag_output.txt', 'w') as out_file:
        for line in in_file:
            if not tag_found:
                if line.strip() == TAG:
                    tag_found = True
            else:
                out_file.write(line)

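PS: The with open(filename) as in_file: syntax uses what Python calls a "context manager". See here for an overview. The short explanation is that it automatically takes care of closing the file safely when the with: block ends, so you don't have to remember to put in a my_file.close() statement.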
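Since the question notes that the name value differs between files, the same flag idea can be sketched with a prefix check instead of an exact comparison. This is only a minimal sketch: the function name drop_through_tag and the default prefix are illustrative choices, not part of the original answer, and it assumes the tag sits on a line of its own.

```python
def drop_through_tag(in_path, out_path, prefix="<Report name="):
    """Copy in_path to out_path, skipping every line up to and including
    the first line that starts with prefix (hypothetical helper).

    Returns True if the tag was found, False otherwise.
    """
    tag_found = False
    with open(in_path) as in_file, open(out_path, "w") as out_file:
        for line in in_file:
            if tag_found:
                # Everything after the tag line is copied unchanged.
                out_file.write(line)
            elif line.strip().startswith(prefix):
                # The tag line itself is dropped; copying starts on the next line.
                tag_found = True
    return tag_found
```

If the tag never appears, the output file ends up empty and the function returns False, so the caller can decide whether to keep the original file instead.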

score 0 · Accepted Answer

You can use a regular expression to match the line, as follows.

regex1 = '^<Report name=.*xmlns:cm="http://www.domain.org/cm">$'

Get the indexes of the items that match the regular expression.

listIndex = [i for i, item in enumerate(lines) if re.search(regex1, item)]

Slice the list, starting just after the first match so that the tag line itself is dropped as well.

listLines = lines[listIndex[0] + 1:]

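Then write the result to a file. The lines returned by readlines() already end in a newline, so they can be written out as they are.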

with open("filename.txt", "w") as fileOutput:
    fileOutput.writelines(listLines)

Pseudocode

Try something like this.

import re

regex1 = '^<Report name=.*xmlns:cm="http://www.domain.org/cm">$' # Variable @name
regex2 = '^<Report name=.*xmlns:cm=.*>$' # Variable @name & @xmlns:cm

with open(firstFile, "r") as fileInput:
    listLines = fileInput.readlines()

listIndex = [i for i, item in enumerate(listLines) if re.search(regex1, item)]
# listIndex = [i for i, item in enumerate(listLines) if re.search(regex2, item)] # Uncomment for variable @name & @xmlns:cm

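# listIndex is a list of matching indexes; keep what follows the first match (lines already end in "\n").
with open("out_" + firstFile, "w") as fileOutput:
    fileOutput.writelines(listLines[listIndex[0] + 1:])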
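The code above raises an IndexError when no line matches, which matters here because only some of the files contain the tag. A minimal sketch of the slicing step as a reusable function follows; the names TAG_PATTERN and lines_after_tag are hypothetical, and returning the lines unchanged when nothing matches is an assumption about the desired behaviour.

```python
import re

# Assumed pattern: any line consisting of a <Report name=...> tag,
# whatever the name value or namespace URL.
TAG_PATTERN = re.compile(r'^\s*<Report name=.*>\s*$')


def lines_after_tag(lines, pattern=TAG_PATTERN):
    """Return the lines that follow the first line matching pattern.

    If no line matches, return a copy of the lines unchanged.
    """
    for index, line in enumerate(lines):
        if pattern.search(line):
            # Slice from the line after the match so the tag is dropped too.
            return lines[index + 1:]
    return list(lines)
```

It could be used in place of the list comprehension and slice, for example fileOutput.writelines(lines_after_tag(listLines)).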
