python - How do I delete everything before a key phrase in a text file?

Question

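I have more than 5,000 text files (also in CSV format), each containing several hundred lines.

Everything above a particular phrase, "City", is unnecessary, and I need everything below it. Is there a way (Python or batch) to delete all of it?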

score 5 · Accepted Answer

I love Python. But sometimes sed is just too handy:

sed -n '/City/,$p' file_with_city > new_file_with_city_on_first_line

score 2 · Accepted Answer

A Python analog of sed -i -n '/City/,$p' file1 file2 etc:

#!/usr/bin/env python3
import fileinput

copy = False
for line in fileinput.input(inplace=True): # edit files inplace
    if fileinput.isfirstline() or not copy: # reset `copy` flag for next file
        copy = "City" in line
    if copy:
        print(line, end='') # copy line (stdout is redirected to the file)

Usage:

$ ./remove-before-city.py file1 file2 etc

This solution modifies the files given on the command line in place.

score 2 · Accepted Answer

One algorithm is as follows:

Read from the file until you encounter the text "City"
Open a second file in write mode
Stream from the first file to the second file
Close both files
Move the second file to the location the first file occupied
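
The steps above could be sketched roughly as follows. This is only one possible reading of the algorithm: the function name strip_before_phrase and its parameters are made up for illustration, the file is assumed to be text in the platform's default encoding, and the line containing "City" is kept.

```python
import os
import tempfile


def strip_before_phrase(path, phrase="City"):
    """Rewrite `path` so it starts at the first line containing `phrase`.

    Hypothetical helper. Returns True if the phrase was found;
    otherwise the original file is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    found = False
    # Create the second file next to the first one so that the final
    # move stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as dst, \
                open(path, "r", newline="") as src:
            for line in src:
                # Skip lines until the phrase shows up, then copy the rest.
                if not found and phrase in line:
                    found = True
                if found:
                    dst.write(line)
        if found:
            # Move the second file into the place of the first one.
            os.replace(tmp_path, path)
    finally:
        # Clean up the temporary file if it was not moved.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return found
```

Because the original is only replaced after the copy has finished, an interrupted run should leave the original file as it was. Note that the replacement file gets the temporary file's permissions, which may differ from the original's.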

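You can truncate a file to delete everything after a certain point, but you cannot resize it in place to drop the content before a certain point. You could do this with a single file by repeatedly seeking back and forth, but it is probably not worth it.

If the files are small enough, you can read the whole first file into memory and then write the part you need back to the same file on disk.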
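
For small files, the in-memory variant might look like the sketch below. The name keep_from_phrase is hypothetical, and the whole file is assumed to fit comfortably in memory.

```python
def keep_from_phrase(path, phrase="City"):
    """Keep only the lines from the first one containing `phrase` onward.

    Hypothetical helper. Returns True if the file was rewritten and
    False if the phrase was not found (the file is then left alone).
    """
    # Read the entire file into memory.
    with open(path, "r", newline="") as f:
        lines = f.readlines()

    # Find the first line that contains the phrase.
    for index, line in enumerate(lines):
        if phrase in line:
            break
    else:
        return False

    # Write the part we want back to the same file.
    with open(path, "w", newline="") as f:
        f.writelines(lines[index:])
    return True
```

Unlike the two-file approach, a crash in the middle of the final write can leave the file truncated, so this is best kept for data that is backed up elsewhere.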

score 1 · Accepted Answer

# Use a context manager to make sure the files are properly closed.
with open('in.csv', 'r') as infile, open('out.csv', 'w') as outfile:
    # Read the file line by line...
    for line in infile:
        # until we have a match.
        if "City" in line:
            # Write the line containing "City" to the output.
            # Comment this line out if you don't want to include it.
            outfile.write(line)

            # Read the rest of the input in one go and write it
            # to the output. If your file is really big you might
            # run out of memory doing this and have to break it
            # into chunks.
            outfile.write(infile.read())

            # Our work here is done, quit the loop.
            break

score 0 · Accepted Answer

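def removeContent(file, word, n=1, removeword=False):
    # Read the whole file and keep what follows the n-th occurrence of `word`.
    # Raises IndexError if `word` occurs fewer than n times.
    with open(file, "r") as f:
        content = f.read().split(word, n)[n]
    if not removeword:
        # Put the word itself back in front of the kept text.
        content = word + content
    with open(file, "w") as f:
        f.write(content)

# `filenames` is the list of files you want to edit.
for fname in filenames:
    removeContent(fname, "City")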

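Explanation of the parameters:

n specifies which occurrence of the word is used for the cut. With the default n = 1, everything before the first occurrence is removed. To remove everything before the fifth City, call the function as removeContent(fname, "City", 5).

file is, obviously, the name of the file you want to edit.

word is the word you want to cut at. In your case that would be City (the match is case-sensitive).

removeword specifies whether to keep the word and remove only the text before it, or to remove the word itself as well.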
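
To make the role of n concrete, here is a small string-only sketch of the same split logic. The helper text_after_nth is hypothetical and is not part of the answer above; it relies on the fact that str.split(word, n) produces at most n + 1 pieces, so the piece at index n is everything after the n-th occurrence.

```python
def text_after_nth(text, word, n=1, removeword=False):
    """Return `text` starting at (or just after) the n-th occurrence of `word`."""
    pieces = text.split(word, n)
    if len(pieces) <= n:
        # Fewer than n occurrences: there is nothing sensible to return.
        raise ValueError(f"{word!r} occurs fewer than {n} times")
    tail = pieces[n]
    return tail if removeword else word + tail


sample = "a City b City c"
print(text_after_nth(sample, "City"))                      # City b City c
print(text_after_nth(sample, "City", n=2))                 # City c
print(text_after_nth(sample, "City", n=2, removeword=True))  #  c
```

Note that this cuts at the word itself, not at the start of the line that contains it, so any text earlier on that line is dropped as well.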

score 0 · Accepted Answer

import os

for file in os.listdir("."):
    if not os.path.isfile(file):
        continue     # skip directories
    with open(file, 'r') as infile:
        line = infile.readline()
        # Sequential read is easy on memory if the file is huge.
        while line != '' and 'City' not in line:
            line = infile.readline()     # skip all lines till 'City' line
        # Process the rest of the file, starting at the 'City' line
        while line != '':
            print(line, end='')     # prints to stdout (or redirect to outfile)
            line = infile.readline()
