python - How do I concatenate text files in Python?

Question

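I have a list of 20 file names, like ['file1.txt', 'file2.txt', ...]. I want to write a Python script to concatenate these files into a new file. I could open each file with f = open(...), read it line by line by calling f.readline(), and write each line into that new file. That doesn't seem very "elegant" to me, especially the part where I have to read and write line by line.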

Is there a more "elegant" way to do this in Python?

score 301 · Accepted Answer

This should do it.

For large files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            for line in infile:
                outfile.write(line)

For small files:

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

…and another interesting one that I thought of:

import itertools

filenames = ['file1.txt', 'file2.txt', ...]
with open('path/to/output/file', 'w') as outfile:
    # map() is lazy in Python 3, like itertools.imap in Python 2
    for line in itertools.chain.from_iterable(map(open, filenames)):
        outfile.write(line)

Sadly, this last method leaves a few file descriptors open, which the GC should take care of anyway. I just thought it was interesting.
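One possible way around the open descriptors, offered as a sketch rather than the answerer's code: swap map(open, ...) for a small generator so that each file is closed by its own with block before the next one is opened. The helper names lines_from and concat_lines are hypothetical.

```python
def lines_from(filenames):
    """Yield every line of each file in order, closing each file once it is exhausted."""
    for fname in filenames:
        # the with block closes this file before the loop moves on to the next name
        with open(fname) as infile:
            yield from infile


def concat_lines(filenames, out_path):
    """Write all lines from filenames into out_path, one line at a time."""
    with open(out_path, 'w') as outfile:
        # writelines consumes the generator lazily, so only one line is in memory at a time
        outfile.writelines(lines_from(filenames))
```

Calling concat_lines(['file1.txt', 'file2.txt'], 'path/to/output/file') writes the same output as the chain-based loop, without relying on the GC to close the inputs.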

score 236 · Accepted Answer

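Use shutil.copyfileobj.

It reads the input files chunk by chunk for you, which is more efficient than reading whole input files into memory, and it works even if some of the input files are too large to fit in memory.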

import shutil

with open('output_file.txt','wb') as wfd:
    for f in ['seg1.txt','seg2.txt','seg3.txt']:
        with open(f,'rb') as fd:
            shutil.copyfileobj(fd, wfd)

score 65 · Accepted Answer

That's exactly what fileinput is for:

import fileinput
with open(outfilename, 'w') as fout, fileinput.input(filenames) as fin:
    for line in fin:
        fout.write(line)

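For this use case, it's really not much simpler than iterating over the files manually, but in other cases, having a single iterator that goes over all of the files as if they were one file is very handy. (Also, because fileinput closes each file as soon as it's done with it, you don't need a with or close for each one, but that's only a one-line saving, not a big deal.)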
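To illustrate the single-iterator point, here is a minimal sketch (the function name and the "# <name>" header format are made up for this example) that uses the FileInput methods isfirstline() and filename() to mark where each input file begins in the output.

```python
import fileinput


def concat_with_headers(filenames, out_path):
    """Concatenate files, writing a '# <name>' header line before each file's contents."""
    with open(out_path, 'w') as fout, fileinput.input(filenames) as fin:
        for line in fin:
            # isfirstline() is True for the first line read from each new input file
            if fin.isfirstline():
                fout.write('# ' + fin.filename() + '\n')
            fout.write(line)
```

Two caveats: an empty input file yields no lines, so it gets no header; and an empty filenames list makes fileinput fall back to reading standard input.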

fileinput has some other nifty features too, such as the ability to modify files in place just by filtering each line.
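As a sketch of the in-place filtering feature, assuming the goal is to strip trailing spaces and tabs (strip_trailing_whitespace is a hypothetical name): with inplace=True, fileinput moves the original file aside and directs standard output into a new file of the same name, so whatever you print replaces the line you just read.

```python
import fileinput


def strip_trailing_whitespace(filenames):
    """Rewrite each file in place, removing trailing spaces and tabs from every line."""
    with fileinput.input(files=filenames, inplace=True) as fin:
        for line in fin:
            # while inplace=True is active, print() writes into the file being rewritten
            body = line.rstrip('\n').rstrip(' \t')
            # keep the original line ending (a final line may have none)
            ending = '\n' if line.endswith('\n') else ''
            print(body, end=ending)
```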

As noted in the comments, and discussed in another post, fileinput in Python 2.7 will not work as shown above. Here is a slight modification to make the code Python 2.7 compatible:

import fileinput

with open(outfilename, 'w') as fout:
    fin = fileinput.input(filenames)
    for line in fin:
        fout.write(line)
    fin.close()

score 8 · Accepted Answer

Not sure about elegance, but this works:

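import glob
import os
import shlex

for f in glob.glob("file*.txt"):
    # quote the name so paths with spaces or shell metacharacters are safe
    os.system("cat " + shlex.quote(f) + " >> OutFile.txt")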

score 6 · Accepted Answer

What's wrong with plain UNIX commands (assuming you're not working on Windows)?

ls | xargs cat | tee output.txt does the job (you can call it from Python with subprocess if needed).
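If you do call it from Python, one way (a sketch, assuming a Unix-like system with cat on the PATH; cat_files is a hypothetical name) is to skip ls and the shell entirely and pass the file list straight to cat through subprocess.run, sending its output into the target file.

```python
import subprocess


def cat_files(filenames, out_path):
    """Concatenate files with the Unix cat command, without going through a shell."""
    with open(out_path, 'wb') as out:
        # with no arguments cat would read stdin, so only run it when there are files
        if filenames:
            # '--' stops option parsing, so names starting with '-' are treated as files
            subprocess.run(['cat', '--', *filenames], stdout=out, check=True)
```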

score 5 · Accepted Answer

outfile.write(infile.read()) # time: 2.1085190773010254s
shutil.copyfileobj(fd, wfd, 1024*1024*10) # time: 0.60599684715271s

A simple benchmark shows that shutil performs better.
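The answer does not show how the timings were taken. A hypothetical harness for reproducing this kind of comparison could look like the following; absolute numbers depend on file sizes, the disk, and the OS page cache, so no results are claimed here. The function names are made up for the example.

```python
import shutil
import time


def time_read_write(filenames, out_path):
    """Concatenate by reading each whole file at once; return elapsed seconds."""
    start = time.perf_counter()
    with open(out_path, 'wb') as outfile:
        for fname in filenames:
            with open(fname, 'rb') as infile:
                outfile.write(infile.read())
    return time.perf_counter() - start


def time_copyfileobj(filenames, out_path, length=1024 * 1024 * 10):
    """Concatenate with shutil.copyfileobj and a 10 MiB buffer; return elapsed seconds."""
    start = time.perf_counter()
    with open(out_path, 'wb') as wfd:
        for fname in filenames:
            with open(fname, 'rb') as fd:
                shutil.copyfileobj(fd, wfd, length)
    return time.perf_counter() - start
```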

score 3 · Accepted Answer

An alternative to @inspectorG4dget's answer (the best answer as of 29 March 2016). I tested it with 3 files of 436 MB each.

@inspectorG4dget's solution: 162 seconds

The following solution: 125 seconds

from subprocess import Popen

filenames = ['file1.txt', 'file2.txt', 'file3.txt']
# build one Windows "type" command that concatenates all inputs into file4results.txt
cmd = "type " + " ".join(filenames) + " > file4results.txt"
with open('batch.bat', 'w') as fbatch:
    fbatch.write(cmd)
p = Popen("batch.bat", cwd=r"Drive:\Path\to\folder")
stdout, stderr = p.communicate()

The idea is to create a batch file and execute it, taking advantage of "good old technology". It's only semi-Python, but it runs faster. It works on Windows.

score 3 · Accepted Answer

If you have many files in the directory, generating the list of filenames with glob2 may be better than writing them out by hand:

import glob2

filenames = glob2.glob('*.txt')  # list of all .txt files in the directory

with open('outfile.txt', 'w') as f:
    for file in filenames:
        with open(file) as infile:
            f.write(infile.read()+'\n')
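The standard-library glob module also covers the non-recursive case. A sketch, assuming you want a deterministic order and do not want the output file to be concatenated into itself when it also matches the pattern (concat_matching and its separator parameter are hypothetical names):

```python
import glob
import os


def concat_matching(pattern, out_path, separator='\n'):
    """Concatenate every file matching pattern into out_path, in sorted name order."""
    # glob.glob() returns names in arbitrary order, so sort them explicitly;
    # skip the output file in case it already exists and matches the pattern
    out_abs = os.path.abspath(out_path)
    filenames = [f for f in sorted(glob.glob(pattern))
                 if os.path.abspath(f) != out_abs]
    with open(out_path, 'w') as outfile:
        for fname in filenames:
            with open(fname) as infile:
                outfile.write(infile.read() + separator)
    return filenames
```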

score 2 · Accepted Answer

Check out the .read() method of file objects:

http://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects

You could do something like:

concat = ""
for file in files:
    concat += open(file).read()

or, in a more "elegant" Pythonic way:

concat = ''.join([open(f).read() for f in files])

According to this article, http://www.skymind.com/~ocrow/python_string/, this is also the fastest way.
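Calling open(f).read() inside a comprehension leaves it to the garbage collector to close each file. A sketch of the same join using pathlib, whose Path.read_text() opens, reads, and closes the file itself (read_all is a hypothetical name):

```python
from pathlib import Path


def read_all(files):
    """Return the contents of all files joined into one string."""
    # Path.read_text() closes each file after reading, so no handles stay open
    return ''.join(Path(f).read_text() for f in files)
```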

score 2 · Accepted Answer

If the files are not huge:

with open('newfile.txt','wb') as newf:
    for filename in list_of_files:
        with open(filename,'rb') as hf:
            newf.write(hf.read())
            # newf.write(b'\n\n\n')   if you want to introduce
            # some blank lines between the contents of the copied files

If the files are too big to be read entirely and held in RAM, the algorithm has to be a little different: read each file to be copied in a loop, in chunks of fixed length, using read(10000) for example.
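A minimal sketch of that chunked variant, using the 10000-byte read size mentioned above as the default (concat_chunked and CHUNK_SIZE are hypothetical names):

```python
CHUNK_SIZE = 10000  # bytes per read(), the example size from the text


def concat_chunked(filenames, out_path, chunk_size=CHUNK_SIZE):
    """Copy each file into out_path in fixed-size chunks, never holding a whole file in memory."""
    with open(out_path, 'wb') as newf:
        for filename in filenames:
            with open(filename, 'rb') as hf:
                while True:
                    chunk = hf.read(chunk_size)
                    if not chunk:  # b'' signals end of file
                        break
                    newf.write(chunk)
```

This is essentially the loop that shutil.copyfileobj runs for you.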

score 0 · Accepted Answer

import os

def concatFiles():
    path = 'input/'
    files = os.listdir(path)
    for idx, infile in enumerate(files):
        print("File #" + str(idx) + "  " + infile)
    concat = ''.join([open(os.path.join(path, f)).read() for f in files])
    with open("output_concatFile.txt", "w") as fo:
        # write only the concatenated contents (the original also prepended the path string)
        fo.write(concat)

if __name__ == "__main__":
    concatFiles()

score -2 · Accepted Answer

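import os

files = os.listdir()
print(files)
print('#', tuple(files))
name = input('Enter the inclusive file name: ')
exten = input('Enter the type(extension): ')
filename = name + '.' + exten
with open(filename, 'w') as output_file:
    for i in files:
        # skip subdirectories and the output file itself
        if i == filename or not os.path.isfile(i):
            continue
        print(i)
        with open(i, 'r') as f_j:
            contents = f_j.read()
        # print the contents, then write the same text to the output file
        print(contents)
        output_file.write(contents)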
