python - How can I speed up this very basic Python script for offsetting lines of numbers?

Question

I have a simple text file containing numbers as ASCII text separated by spaces, as in this example:

```
150604849
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
```
*<continues like this for millions of lines>*

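Basically, I copy the first line as it is; then, for every subsequent line, I offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and then halve the last number.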
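
The per-line transformation described above can be sketched as a single function that maps one data line to one output line. This is a minimal Python 3 sketch, not the asker's code: the name `transform_line` is hypothetical, the offsets and the `2050` constant are taken from the script below, and `//` is used on the assumption that the intent is the integer halving that `/` performs on ints in Python 2.

```python
OFFSET_X = -306000
OFFSET_Y = -5806000
OFFSET_LAST = 2050  # added to the fourth value before halving, as in the script


def transform_line(line):
    """Return the shifted version of one data line (no trailing newline)."""
    x, y, z, last = line.split()
    return " ".join([
        str(float(x) + OFFSET_X),
        str(float(y) + OFFSET_Y),
        z,  # third value: passed through unchanged, as text
        str((int(last) + OFFSET_LAST) // 2),  # floor division, like Python 2 int "/"
    ])
```

The first line of the file (the point count) is not a data line, so it would be copied separately rather than passed through this function.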

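I've put together the following code as a Python learning experience (apologies if it's crude and offensive; I truly mean no offense), and it works fine. However, the input files I'm using are several GB in size, and I'm wondering whether there are ways to speed up execution. Currently, a 740 MB file takes 2 minutes 21 seconds.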

```python
import glob

#offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4]+"_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))

    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), str(float(words[2])), str((int(words[3])+2050)/2)]
            out.write(" ".join(newwords))
```

Thanks very much

score 3 · Accepted Answer

Don't use `.readlines()`. Use the file directly as an iterator:

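```python
# files, offsetx and offsety are defined as in the question
for file in files:
    # "with" closes both files automatically, even if an error occurs
    with open(file, "r") as currentfile, open(file[:-4]+"_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])

        # iterate over the file object itself: lines are read lazily
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0])+offsetx), str(float(words[1])+offsety), words[2], str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
```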
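
The same iterator-based loop can be wrapped in a function that takes already-open file objects, which makes it easy to try on a few lines in memory before running it on a multi-GB file. This is a Python 3 sketch under stated assumptions: the name `shift_points` is hypothetical, `//` stands in for Python 2's integer `/`, and skipping blank lines is an addition not present in the answer.

```python
import io

OFFSET_X = -306000
OFFSET_Y = -5806000


def shift_points(src, dst):
    """Copy the header token, then write one shifted line per data line.

    Iterating over `src` reads it lazily, one buffered line at a time,
    instead of building lists of lines with readlines().
    """
    header = next(src)
    dst.write(header.split(None, 1)[0])
    for line in src:
        words = line.split()
        if not words:  # assumption: tolerate blank lines, e.g. at the end of the file
            continue
        dst.write("\n")
        dst.write(" ".join([
            str(float(words[0]) + OFFSET_X),
            str(float(words[1]) + OFFSET_Y),
            words[2],  # passed through unchanged
            str((int(words[3]) + 2050) // 2),
        ]))


# Usage with in-memory files; real code would pass handles from open(...).
src = io.StringIO("150604849   \n319865.301865 5810822.964432 -96.425797 -1610\n")
dst = io.StringIO()
shift_points(src, dst)
```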

I've also added a few Python best practices. There is no need to convert `words[2]` to a float and then back to a string again.
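
Besides saving a conversion per line, passing the third value through as text also keeps its formatting intact: a `float` round trip through `str()` does not always reproduce the original string. A small illustration, using a value that appears in the third column of the question's sample data (the helper name `roundtrip` is hypothetical):

```python
def roundtrip(text):
    """Parse a decimal string as a float and format it back with str()."""
    return str(float(text))


original = "-0.339890"      # third column of one sample line in the question
print(roundtrip(original))  # -0.33989 (the trailing zero is lost)
print(original)             # passing the string through keeps it exactly
```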

You could also look into the `csv` module. It can handle splitting and re-joining the lines in C code:

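```python
import csv

for file in files:
    # Python 2: the csv module expects files opened in binary mode
    with open(file, "rb") as currentfile, open(file[:-4]+"_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        # lineterminator='\n' keeps Unix line endings (the default is '\r\n')
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE, lineterminator='\n')

        # writerow() takes a sequence of fields, so wrap the single header value in a list
        writer.writerow([next(reader)[0]])

        for row in reader:
            newrow = [str(float(row[0])+offsetx), str(float(row[1])+offsety), row[2], str((int(row[3]) + 2050) / 2)]
            writer.writerow(newrow)
```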
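
The `csv` approach above is written for Python 2 (binary file modes). A Python 3 sketch of the same idea follows; it assumes the files would be opened in text mode with `newline=""`, as the Python 3 `csv` documentation recommends, and the function name `shift_points_csv`, the blank-row skip, and the use of `//` for integer halving are assumptions rather than part of the answer.

```python
import csv
import io

OFFSET_X = -306000
OFFSET_Y = -5806000


def shift_points_csv(src, dst):
    """csv-module variant: the reader splits rows, the writer re-joins them."""
    reader = csv.reader(src, delimiter=" ", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, delimiter=" ", quoting=csv.QUOTE_NONE,
                        lineterminator="\n")
    # Wrap the single header token in a list: passing a bare string to
    # writerow() would write each character as a separate field.
    writer.writerow([next(reader)[0]])
    for row in reader:
        if not row:  # assumption: skip blank lines instead of failing
            continue
        writer.writerow([
            str(float(row[0]) + OFFSET_X),
            str(float(row[1]) + OFFSET_Y),
            row[2],  # passed through unchanged
            str((int(row[3]) + 2050) // 2),
        ])


# Usage with in-memory files. With real files, something like:
#   with open(path, newline="") as src, open(out_path, "w", newline="") as dst:
src = io.StringIO("150604849   \n319865.301865 5810822.964432 -96.425797 -1610\n")
dst = io.StringIO()
shift_points_csv(src, dst)
```

Unlike the answer's loop, this writes a newline after every row, including the last one.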

score 0 · Accepted Answer

Use the CSV package. It is probably better optimized than your script, and it will simplify the code.
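
Whether the `csv` module is actually faster than `str.split()` for this data depends on the Python version and the input, so it is worth measuring. A minimal `timeit` harness, assuming synthetic input built by repeating one line shaped like the question's data; it times only the splitting step, and the function names are hypothetical:

```python
import csv
import io
import timeit

# Synthetic input shaped like the question's data lines (hypothetical size).
LINE = "319865.301865 5810822.964432 -96.425797 -1610\n"
DATA = LINE * 100_000


def parse_with_split(text):
    """Split each line on whitespace with str.split()."""
    return [line.split() for line in io.StringIO(text)]


def parse_with_csv(text):
    """Split each line with csv.reader using a single-space delimiter."""
    reader = csv.reader(io.StringIO(text), delimiter=" ",
                        quoting=csv.QUOTE_NONE)
    return list(reader)


if __name__ == "__main__":
    for fn in (parse_with_split, parse_with_csv):
        seconds = timeit.timeit(lambda: fn(DATA), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

The float parsing and string formatting in the loop body cost time too, so a full comparison would time the complete transformation on a real input file.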
