python - Optimizing or speeding up reading .xy files into Excel

Question

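I have several .xy files (two columns holding x and y values). I am trying to read all of them and paste the "y" values into a single Excel file (the "x" values are the same in all of these files). The code I have so far reads the files one by one, but it is extremely slow (about 20 seconds per file). I have quite a few .xy files, so the total time adds up considerably. The code I have so far is: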

import os,fnmatch,linecache,csv
from openpyxl import Workbook

wb = Workbook() 
ws = wb.worksheets[0]
ws.title = "Sheet1"


def batch_processing(file_name):
    row_count = sum(1 for row in csv.reader(open(file_name)))
    try:
        for row in xrange(1,row_count):

            data = linecache.getline(file_name, row)
            print data.strip().split()[1]   
            print data
            ws.cell("A"+str(row)).value = float(data.strip().split()[0])
            ws.cell("B"+str(row)).value = float(data.strip().split()[1])

        print file_name
        wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
    except IndexError:
        pass


workingdir = "C:\Users\Mine\Desktop\P22_PC"
os.chdir(workingdir)
for root, dirnames, filenames in os.walk(workingdir):
    for file_name in fnmatch.filter(filenames, "*_Cs.xy"):
        batch_processing(file_name)

Any help is greatly appreciated. Thanks.

score 2 · Accepted Answer

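I think your main problem is that you are writing to Excel and saving on every line of the file, for every file in the directory. I'm not sure how long it actually takes to write a value to Excel, but moving the save call out of the loop and saving only once everything has been added should cut some time. Also, how large are these files? If they are huge, linecache may be a good idea, but assuming they are not overly large, you can probably do without it.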

def batch_processing(file_name):

    # Using 'with' is a better way to open files - it ensures they are
    # properly closed, etc. when you leave the code block
    with open(file_name, 'rb') as f:
        reader = csv.reader(f)
        # row_count = sum(1 for row in csv.reader(open(file_name)))
        # ^^^You actually don't need to do this at all (though it is clever :)
        # You are using it now to govern the loop, but the more Pythonic way is
        # to do it as follows
        for line_no, line in enumerate(reader):
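            # csv.reader yields each row as a list of fields; unpack the two columns.
            # Note: csv.reader splits on commas by default. If the .xy files are
            # whitespace-separated (as the split() calls in the question suggest),
            # pass a matching delimiter to csv.reader or split the raw line instead.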
            val1, val2 = line
            print val1, val2 # You can also remove this - printing takes time too
            ws.cell("A"+str(line_no+1)).value = float(val1)
            ws.cell("B"+str(line_no+1)).value = float(val2)

    # Doing this here will save the file after you process an entire file.
    # You could save a bit more time and move this to after your walk statement - 
    # that way, you are only saving once after everything has completed
    wb.save(filename = os.path.splitext(file_name)[0]+".xlsx")
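
For reference, here is a minimal Python 3 sketch of the "read everything first, save once" idea applied to the original goal of putting every y column into a single workbook. It is an illustration under stated assumptions, not the exact code from this thread: it assumes the .xy files are whitespace-separated, that every file has the same x values and row count, and that a reasonably recent openpyxl is installed. The helper names read_xy, collect_columns and write_workbook are made up for this example.

```python
import fnmatch
import os


def read_xy(path):
    """Read a whitespace-separated two-column file in a single pass."""
    xs, ys = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            xs.append(float(parts[0]))
            ys.append(float(parts[1]))
    return xs, ys


def collect_columns(workingdir, pattern="*_Cs.xy"):
    """Return (x values, {relative file path: y values}) for matching files."""
    x_values, y_columns = None, {}
    for root, _dirnames, filenames in os.walk(workingdir):
        for name in sorted(fnmatch.filter(filenames, pattern)):
            path = os.path.join(root, name)  # also works inside subdirectories
            xs, ys = read_xy(path)
            if x_values is None:
                x_values = xs  # assumption: x is identical in every file
            y_columns[os.path.relpath(path, workingdir)] = ys
    return x_values or [], y_columns


def write_workbook(x_values, y_columns, out_path):
    """Write x in column A, one y column per file, and save exactly once."""
    # Third-party dependency, imported here so the parsing helpers above
    # can be used without openpyxl installed.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.append(["x"] + list(y_columns))  # header row: x, then one name per file
    for i, x in enumerate(x_values):
        # assumption: every y column has the same length as x_values
        ws.append([x] + [ys[i] for ys in y_columns.values()])
    wb.save(out_path)  # single save, after all files have been processed
```

It could be used along the lines of write_workbook(*collect_columns(workingdir), "combined.xlsx"), where combined.xlsx is just a placeholder output name. Each file is read once from start to finish, so neither linecache nor a separate row-counting pass is needed.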
