python - I need to extract data from multiple .txt files with Python and move it into an Excel file

Question

Each .txt file contains 68 lines. Line 68 holds 5 pieces of data that I need to extract, but I don't know how. I have about 20 .txt files and need to read line 68 in every one of them. All of the extracted data then needs to go into a single Excel file.

Line 68 looks like this:

Final graph has 1496 nodes and n50 of 53706, max 306216, total 5252643, using 384548/389191 reads

I basically need all of these numbers.

score 0 · Accepted Answer

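The following example is a bit less elegant than David's, which relies on regular expressions, but it is transparent. It depends heavily on the specific format you described. Also, it looks like there are actually six variables of interest (not five), unless you want the reads ratio converted to a decimal.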

You need to supply nameList with the correct list of file names (by hand, if the files are not named in a convenient way).
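If the files do share a directory and a naming pattern, the list can be built instead of typed out. The following is a minimal sketch, assuming all logs match `*.txt` in one directory; the helper name `build_name_list` and the example path are hypothetical and not part of the original answer.

```python
import glob
import os


def build_name_list(directory, pattern="*.txt"):
    """Return a sorted list of full paths in `directory` that match `pattern`."""
    return sorted(glob.glob(os.path.join(directory, pattern)))


# Hypothetical usage (the path is an assumption):
# nameList = build_name_list("/full/path/to/logs")
```

Sorting keeps the row order in the output file stable between runs, since `glob.glob` does not guarantee any particular order.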

Also, I am writing to a csv rather than to an Excel file. Opening a csv file in Excel is very easy, of course, and from there you can save it as xls.

Edit in response to a comment (05/19/13): including full paths is simple.

import csv
import string

# Make list of all 20 files like so:
nameList = ['/full/path/to/Log.txt', '/different/path/to/Log.txt', '/yet/another/path/to/Log.txt']

lineNum = 68

myCols = ['nodes','n50','max','total','reads1','reads2']
myData = []

# Translation table that deletes all punctuation (Python 3 form of str.maketrans)
table = str.maketrans("", "", string.punctuation)

for name in nameList:
    # read line lineNum and split it into a list of whitespace-separated strings
    with open(name, "r") as fi:
        strings = fi.readlines()[lineNum-1].split()

    # strip the trailing commas where present and convert to int
    nodes = int(strings[3])
    n50 = int(strings[8].translate(table))
    myMax = int(strings[10].translate(table))
    total = int(strings[12].translate(table))
    reads1 = int(strings[14].split('/')[0])
    reads2 = int(strings[14].split('/')[1])

    myData.append([nodes, n50, myMax, total, reads1, reads2])

# Write the data out to a new csv file
# (newline="" stops the csv module from emitting blank rows on Windows)
fileOut = "out.csv"
with open(fileOut, "w", newline="") as csvFileOut:
    myWriter = csv.writer(csvFileOut)
    myWriter.writerow(myCols)
    myWriter.writerows(myData)
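If you would rather store the reads ratio as a single decimal than as two separate columns, the conversion is short. This is a minimal sketch, assuming the token always has the form `used/total` with a non-zero total; the function name `reads_fraction` is made up for illustration.

```python
def reads_fraction(token):
    """Convert a token such as '384548/389191' into a float between 0 and 1."""
    used, total = token.split("/")
    # In Python 3 the / operator on two ints is true division and returns a float
    return int(used) / int(total)


line = ("Final graph has 1496 nodes and n50 of 53706, max 306216, "
        "total 5252643, using 384548/389191 reads")

# Index 14 is the 'used/total' token after splitting on whitespace
fraction = reads_fraction(line.split()[14])
print(fraction)
```

Keeping both integers in the csv and computing the fraction later in Excel works just as well; the sketch only shows the Python side.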

score 0 · Accepted Answer

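I like to use openpyxl for tasks like this. Below is an example for a single file; you should be able to extend it to multiple files. You didn't say exactly how you want the data formatted in the spreadsheet, so I wrote one header row followed by one row of data (5 fields). With more information about your project, this could be refined.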

from openpyxl import Workbook
import re

wb = Workbook()
ws = wb.active

# write column headers (openpyxl rows and columns are 1-indexed)
ws.cell(row=1, column=1).value = 'nodes'
ws.cell(row=1, column=2).value = 'n50'
ws.cell(row=1, column=3).value = 'max'
ws.cell(row=1, column=4).value = 'total'
ws.cell(row=1, column=5).value = 'reads'

# open file and extract lines into list
with open("somedata.txt", "r") as f:
    lines = f.readlines()

# compile regex using named groups (raw string) and apply regex to line 68
p = re.compile(r"^Final\sgraph\shas\s(?P<nodes>\d+)\snodes\sand\sn50\sof\s(?P<n50>\d+),\smax\s(?P<max>\d+),\stotal\s(?P<total>\d+),\susing\s(?P<reads>\d+/\d+)\sreads$")
m = p.match(lines[67])

# if we have a match, then write the data to the second row of the spreadsheet
if m:
    ws.cell(row=2, column=1).value = m.group('nodes')
    ws.cell(row=2, column=2).value = m.group('n50')
    ws.cell(row=2, column=3).value = m.group('max')
    ws.cell(row=2, column=4).value = m.group('total')
    ws.cell(row=2, column=5).value = m.group('reads')

wb.save('mydata.xlsx')
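One way the single-file example might be extended to about 20 files is sketched below. It assumes every file has the line in the same format on line 68 and silently skips files where it does not match; the names `parse_line`, `write_workbook`, `FIELDS` and the file names in the usage comment are hypothetical, not part of the original answer.

```python
import re

# Same pattern as in the answer, split across lines for readability
LINE_RE = re.compile(
    r"^Final\sgraph\shas\s(?P<nodes>\d+)\snodes\sand\sn50\sof\s(?P<n50>\d+),"
    r"\smax\s(?P<max>\d+),\stotal\s(?P<total>\d+),"
    r"\susing\s(?P<reads>\d+/\d+)\sreads$"
)
FIELDS = ["nodes", "n50", "max", "total", "reads"]


def parse_line(line):
    """Return the five captured fields as strings, or None if there is no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return [m.group(name) for name in FIELDS]


def write_workbook(paths, out_path, line_number=68):
    """Write one header row, then one data row per file in `paths`."""
    # Imported here so parse_line can be used without openpyxl installed
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    ws.append(FIELDS)  # append() adds a row below the last used row
    for path in paths:
        with open(path, "r") as f:
            lines = f.readlines()
        if len(lines) < line_number:
            continue  # file is shorter than expected; skip it
        row = parse_line(lines[line_number - 1])
        if row is not None:
            ws.append(row)
    wb.save(out_path)


# Hypothetical usage (file names are assumptions):
# write_workbook(["run1.txt", "run2.txt"], "mydata.xlsx")
```

The values are written as strings, as in the single-file example; wrapping the first four in `int()` before appending would store them as numbers in Excel.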

score 0 · Accepted Answer

Open a text file with:

with open('filepath.txt', 'r') as f:
    for line in f:
        pass  # do operations for each line in the text file

Repeat for each text file you want to read.

Here is a link to the Python libraries for reading from and writing to Excel. It sounds like you want to use xlwt.
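For the line-reading part of this approach, a minimal sketch follows. It assumes only one specific line (68 in the question) is needed from each file and stops reading once it is found; `read_line` and `collect_lines` are hypothetical helper names, and writing the results out with xlwt or another library is left aside.

```python
def read_line(path, line_number):
    """Return line `line_number` (1-based) of `path` without its newline, or None."""
    with open(path, "r") as f:
        for current, line in enumerate(f, start=1):
            if current == line_number:
                return line.rstrip("\n")
    return None  # the file has fewer lines than requested


def collect_lines(paths, line_number=68):
    """Return a dict mapping each path to its requested line (or None)."""
    return {path: read_line(path, line_number) for path in paths}
```

Iterating over the file object reads one line at a time, so the rest of the file is never loaded once the target line has been returned.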
