python - Python - Formatting specific data in a text file

Question

I have a large number of log files from which I need to extract and format data. Some of them are very large, with more than 10,000 lines.

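Can anyone suggest a code sample that reads a text file, removes the unwanted lines, and reformats the remaining lines into a specific format? I couldn't find an earlier thread that covers what I'm looking for.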

Here is an example of the data I need to edit:

136: add student 50000000 35011 / Y01T :Unknown id in field 3 - ignoring line

137: add student 50000000 5031 / Y01S :Unknown id in field 3 - ignoring line

138: add student 50000000 881 / Y01S :Unknown course idnumber in field 4 - ignoring line

139: add student 50000000 5732 / Y01S :Unknown id in field 3 - ignoring line

134: add student 50000000 W250 / Y02S :OK

135: add student 50000000 35033 / Y01T :OK

I need to scan the file and remove every line that ends in :OK. The remaining lines then need to be converted to CSV format, like this:

add,student,50000000,1234 / abcd
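A minimal sketch of the requested transformation, assuming every line has the same layout as the samples above (a line number and ': ', three words, an 'id / code' pair, then ' :' and a status). The helper name to_csv_row is hypothetical; it returns None for lines that should be dropped.

```python
def to_csv_row(line):
    """Return the CSV form of one log line, or None if the line should be dropped.

    Assumes the sample layout: '<n>: <verb> <role> <number> <id> / <code> :<status>'.
    """
    line = line.strip()
    if not line or line.endswith(':OK'):
        return None  # blank lines and successful entries are removed
    body = line.partition(': ')[2]  # drop the '136: ' prefix
    body = body.partition(' :')[0]  # drop the ' :Unknown id ...' suffix
    verb, role, number, rest = body.split(' ', 3)  # rest is e.g. '35011 / Y01T'
    return ','.join([verb, role, number, rest])


samples = [
    "136: add student 50000000 35011 / Y01T :Unknown id in field 3 - ignoring line",
    "134: add student 50000000 W250 / Y02S :OK",
]
for s in samples:
    print(to_csv_row(s))
```

For the first sample this produces add,student,50000000,35011 / Y01T; the :OK line yields None and would be skipped.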

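Any hints, code snippets, etc. would be very helpful and much appreciated. I would normally try things myself before asking, but I have very little time to teach myself Python file access and string formatting, so I apologize in advance for asking without having tried first.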

Answer

This could be a solution:

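import sys

# Python 3 version: print is a function
if len(sys.argv) != 2:
    print('Add an input file as parameter')
    sys.exit(1)

print('opening file: %s' % sys.argv[1])

# Write the filtered, reformatted lines to a file literally named 'output'
with open(sys.argv[1]) as infile, open('output', 'w') as outfile:
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith(':OK'):
            continue  # drop successful entries (only lines ending in :OK)
        # '136: add student 50000000 35011 / Y01T :Unknown ...' splits into
        # ['136:', 'add', 'student', '50000000', '35011', '/', 'Y01T', ':Unknown ...']
        new_line = line.split(' ', 7)
        formatted = '%s,%s,%s,%s / %s' % (new_line[1], new_line[2], new_line[3], new_line[4], new_line[6])
        outfile.write(formatted + '\n')
        # just for checking purposes, print the lines
        print(formatted)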

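Mind the output file name: the results are written to a file called output (no extension) in the current working directory.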
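One possible alternative to the fixed name output, sketched under the same line-layout assumption as the answer above: derive each output name from its input with pathlib.Path.with_suffix, which also lets the script take several log files at once. The function name convert and the script name convert.py are hypothetical. Note that an input file that already ends in .csv would map to itself and be truncated, so this assumes .txt-style inputs.

```python
import sys
from pathlib import Path


def convert(in_path):
    """Write the non-OK lines of in_path as CSV next to it and return the CSV path.

    The output name is derived from the input name (e.g. log.txt -> log.csv),
    so running this over several log files does not overwrite one shared file.
    """
    in_path = Path(in_path)
    out_path = in_path.with_suffix('.csv')
    with in_path.open() as infile, out_path.open('w') as outfile:
        for line in infile:
            line = line.strip()
            if not line or line.endswith(':OK'):
                continue  # skip blank lines and successful entries
            parts = line.split(' ', 7)  # same field layout as the answer above
            outfile.write('%s,%s,%s,%s / %s\n' % (parts[1], parts[2], parts[3], parts[4], parts[6]))
    return out_path


if __name__ == '__main__':
    for name in sys.argv[1:]:  # e.g. python convert.py log1.txt log2.txt
        print('wrote', convert(name))
```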

Answer

If your lines don't match this regular expression, you can adapt it to your needs. You can also change the csv.writer parameters if you need a different delimiter.

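import csv
import re

# Groups: 1=line number, 2-4=three words, 5='id / code' part, 6=status message.
# '\s*' before the colon keeps the trailing space out of group 5.
regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?)\s*:\s*(.+)")
# newline='' is what the csv module expects for files it writes in Python 3
with open("out.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile, delimiter=',', quotechar='"')
    with open("log.txt") as f:
        for line in f:
            m = regex.match(line)
            # keep only matching lines whose status is not OK
            if m and m.group(6).strip() != "OK":
                writer.writerow(m.groups()[1:-1])  # drop line number and status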
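To illustrate changing the delimiter, here is a self-contained variant that writes into an in-memory io.StringIO buffer with delimiter=';' (an arbitrary example choice). It uses the pattern from this answer with one tweak, \s* before the colon, so the fifth group does not keep a trailing space. lineterminator='\n' is set only to make the output easy to compare; csv.writer defaults to '\r\n'.

```python
import csv
import io
import re

# Pattern from the answer, with '\s*' added before the colon
regex = re.compile(r"(\d+)\s*:\s*(\w+)\s+(\w+)\s+(\w+)\s+([\w/ ]+?)\s*:\s*(.+)")

lines = [
    "137: add student 50000000 5031 / Y01S :Unknown id in field 3 - ignoring line",
    "135: add student 50000000 35033 / Y01T :OK",
]

buf = io.StringIO()
# delimiter=';' is only an example of swapping the separator
writer = csv.writer(buf, delimiter=';', lineterminator='\n')
for line in lines:
    m = regex.match(line)
    if m and m.group(6).strip() != 'OK':
        writer.writerow(m.groups()[1:-1])  # words 2-4 plus the 'id / code' part

print(buf.getvalue())
```

Only the first line is written, as add;student;50000000;5031 / Y01S.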

Answer

Thanks for the help. I'm a beginner, so the final code isn't elegant, but it works :)

# Open the file and create the CSV after filtering the input file.
def openFile(filename, keyword):  # filename: log to read; keyword: text a line must contain to be kept
    matches = []  # renamed from 'list', which shadowed the built-in

    with open(filename, 'r') as f:  # 'with' closes the file automatically
        for line in f:  # for each line in the file
            if keyword in line:  # check whether the keyword is in the line
                matches.append(line)  # add the line to the list

    print(matches)  # test

    output = ''
    for each in matches:  # filter and clean the info, format it as CSV
        chunk = each.partition(': ')[2]  # drop the '136: ' prefix
        chunk = chunk.partition(' :')[0]  # drop the ' :Unknown ...' suffix
        strsplit = chunk.split(' ')  # split the string on spaces
        # e.g. 'add,student,50000000' + ',' + '881 / Y01S'
        line = ','.join(strsplit[:3]) + ',' + ' '.join(strsplit[3:6]) + '\n'
        output += line  # append each line to build a single string

    print(output)  # test

    with open(keyword + '.csv', 'w') as f:  # open the output file for writing
        f.write(output)  # write the string to the file


openFile('russtest.txt', 'cat')
openFile('CRON ENROL LOG 200913.txt', 'field 4')
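Since the question mentions files of 10,000+ lines, one optional variation on openFile is to yield each CSV line from a generator and write it out as it is produced, rather than first collecting all lines in a list and then one big string. This is a sketch with hypothetical names (csv_lines, write_csv) that keeps openFile's keyword filter and output naming; for files of this size the original approach also works fine.

```python
def csv_lines(filename, keyword):
    """Yield one CSV line per matching log line, in the same format as openFile.

    Lines are produced one at a time, so the whole result never has to be
    held in memory while the output file is written.
    """
    with open(filename) as f:
        for each in f:
            if keyword not in each:
                continue  # same filter as openFile
            chunk = each.partition(': ')[2]  # drop the '138: ' prefix
            chunk = chunk.partition(' :')[0]  # drop the ' :Unknown ...' suffix
            strsplit = chunk.split(' ')
            yield ','.join(strsplit[:3]) + ',' + ' '.join(strsplit[3:6]) + '\n'


def write_csv(filename, keyword):
    """Write the generated lines to keyword + '.csv', as openFile does."""
    with open(keyword + '.csv', 'w') as out:
        out.writelines(csv_lines(filename, keyword))
```

Usage mirrors the original call: write_csv('CRON ENROL LOG 200913.txt', 'field 4').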

Thanks :)
