python - How can I loop through a CSV in Python, writing the rows that meet new criteria to a new file?

Question

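I've been working on this for a while, and I think it's in my best interest to ask for expert advice. I know I'm not writing this in the best way possible, and I've gone down a rabbit hole and confused myself.

I have a CSV. Many of them, in fact. That part isn't the problem.

The rows at the top of the CSV aren't actual CSV data, but they contain one important piece of information: the date the data is valid for. For some types of report it appears on one line, and for others on a different line.

My data usually starts on row 10 or 11, but I can't always be sure. I do know that the first column of that row always contains the same information (the header of the data table).

I want to pull the report date from the preceding rows, do stuffA for file type A or stuffB for file type B, and then write each row out to a new file. I'm having trouble incrementing the rows, and I can't see what I'm doing wrong.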

Sample data:

"Attribute ""OPSURVEYLEVEL2_O"" [Category = ""Retail v1""]"
Date exported: 2/16/13
Exported by user: William
Project: 
Classification: Online Retail v1
Report type: Attributes
Date range: from 12/14/12 to 12/14/12
"Filter OpSurvey Level 2(mine):  [ LEVEL:SENTENCE TYPE:KEYWORD {OPSURVEYLEVEL2_O:""gift certificate redemption"", OPSURVEYLEVEL2_O:""combine accounts"", OPSURVEYLEVEL2_O:""cancel account"", OPSURVEYLEVEL2_O:""saved project moved to purchased project"", OPSURVEYLEVEL2_O:""unlock account"", OPSURVEYLEVEL2_O:""affiliate promotions"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""disclaimer not clear"", OPSURVEYLEVEL2_O:""prepaid issue"", OPSURVEYLEVEL2_O:""customer wants to use coupons for print to store"", OPSURVEYLEVEL2_O:""customer received someone else's order"", OPSURVEYLEVEL2_O:""hi-res images unavailable"", OPSURVEYLEVEL2_O:""how to re-order"", OPSURVEYLEVEL2_O:""missing items"", OPSURVEYLEVEL2_O:""missing envelopes: print to store"", OPSURVEYLEVEL2_O:""missing envelopes: mail order"", OPSURVEYLEVEL2_O:""group rooms"", OPSURVEYLEVEL2_O:""print to store"", OPSURVEYLEVEL2_O:""print to store coupons"", OPSURVEYLEVEL2_O:""publisher: card not available for print to store"", OPSURVEYLEVEL2_O:publisher}]"
Total: 905
OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score
PRINT TO STORE,297,32.82,-0.1
...

Sample code:

#!/usr/bin/python

import csv, os, glob, sys, errno

path = '/path/to/Downloads'
for infile in glob.glob(os.path.join(path,'report_ATTRIBUTE_OP*.csv')):
    if 'OPSURVEYLEVEL2' in infile:
        prime_column = 'ops2'
    elif 'OPSURVEYLEVEL3' in infile:
        prime_column = 'ops3'
    else:
        sys.exit(errno.ENOENT)
    with open(infile, "r") as csvfile:
        reader = csv.reader(csvfile)
        report_date = 'DATE NOT FOUND'
        # import pdb; pdb.set_trace()
        for row in reader:
            foo = 0
            while foo < 1: 
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    foo = 1
                if "Date range" in row:
                    report_date = row[0][-8:]
                break
            if foo >= 1:
                if row[0][0:].find('OPSURVEYLEVEL') == 0:
                    break
                if 'ops2' in prime_column:
                    dup_col = row[0]
                    row.insert(0,dup_col)
                    row.append(report_date)
                elif 'ops3' in prime_column:
                    row.append(report_date)
                with open('report_merge.csv', 'a') as outfile:
                    outfile.write(row)
            reader.next()

score 0 · Accepted Answer

There are two problems with this code.

The first is that the code never finds the date range in row. This line:

if "Date range" in row:

...should be:

if "Date range" in row[0]:
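
To see why the original test never matches, here is a minimal sketch in Python 3 that parses one metadata line from the sample data. The variable names are illustrative only and do not come from the question.

```python
import csv
import io

# One metadata line from the sample data, parsed the way csv.reader sees it.
line = "Date range: from 12/14/12 to 12/14/12\n"
row = next(csv.reader(io.StringIO(line)))

# `in` on a list compares whole elements, so this asks whether any
# cell is exactly equal to "Date range".
found_in_list = "Date range" in row

# `in` on a string is a substring test, which is what the code needs.
found_in_cell = "Date range" in row[0]

# The last eight characters of the cell hold the end date.
report_date = row[0][-8:]

print(row)
print(found_in_list, found_in_cell, report_date)
```

The line contains no commas, so the whole of it ends up in a single cell, and only the substring test on row[0] succeeds.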

The second is this code:

if row[0][0:].find('OPSURVEYLEVEL') == 0:
    break

...is breaking out of the for loop at the header row of your data table, because the for loop is the nearest enclosing loop. I suspect there was another while somewhere in an earlier version of this code.
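
The scoping rule can be checked with a small Python 3 sketch. The function name and the three input strings are made up for illustration; only the shape of the loops mirrors the question.

```python
def rows_seen(rows):
    """Collect the rows the outer loop reaches before it stops."""
    seen = []
    for row in rows:
        seen.append(row)
        while True:
            # This break leaves only the while loop; the for loop carries on.
            break
        if row.startswith("OPSURVEYLEVEL"):
            # No while loop encloses this break, so it ends the for loop.
            break
    return seen


rows = ["Total: 905", "OPSURVEYLEVEL2_O,Distinct Document", "PRINT TO STORE,297"]
result = rows_seen(rows)
print(result)
```

The data row after the header is never visited, which matches the symptom of nothing being written to the output file.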

The code would be simpler (and the bug would go away) if you used a single if statement in place of the while and the if, like this:

    foo = 0  # initialise once, before the loop, so the flag survives across rows
    for row in reader:
        if foo < 1: 
            if row[0][0:].find('OPSURVEYLEVEL') == 0:
                foo = 1
            if "Date range" in row[0]:  # Changed this line
                print("found report date")
                report_date = row[0][-8:]
        else:
            print(row)
            if row[0][0:].find('OPSURVEYLEVEL') == 0:
                break
            if 'ops2' in prime_column:
                dup_col = row[0]
                row.insert(0,dup_col)
                row.append(report_date)
            elif 'ops3' in prime_column:
                row.append(report_date)
            with open('report_merge.csv', 'a') as outfile:
                outfile.write(','.join(row)+'\n')
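
As a possible further cleanup, and not part of the fix above, the same flow can be sketched in Python 3 with csv.writer, which quotes any cell that contains a comma (something ','.join does not do). The helper name extract_rows and the constant HEADER_PREFIX are hypothetical, and in-memory strings stand in for the real input and output files.

```python
import csv
import io

HEADER_PREFIX = "OPSURVEYLEVEL"  # first cell of the data table's header row


def extract_rows(lines, prime_column):
    """Yield data rows with the report date appended (hypothetical helper)."""
    report_date = "DATE NOT FOUND"
    in_table = False
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        if not in_table:
            if row[0].startswith("Date range"):
                report_date = row[0][-8:]
            elif row[0].startswith(HEADER_PREFIX):
                in_table = True  # data starts on the next row
            continue
        if prime_column == "ops2":
            row.insert(0, row[0])  # duplicate the first column, as in the question
        row.append(report_date)
        yield row


# A shortened stand-in for one report file.
sample = io.StringIO(
    "Date range: from 12/14/12 to 12/14/12\n"
    "Total: 905\n"
    "OPSURVEYLEVEL2_O,Distinct Document,% of Document,Sentiment Score\n"
    "PRINT TO STORE,297,32.82,-0.1\n"
)
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(extract_rows(sample, "ops2"))
print(out.getvalue())
```

With real files, the output would be opened once, outside the loop over input files, and the same writer reused for every report, rather than reopening report_merge.csv for each row.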
