python - Merge two csv files based on a common field using Python

Question

I generated two csv files from two mysql tables. Now I want to merge the two files.

I manually added this header to the first csv:

ID,name,sector,sub_sector

This is the header of the second csv:

ID,url

My goal is to end up with a single file:

ID,name,sector,sub_sector,url

Note: not every record in the first file has a match in the second file.

This is the snippet I was using:

#!/usr/bin/env python
import glob, csv
if __name__ == '__main__':

    infiles = glob.glob('./*.csv')
    out = 'temp.csv'
    data = {}
    fields = []

    for fname in infiles:
        df = open(fname, 'rb')
        reader = csv.DictReader(df)
        for line in reader:
            # assuming the field is called ID
            if line['ID'] not in data:
                data[line['ID']] = line
            else:
                for k,v in line.iteritems():
                    if k not in data[line['ID']]:
                        data[line['ID']][k] = v
            for k in line.iterkeys():
                if k not in fields:
                    fields.append(k)
        del reader
        df.close()

    writer = csv.DictWriter(open(out, "wb"), fields, extrasaction='ignore', dialect='excel')
    # write the header at the top of the file
    writer.writeheader()
    writer.writerows(data)
    del writer

I took it from another SO thread. This is the error I am getting:

  File "db_work.py", line 30, in <module>
    writer.writerows(data)
  File "/usr/lib/python2.7/csv.py", line 153, in writerows
    rows.append(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 144, in _dict_to_list
    ", ".join(wrong_fields))
ValueError: dict contains fields not in fieldnames: 4, 4, 4, 6
~/Development/python/DB$ python db_work.py
Traceback (most recent call last):
  File "db_work.py", line 30, in <module>
    writer.writerows(data)
  File "/usr/lib/python2.7/csv.py", line 153, in writerows
    rows.append(self._dict_to_list(rowdict))
  File "/usr/lib/python2.7/csv.py", line 145, in _dict_to_list
    return [rowdict.get(key, self.restval) for key in self.fieldnames]
AttributeError: 'str' object has no attribute 'get'

Is there a way to fix this?

score 3 · Accepted Answer

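.writerows() expects a list of rows, but you are passing it a dict instead, so it iterates over the keys (plain ID strings) rather than the row dicts. I think you wanted to write just the values of data: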

writer = csv.DictWriter(open(out, "wb"), fields, dialect='excel')
# write the header at the top of the file
writer.writeheader()
writer.writerows(data.values())
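
To make the difference concrete, here is a minimal sketch in Python 3 syntax. The sample rows and the reduced field list are made up for illustration, and an in-memory buffer stands in for the output file. It shows that iterating over a dict yields its keys, which is what writerows(data) would receive, while data.values() yields the row dicts the writer needs.

```python
import csv
import io

# Hypothetical sample data keyed by ID, shaped like `data` in the question.
data = {
    "1": {"ID": "1", "name": "Acme", "url": "http://example.com"},
    "2": {"ID": "2", "name": "Globex", "url": ""},
}
fields = ["ID", "name", "url"]

# Iterating a dict yields its keys (plain strings), not the row dicts.
# This is what writerows(data) would try to write.
rows_seen_by_writerows = list(data)

# Passing data.values() hands the writer the row dicts themselves.
buf = io.StringIO()
writer = csv.DictWriter(buf, fields, dialect="excel")
writer.writeheader()
writer.writerows(data.values())
merged_text = buf.getvalue()
```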

Personally, I would read the id, url file just as rows and add them to a dict, then read the other file and write each row out one at a time, adding the corresponding url entry:

import csv

out = 'temp.csv'  # output filename, same as in the question

with open('urls.csv', 'rb') as urls:
    reader = csv.reader(urls)
    reader.next()  # skip the header, won't need that here
    urls = {id: url for id, url in reader}

with open('other.csv', 'rb') as other:
    with open(out, 'wb') as output:
        reader = csv.reader(other)
        writer = csv.writer(output)
        writer.writerow(reader.next() + ['url'])  # read old header, add urls and write out
        for row in reader:
            # write out original row plus url if we can find one
            writer.writerow(row + [urls.get(row[0], '')])
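
The snippets above target Python 2 (binary file modes, reader.next()). For Python 3, the same lookup-then-append approach might look like the sketch below. The function name merge_urls and its parameters are hypothetical, and it assumes the ID is the first column in both files, as in the headers shown in the question.

```python
import csv


def merge_urls(main_path, urls_path, out_path):
    """Append a url column to the rows of main_path, matched on the first column (ID)."""
    # Build an ID -> url lookup from the two-column file.
    with open(urls_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        url_by_id = {row[0]: row[1] for row in reader if len(row) >= 2}

    with open(main_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # Copy the original header and add the new column name.
        writer.writerow(next(reader) + ["url"])
        for row in reader:
            if not row:
                continue  # skip blank lines
            # Rows without a matching ID get an empty url.
            writer.writerow(row + [url_by_id.get(row[0], "")])
```

Calling merge_urls('other.csv', 'urls.csv', 'temp.csv') would mirror the file names used above. Every row of the first file is kept, and a row whose ID has no match in the url file gets an empty url field.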
