python - What is the best way to read the i-th column of a csv file in Python?

Question

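I'm used to R, which offers quick functions for reading CSV files column by column. Can you suggest a quick and efficient way to read large data files (such as CSV) in Python? For example, the i-th column of a CSV file.

I have the following, but it takes a long time: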

    import csv

    f = open('some.csv', newline='')
    reader = csv.reader(f, delimiter=',')
    header = next(reader)          # skip the header row
    zipped = list(zip(*reader))    # reads every row, then transposes
    print(zipped[0])               # the first column

Is there a better way to read data (from large files) in Python, one that is at least as fast as R and comparable in memory use?

score 5 · Accepted Answer

You can also use pandas.read_csv and its usecols argument, which parses only the columns you list. See the pandas documentation for read_csv.

    import pandas as pd

    data = pd.read_csv('some.csv', usecols=['col_1', 'col_2', 'col_4'])
    ...
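Since the question asks for the i-th column by position rather than by name, here is a minimal sketch that passes an integer position to usecols. The helper name read_ith_column and the sample data are made up for illustration; they are not part of the original answer.

```python
import io

import pandas as pd


def read_ith_column(source, i):
    """Return the i-th column (0-based) of a CSV as a pandas Series.

    `source` is a path or a file-like object. The helper name is hypothetical.
    """
    # usecols also accepts integer positions, so only that column is parsed.
    frame = pd.read_csv(source, usecols=[i])
    return frame.iloc[:, 0]


# Small in-memory stand-in for 'some.csv' (made-up data).
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
second = read_ith_column(sample, 1)
print(second.tolist())
```

With a real file you would call read_ith_column('some.csv', i) instead of passing the StringIO object.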

score 2 · Accepted Answer

    import csv

    with open('some.csv', newline='') as fin:
        reader = csv.reader(fin)
        first_col = [row[0] for row in reader]

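What your zip call does is load the entire file into memory and then transpose it to get the columns. If you only need the values of one column, build the list from just that column to begin with.

If you need multiple columns, you can do: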
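To make the difference from the zip approach concrete, here is a rough sketch of a generator that yields only the i-th field of each row, so at most one row is held at a time. The name iter_column and the skip_header parameter are assumptions for illustration, and the data is made up.

```python
import csv
import io


def iter_column(fin, i, skip_header=True):
    """Yield the i-th field (0-based) of each row from an open CSV file."""
    reader = csv.reader(fin)
    if skip_header:
        next(reader, None)  # discard the header row, if there is one
    for row in reader:
        yield row[i]


# Made-up data standing in for open('some.csv', newline='').
sample = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")
third = list(iter_column(sample, 2))
print(third)
```

Note that the csv module returns every field as a string, so convert the values (for example with float) if you need numbers.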

    from operator import itemgetter

    get_cols = itemgetter(1, 3, 5)        # picks fields 1, 3 and 5 from each row
    cols = list(map(get_cols, reader))    # run this while the file is still open
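For completeness, a self-contained version of the multi-column idea might look like the sketch below. The data and the variable names col_b and col_d are made up, and the final transpose is an optional extra step that the answer does not include.

```python
import csv
import io
from operator import itemgetter

# Made-up data standing in for an open 'some.csv'.
fin = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
reader = csv.reader(fin)
next(reader)  # skip the header row

get_cols = itemgetter(1, 3)          # picks fields 1 and 3 from each row
rows = list(map(get_cols, reader))   # map is lazy in Python 3, so consume it here
col_b, col_d = zip(*rows)            # transpose only the selected fields
print(rows)
print(col_b, col_d)
```

One thing to watch: itemgetter with a single index returns the bare value rather than a one-element tuple, so the zip(*rows) step only makes sense when two or more columns are selected.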
