python - Count the number of lines in multiple csv files, skipping blank lines

Question

I need to get the length of the csv files in a directory ('/dir/'), excluding empty lines. I tried this:

import os, csv, itertools, glob

#To filter the empty lines
def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

#To read each file in '/dir/', compute the length and write the output 'count.csv'
with open('count.csv', 'w') as out:
     file_list = glob.glob('/dir/*')
     for file_name in file_list:
         with open(file_name, 'r') as f:
              filt_f1 = filterfalse(lambda line: line.startswith('\n'), f)
              count = sum(1 for line in f if (filt_f1))
              out.write('{c} {f}\n'.format(c = count, f = file_name))

I get the output in the format I want, but unfortunately the length reported for each file (in '/dir/') still includes the empty lines.

To see where the empty lines come from, I read file.csv as file.txt, and it looks like this:

*text,favorited,favoriteCount,...
"Retweeted user (@user):...
'empty row'
Do Operators...*

score 1 · Accepted Answer

I would suggest using pandas.

import pandas

# Reads csv file and converts it to pandas dataframe.
df = pandas.read_csv('myfile.csv')

# Removes rows where data is missing.
df.dropna(inplace=True)

# Gets the number of remaining rows (+1 for the header line) and displays it.
df_length = len(df) + 1
print('The length of the CSV file is', df_length)

Documentation: http://pandas.pydata.org/pandas-docs/version/0.18.0/

score 1 · Accepted Answer

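Your filterfalse() function works correctly. It is exactly the same as the one named ifilterfalse in the standard library's itertools module (itertools.filterfalse in Python 3), so it's unclear why you don't simply use that instead of writing your own; the main advantage is that it has already been tested and debugged. (Built-ins are also often faster, because many of them are written in C.)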

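The problem is that you are not using the generator function properly.

Since it returns a generator object, you need to iterate over the multiple values it will yield, with code like for line in filt_f1. Your code iterates over f instead and only tests filt_f1 in the if clause, which is always true.
The predicate function you passed does not properly handle lines that contain other leading whitespace characters, such as spaces and tabs, so the lambda you pass needs to be changed to handle those cases too.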
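To make the two points above concrete, here is a small sketch that runs the three variants against an in-memory sample. The sample text and the variable names are made up for illustration, and it assumes Python 3, where the standard-library function is called itertools.filterfalse.

```python
import io
from itertools import filterfalse  # Python 3 name; ifilterfalse in Python 2

# Hypothetical sample: header, empty line, row, whitespace-only line, row.
sample = "a,b\n\n1,2\n   \n3,4\n"

# Variant 1: the question's approach. The loop runs over f itself, and
# "if filt" only tests the generator object, which is always truthy,
# so nothing is filtered out.
f = io.StringIO(sample)
filt = filterfalse(lambda line: line.startswith('\n'), f)
count_all = sum(1 for line in f if filt)

# Variant 2: iterate over the generator, but keep the original predicate.
# The whitespace-only line does not start with '\n', so it is still counted.
f = io.StringIO(sample)
filt = filterfalse(lambda line: line.startswith('\n'), f)
count_startswith = sum(1 for line in filt)

# Variant 3: iterate over the generator and treat any line that is empty
# after strip() as blank.
f = io.StringIO(sample)
filt = filterfalse(lambda line: not line.strip(), f)
count_nonblank = sum(1 for line in filt)

print(count_all, count_startswith, count_nonblank)  # prints: 5 4 3
```

Only the third variant skips both the empty line and the line that contains nothing but spaces.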

The code below has both of these changes made to it.

import os, csv, itertools, glob

#To filter the empty lines
def filterfalse(predicate, iterable):
    # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8
    if predicate is None:
        predicate = bool
    for x in iterable:
        if not predicate(x):
            yield x

#To read each file in '/dir/', compute the length and write the output 'count.csv'
with open('count.csv', 'w') as out:
    file_list = glob.glob('/dir/*')
    for file_name in file_list:
        with open(file_name, 'r') as f:
            filt_f1 = filterfalse(lambda line: not line.strip(), f)  # CHANGED
            count = sum(1 for line in filt_f1)  # CHANGED
            out.write('{c} {f}\n'.format(c=count, f=file_name))
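If you would rather not carry your own copy of filterfalse(), the same idea can be sketched with the standard-library version. This assumes Python 3 (itertools.filterfalse); the helper names count_nonblank_lines and write_counts are hypothetical and only used here for illustration.

```python
import glob
from itertools import filterfalse  # Python 3 name; ifilterfalse in Python 2


def count_nonblank_lines(path):
    """Count the lines in path that contain something other than whitespace."""
    with open(path, 'r') as f:
        # Drop every line that is empty once surrounding whitespace is removed.
        return sum(1 for _ in filterfalse(lambda line: not line.strip(), f))


def write_counts(pattern, out_path):
    """Write one '<count> <file name>' line per file matching pattern."""
    with open(out_path, 'w') as out:
        for file_name in sorted(glob.glob(pattern)):
            count = count_nonblank_lines(file_name)
            out.write('{c} {f}\n'.format(c=count, f=file_name))


# Example call, using the paths from the question:
# write_counts('/dir/*', 'count.csv')
```

Splitting the counting into its own function makes it easy to check on a single file before running it over the whole directory. Note that if the output file lives inside the directory matched by the pattern, it will be picked up by glob as well.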
