python - Reading a csv with a separator in Python dask

Question

I am trying to create a DataFrame by reading a csv file whose fields are separated by '#####' (5 hashes).

The code is as follows:

import dask.dataframe as dd
# Raw string, so that the "\t" in the Windows path is not read as a tab
df = dd.read_csv(r'D:\temp.csv', sep='#####', engine='python')
res = df.compute()

The error is as follows:

dask.async.ValueError:
Dask dataframe inspected the first 1,000 rows of your csv file to guess the
data types of your columns.  These first 1,000 rows led us to an incorrect
guess.

For example a column may have had integers in the first 1000
rows followed by a float or missing value in the 1,001-st row.

You will need to specify some dtype information explicitly using the
``dtype=`` keyword argument for the right column names and dtypes.

    df = dd.read_csv(..., dtype={'my-column': float})

Pandas has given us the following error when trying to parse the file:

  "The 'dtype' option is not supported with the 'python' engine"

Traceback
 ---------
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/home/ec2-user/anaconda3/lib/python3.4/site-packages/dask/dataframe/io.py", line 69, in _read_csv
raise ValueError(msg)

So, how do I get rid of this error?

According to the error, I would have to specify a dtype for every column, but that is not practical when there are 100+ columns.
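For a file with 100+ columns, the dtype mapping does not have to be typed by hand. The following is a minimal sketch, assuming the first line of the file is a header row that uses the same '#####' delimiter. The helper name `build_dtype_map` and the choice of `str` for every column are illustrative assumptions, not something dask requires. Whether such a mapping is accepted together with engine='python' depends on the pandas version; the error above shows that the version in use rejected it.

```python
def build_dtype_map(path, sep="#####", dtype=str):
    """Return {column_name: dtype} built from the header line of a delimited file.

    Hypothetical helper: assumes the first line is a header row.
    """
    # Read only the first line, so this stays cheap for large files.
    with open(path, encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n")
    # Plain str.split, so the separator is taken literally (no regex).
    return {name: dtype for name in header.split(sep)}
```

The resulting dict could then be passed as dtype=build_dtype_map(r'D:\temp.csv') to dd.read_csv.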

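If I read the file without specifying the separator, everything works, but '#####' appears everywhere in the data. So, after computing the result into a pandas DataFrame, is there a way to remove it?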
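An alternative to cleaning up stray '#####' after the fact might be to rewrite the file with a single-character delimiter first, so that the default C parser (which does accept dtype) can be used instead of engine='python'. The following is a minimal sketch using only the standard library. The function name `rewrite_delimiter`, the output path, and the choice of '|' as the replacement are assumptions, and the approach only works if the replacement character never occurs in the data.

```python
def rewrite_delimiter(src, dst, old="#####", new="|"):
    """Copy src to dst line by line, replacing a multi-character delimiter.

    Hypothetical helper: raises ValueError if the replacement character
    already occurs in the input, since the result would be ambiguous.
    """
    # newline="" keeps the original line endings untouched.
    with open(src, encoding="utf-8", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            if new in line:
                raise ValueError("replacement delimiter already occurs in the data")
            fout.write(line.replace(old, new))
```

The rewritten file could then be read with dd.read_csv(dst, sep='|') without engine='python'.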

Please help me with this.
