python - Variable number of numpy array for-loop arguments needed to match a variable number of columns

Question

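I am reading the contents of a CSV file into a numpy array. The number of columns in the CSV file may change. I am trying to concatenate the first two string columns (date + time) into a date object, and I found an example of this on stackoverflow. However, that example requires me to change the script every time the number of columns changes.

Here is an example: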

#! /usr/bin/python
# variable number of numpy array for loop arguments, but only care about the first two 

import numpy as np
import csv
import os
import datetime as datetime

# simulate a csv file
from StringIO import StringIO
data = StringIO("""
Title
Date,Time,Speed
,,(m/s)
2012-04-01,00:10, 85
2012-04-02,00:20, 86
2012-04-03,00:30, 87
""".strip())

next(data)  # eat away the first line, which is the title
header = [item.strip() for item in next(data).split(',')] # get the headers
#print header
arr = np.genfromtxt(data, delimiter=',', skiprows=1, dtype=None) #skip the unit rows
arr.dtype.names = header # assign the header to names. so we can use it to do indexing

y1 = arr['Speed']   # column headings were assigned previously by arr.dtype.names = header

# Here is an example from:
# https://stackoverflow.com/questions/7500864/python-array-of-datetime-objects-from-numpy-ndarray

date_objects = np.array([datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M") 
                        for a,b,c in arr])
print date_objects

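Question: the for statement above iterates over the numpy array. There are currently three columns, so I unpack each row into a, b, c, but if I add a fourth column the statement breaks with ValueError: too many values to unpack. I only care about the first two columns, a and b, so how can I rewrite this? Is there a way to say a,b,... in arr?

I have already tried slicing arr down to the first two columns.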

# Note1: Splice fails with index error too many indices
#arr_date_time = arr[:,:2]

A workaround for the slicing error is to set dtype=object and not set dtype.names, but I want to set dtype.names because it makes column indexing easier to read. See my related post: Numpy set dtype=None, cannot splice columns and set dtype=object cannot set dtype.names
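For context, here is a minimal sketch of why the two-axis slice is rejected, assuming Python 3 and a recent NumPy. The small array below is made-up stand-in data, not the actual genfromtxt result. With mixed column types the result is a one-dimensional structured array of records, so there is no second axis to slice; fields can be selected by name instead.

```python
import numpy as np

# Made-up stand-in for the structured array that genfromtxt returns
# when columns have mixed types (assumption: two strings and an int).
arr = np.array(
    [("2012-04-01", "00:10", 85), ("2012-04-02", "00:20", 86)],
    dtype=[("Date", "U10"), ("Time", "U5"), ("Speed", "i8")],
)

# Each element is one whole record, so the array has a single axis.
print(arr.ndim, arr.shape)

# Indexing with two axes therefore raises IndexError.
try:
    arr[:, :2]
except IndexError as exc:
    print("IndexError:", exc)

# Selecting fields by name keeps only the columns of interest.
date_time = arr[["Date", "Time"]]
print(date_time.dtype.names)
```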

score 1 · Accepted Answer

Try this:

date_objects = np.array([datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M") 
                    for row in arr])
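The following is a self-contained sketch of the same idea, assuming Python 3 and a recent NumPy rather than the Python 2 setup in the question. Several details are my assumptions and not part of the original post: the extra Heading column (added to show that a fourth column no longer breaks the loop), passing names=header instead of assigning arr.dtype.names, and using skip_header in place of skiprows, which newer NumPy versions no longer accept in genfromtxt. The last loop shows Python 3 extended unpacking, which is close to the "a,b,..." syntax asked about.

```python
import datetime
from io import StringIO

import numpy as np

# Simulated CSV with a fourth column (Heading is a made-up example column).
data = StringIO("""Title
Date,Time,Speed,Heading
,,(m/s),(deg)
2012-04-01,00:10, 85, 10
2012-04-02,00:20, 86, 20
2012-04-03,00:30, 87, 30
""")

next(data)  # skip the title line
header = [item.strip() for item in next(data).split(",")]

# skip_header=1 skips the units row; names=header labels the fields.
arr = np.genfromtxt(data, delimiter=",", skip_header=1,
                    dtype=None, names=header, encoding="utf-8")

# Index the first two fields of each record; extra columns are ignored.
date_objects = np.array([
    datetime.datetime.strptime(row[0] + row[1], "%Y-%m-%d%H:%M")
    for row in arr
])

# Python 3 alternative: collect the remaining fields in a throwaway name.
unpacked = np.array([
    datetime.datetime.strptime(a + b, "%Y-%m-%d%H:%M")
    for a, b, *_ in arr
])

print(date_objects)
```

Because neither loop names the trailing columns, adding or removing columns after Time should not require any change to the script.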
