python - Correct way to deal with a list of associated data items associated with several index values with pandas/pytables

Question

I was wondering what the correct way is to store (and read back) a list of items in HDF5, such as in the following example about a rock star, where each list is known to hold at most a fixed number of values:

Date_of_Birth
Bands[] - where the maximum number of bands is 10
Siblings[] - where the maximum number of siblings is 6
Date_of_Death

All of these would be column names.

One approach I considered was to use duplicate column names, but that raised an error (ValueError: cannot reindex from a duplicate axis). Alternatively, I could have columns Bands 1, Bands 2, etc., but that would make retrieval and querying cumbersome. Is there a better way? Any help would be very much appreciated!
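A minimal sketch of the "Bands 1, Bands 2, ..." layout described above, assuming the maximum counts from the question (10 bands, 6 siblings) and padding unused slots with missing values. The helper `to_wide_row`, the `band_N` / `sibling_N` column names, and the sample data are hypothetical. It shows why querying gets bothersome: every band column has to be scanned explicitly.

```python
import pandas as pd

MAX_BANDS = 10     # maximum number of bands, per the question
MAX_SIBLINGS = 6   # maximum number of siblings, per the question


def to_wide_row(dob, bands, siblings, dod):
    """Flatten one record into fixed-width columns, padding unused slots with None."""
    row = {"dob": dob}
    for i in range(MAX_BANDS):
        row[f"band_{i + 1}"] = bands[i] if i < len(bands) else None
    for i in range(MAX_SIBLINGS):
        row[f"sibling_{i + 1}"] = siblings[i] if i < len(siblings) else None
    row["dod"] = dod
    return row


# Hypothetical sample records
records = [
    to_wide_row("1950-01-01", ["Band A", "Band B"], ["Sib X"], "1999-12-31"),
    to_wide_row("1960-06-15", ["Band C"], [], None),
]
df = pd.DataFrame(records)

# Asking "which rows list Band C?" means scanning every band_N column by hand
band_cols = [c for c in df.columns if c.startswith("band_")]
print(df[band_cols].isin(["Band C"]).any(axis=1))
```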

score 0 · Accepted Answer

For something like this, where you actually want to list out each of the band and sibling columns, I would try using a MultiIndex.

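Suppose you have a DataFrame with these columns, which we'll call df. Calling df.columns returns a flat, single-level Index of the names 'dob', 'band_1', 'band_2' (string labels, not an Int64Index). By restructuring it into a MultiIndex, you can get something that retrieves all the bands at once...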

Edit: I found a way to build a "partial" MultiIndex:

import pandas as pd
df.columns = pd.MultiIndex.from_tuples([('dob', ''), ('bands', 'band_1'), ('bands', 'band_2')])
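A self-contained sketch of the "partial" MultiIndex above, assuming string column names `dob`, `band_1`, `band_2` and made-up values. Selecting the top-level label `'bands'` returns all band columns as one sub-frame; `'dob'`, whose second level is an empty string, stays reachable on its own.

```python
import pandas as pd

# Hypothetical data with flat column names, as in the answer
df = pd.DataFrame(
    [["1950-01-01", "Band A", "Band B"],
     ["1960-06-15", "Band C", None]],
    columns=["dob", "band_1", "band_2"],
)
print(df.columns)  # a flat, single-level Index of string labels

# Replace the flat columns with a two-level ("partial") MultiIndex;
# 'dob' has no sub-level, so its second level is an empty string
df.columns = pd.MultiIndex.from_tuples(
    [("dob", ""), ("bands", "band_1"), ("bands", "band_2")]
)

# Selecting the top level returns every band column at once
bands = df["bands"]
print(bands)

# The ungrouped column is still reachable by its full (top, "") key
print(df[("dob", "")])
```

Adding more bands later only means adding more `("bands", "band_N")` tuples; code that reads `df["bands"]` does not change.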

Also, a tip for building the list of tuples: you can run a series of list comprehensions over the existing columns....

 import re
 band_tuples = [('bands', col) for col in df.columns if re.search('band', col)]
 # etc. (same pattern for the sibling columns)
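A runnable version of the tuple-building tip, assuming the numbered columns follow a `band_N` / `sibling_N` naming scheme. The `to_tuple` helper, the `GROUPS` mapping, and the data are hypothetical. It matches each column name with a regular expression, groups matches under a shared top-level label, and gives every other column an empty sub-level, so a query across all bands becomes a single expression.

```python
import re

import pandas as pd

# Hypothetical flat frame with numbered band_N / sibling_N columns
df = pd.DataFrame(
    [["1950-01-01", "Band A", "Band B", "Sib X", "1999-12-31"],
     ["1960-06-15", "Band C", None, None, None]],
    columns=["dob", "band_1", "band_2", "sibling_1", "dod"],
)

# Map a column-name prefix to its top-level group name (an assumption)
GROUPS = {"band": "bands", "sibling": "siblings"}


def to_tuple(col):
    """Return a (group, column) tuple; ungrouped columns get an empty sub-level."""
    match = re.match(r"(band|sibling)_\d+$", col)
    if match:
        return (GROUPS[match.group(1)], col)
    return (col, "")


df.columns = pd.MultiIndex.from_tuples([to_tuple(c) for c in df.columns])

# All bands at once, and a query across every band column
print(df["bands"])
has_band_c = df["bands"].isin(["Band C"]).any(axis=1)
print(df.loc[has_band_c, ("dob", "")])
```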
