python - Selecting from a DataFrame with a duplicated index

Question

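I am new to Python and pandas. I have a DataFrame with a datetime index. I want to select the rows whose time is later than 08:00:00, and I tried using the pd.DataFrame.select function. It fails because the index contains duplicate entries.

Am I going about this the right way?

Is there a way around it?

Is it bad practice to index data with duplicate entries?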

>>> df.head(10)
                            A
time                         
1900-01-01 00:01:01.456170  0
1900-01-01 00:01:01.969600  0
1900-01-01 00:01:04.305494  0
1900-01-01 00:01:13.860365  0
1900-01-01 00:01:19.666371  0
1900-01-01 00:01:24.920744  0
1900-01-01 00:01:24.931466  0
1900-01-01 00:02:07.522741  0
1900-01-01 00:02:13.857793  0
1900-01-01 00:02:34.817765 -7
>>> timeindexvalid = lambda x : x.to_datetime() > datetime(1900, 1, 1, 8)
>>> df.select(timeindexvalid)
Traceback (most recent call last):

    raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

score 2 · Accepted Answer

You can select the rows you want with a boolean expression on the index, without using select().

In [1]: df
Out[1]:
            A
time
2012-05-01  0
2012-05-02  1
2012-05-02  2

In [2]: df.index
Out[2]:
<class 'pandas.tseries.index.DatetimeIndex'>

In [3]: df.index.is_unique
Out[3]: False

In [4]: df[df.index > datetime(2012,5,1)]
Out[4]:
            A
time
2012-05-02  1
2012-05-02  2

Reproducing the error with select:

In [5]: sel = lambda x: x > datetime(2012,5,1)

In [6]: df.select(sel)
Exception: Reindexing only valid with uniquely valued Index objects
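
As a self-contained illustration of the boolean-expression approach, here is a minimal sketch. The sample timestamps and the names after_eight and late are made up for this example and are not from the original post. The second selection compares only the time of day, which may be closer to what the question asks when the data spans several dates.

```python
from datetime import datetime, time

import pandas as pd

# Hypothetical sample data: two rows share the same timestamp,
# so the index is not unique.
idx = pd.DatetimeIndex(
    [
        "1900-01-01 07:59:59",
        "1900-01-01 08:30:00",
        "1900-01-01 08:30:00",
        "1900-01-01 09:15:00",
    ],
    name="time",
)
df = pd.DataFrame({"A": [0, 1, 2, 3]}, index=idx)

# Comparing the index with a datetime gives a boolean array; indexing with
# it filters rows by position, so duplicate labels are not a problem.
after_eight = df[df.index > datetime(1900, 1, 1, 8)]

# Variant: compare only the time of day, ignoring the date part.
late = df[df.index.time > time(8, 0)]

print(after_eight)
print(late)
```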

score 1 · Accepted Answer

I opened a GitHub issue so that this can be supported more easily through the between_time method.

https://github.com/pydata/pandas/issues/2826
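
For reference, a sketch of what a between_time call can look like, assuming a reasonably recent pandas in which DataFrame.between_time works on a non-unique DatetimeIndex. The sample data and the end-of-day upper bound are assumptions for this example. Note that both bounds are inclusive by default, so a row at exactly 08:00:00 is kept.

```python
from datetime import time

import pandas as pd

# Hypothetical sample data with a duplicated timestamp.
idx = pd.DatetimeIndex(
    [
        "1900-01-01 07:59:59",
        "1900-01-01 08:00:00",
        "1900-01-01 08:30:00",
        "1900-01-01 08:30:00",
    ],
    name="time",
)
df = pd.DataFrame({"A": [0, 1, 2, 3]}, index=idx)

# Keep rows whose time of day lies between 08:00:00 and the end of the day.
# datetime.time objects are used for the bounds; both ends are inclusive
# by default.
selected = df.between_time(time(8, 0), time(23, 59, 59, 999999))

print(selected)
```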

score 0 · Accepted Answer

You can use indexer_between_time (here, between 1 and 2 minutes past midnight):

In [11]: df1.iloc[df1.index.indexer_between_time('00:01:00', '00:02:00')]
Out[11]:
                            A
time
1900-01-01 00:01:01.456170  0
1900-01-01 00:01:01.969600  0
1900-01-01 00:01:04.305494  0
1900-01-01 00:01:13.860365  0
1900-01-01 00:01:19.666371  0
1900-01-01 00:01:24.920744  0
1900-01-01 00:01:24.931466  0

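The same idea as a runnable sketch: indexer_between_time returns integer positions, and iloc selects by position, which is why duplicate labels do not get in the way. The small frame below is hypothetical (it is not the asker's data), and a duplicated timestamp has been added on purpose.

```python
import pandas as pd

# Hypothetical sample data; the first two timestamps are identical.
idx = pd.DatetimeIndex(
    [
        "1900-01-01 00:01:01.456170",
        "1900-01-01 00:01:01.456170",
        "1900-01-01 00:01:24.931466",
        "1900-01-01 00:02:07.522741",
    ],
    name="time",
)
df1 = pd.DataFrame({"A": [0, 0, 0, -7]}, index=idx)

# Integer positions of the rows whose time of day is within the window.
positions = df1.index.indexer_between_time("00:01:00", "00:02:00")

# Positional selection does not look labels up, so duplicates are fine.
subset = df1.iloc[positions]

print(subset)
```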