python - Preserving datetime index in groupby operation

Question

Suppose I have the following DataFrame (a time series whose index is a DatetimeIndex):

                           atn   file
datetime                             
2012-10-08 14:00:00  23.007462      1
2012-10-08 14:30:00  27.045666      1
2012-10-08 15:00:00  31.483825      1
2012-10-08 15:30:00  37.540651      2
2012-10-08 16:00:00  43.564573      2
2012-10-08 16:00:00  48.589852      2
2012-10-08 16:00:00  55.289452      2

My goal is to extract the rows where each value in the last column, 'file', first appears, so as to obtain a table similar to this:

       datetime             atn
file                             
1      2012-10-08 14:00:00  23.007462
2      2012-10-08 15:30:00  37.540651

My approach was to groupby 'file' and then aggregate on 'first':

dt.groupby(by="file").aggregate("first")

The problem with this is that the index is not treated as a column, so the datetimes are missing from the grouped result. I worked around this by first turning the index into a column:

dt2 = dt.reset_index()
dt2.groupby(by="file").aggregate("first")

But now the problem is that the values in the datetime column are no longer dates but floats:

          datetime        atn
file                         
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

Is there

a way to convert the floats back to datetimes?
OR a way to preserve the datetimes in the groupby/aggregate operation?
OR a better way to obtain the final table?

The example dataframe can be used as follows:

Copy this (to clipboard):

2012-10-08 14:00:00,  23.007462,     1
2012-10-08 14:30:00,  27.045666,     1
2012-10-08 15:00:00,  31.483825,     1
2012-10-08 15:30:00,  37.540651,     2
2012-10-08 16:00:00,  43.564573,     2
2012-10-08 16:00:00,  48.589852,     2
2012-10-08 16:00:00,  55.289452,     2

And then:

dt = pandas.read_clipboard(sep=",", parse_dates=True, index_col=0, 
                           names=["datetime", "atn", "file"])

score 1 · Accepted Answer

I think this is a bug in pandas: the dtype is changed to float after the groupby.

dt3 = dt2.groupby(by="file").aggregate("first")
dt3.dtypes

This gives me:

datetime    float64
atn         float64

To convert the dtype back to datetime64, do the following:

dt3['datetime'] = pd.Series(dt3['datetime'], dtype='datetime64[ns]')

I have opened a new issue on GitHub.
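A minimal sketch of the same conversion using `pd.to_datetime`, assuming the floats are nanoseconds since the Unix epoch (which is what the values in the question look like). The frame `dt3` here is built by hand for illustration; it is a stand-in for the groupby result, not the output of the buggy call.

```python
import pandas as pd

# Hand-built stand-in for the groupby result: the datetimes are stored as
# float nanoseconds since the epoch (an assumption based on the question's output).
dt3 = pd.DataFrame(
    {
        "datetime": [1.3497048e18, 1.3497102e18],
        "atn": [23.007462, 37.540651],
    },
    index=pd.Index([1, 2], name="file"),
)

# Cast to int64 first so the nanosecond counts are read as whole numbers,
# then interpret them as nanoseconds since the epoch.
dt3["datetime"] = pd.to_datetime(dt3["datetime"].astype("int64"), unit="ns")

print(dt3)
print(dt3.dtypes)
```

Passing `unit="ns"` explicitly makes the assumption about the unit visible; if the floats had come from a different unit, the dates would be wrong rather than raising an error.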

score 0 · Accepted Answer

I believe this has been fixed and will be in the 0.9.1 release.

answered 2012-11-14T00:11:06.590
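To check whether an installed pandas still shows the problem, a quick sketch along these lines may help. It rebuilds the question's data directly instead of going through the clipboard, so the construction of `dt` is my own and not part of the original post.

```python
import pandas as pd

# Rebuild the example frame from the question without using the clipboard.
dt = pd.DataFrame(
    {
        "atn": [23.007462, 27.045666, 31.483825, 37.540651,
                43.564573, 48.589852, 55.289452],
        "file": [1, 1, 1, 2, 2, 2, 2],
    },
    index=pd.DatetimeIndex(
        [
            "2012-10-08 14:00:00", "2012-10-08 14:30:00",
            "2012-10-08 15:00:00", "2012-10-08 15:30:00",
            "2012-10-08 16:00:00", "2012-10-08 16:00:00",
            "2012-10-08 16:00:00",
        ],
        name="datetime",
    ),
)

# Same operation as in the question.
result = dt.reset_index().groupby(by="file").aggregate("first")

# If the datetime column reports float64 here, the installed version is affected.
print(pd.__version__)
print(result.dtypes)
```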

score 0 · Accepted Answer

It looks like a bug, but for the time being you get the expected result if you do not specify parse_dates=True.

My IPython results without parse_dates=True:

In [29]: dt2 = pd.read_clipboard(sep=",", index_col=0, 
                           names=["datetime", "atn", "file"])

In [30]: dt2
Out[30]: 
                           atn  file
datetime                            
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2

In [31]: dt2.reset_index().groupby(by="file").aggregate("first")
Out[31]: 
                 datetime        atn
file                                
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651

My IPython results with parse_dates=True:

In [33]: dt = pd.read_clipboard(sep=",", parse_dates=True, index_col=0, 
                           names=["datetime", "atn", "file"])

In [34]: dt.reset_index().groupby(by="file").aggregate("first")
Out[34]: 
          datetime        atn
file                         
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

Checking the dtypes explicitly:

In [40]: new_dt = dt.reset_index().groupby(by="file").aggregate("first")

In [41]: new_dt
Out[41]: 
          datetime        atn
file                         
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651

In [42]: new_dt.dtypes
Out[42]: 
datetime    float64
atn         float64

In [43]: new_dt2 = dt2.reset_index().groupby(by="file").aggregate("first")

In [44]: new_dt2.dtypes
Out[44]: 
datetime     object
atn         float64
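As for a different route to the final table, one option that avoids aggregation altogether is to keep the first row per 'file' value with `drop_duplicates`. This is only a sketch: it assumes the rows are already in chronological order, and the frame is rebuilt by hand rather than read from the clipboard.

```python
import pandas as pd

# Rebuild the example frame from the question without using the clipboard.
dt = pd.DataFrame(
    {
        "atn": [23.007462, 27.045666, 31.483825, 37.540651,
                43.564573, 48.589852, 55.289452],
        "file": [1, 1, 1, 2, 2, 2, 2],
    },
    index=pd.DatetimeIndex(
        [
            "2012-10-08 14:00:00", "2012-10-08 14:30:00",
            "2012-10-08 15:00:00", "2012-10-08 15:30:00",
            "2012-10-08 16:00:00", "2012-10-08 16:00:00",
            "2012-10-08 16:00:00",
        ],
        name="datetime",
    ),
)

# Move the index into a column, keep only the first row seen for each
# 'file' value, then use 'file' as the new index.
first_rows = (
    dt.reset_index()
      .drop_duplicates(subset="file", keep="first")
      .set_index("file")
)

print(first_rows)
print(first_rows.dtypes)
```

Because no values are aggregated, the datetime column is carried over unchanged. Note that this keeps whole rows, whereas `aggregate("first")` picks the first non-null value in each column separately, so the two can differ when the data contains missing values.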
