Suppose I have the following DataFrame (a time series indexed by a DatetimeIndex):
                           atn  file
datetime
2012-10-08 14:00:00  23.007462     1
2012-10-08 14:30:00  27.045666     1
2012-10-08 15:00:00  31.483825     1
2012-10-08 15:30:00  37.540651     2
2012-10-08 16:00:00  43.564573     2
2012-10-08 16:00:00  48.589852     2
2012-10-08 16:00:00  55.289452     2
My goal is to extract, for each value in the last column 'file', the row where that value appears for the first time, so as to obtain a table like this:
                 datetime        atn
file
1     2012-10-08 14:00:00  23.007462
2     2012-10-08 15:30:00  37.540651
My approach was to group by 'file' and then aggregate with 'first':
dt.groupby(by="file").aggregate("first")
The problem is that the index is not treated as a column, so the datetimes do not appear in the aggregated result. I worked around this by first turning the index into a column:
dt2 = dt.reset_index()
dt2.groupby(by="file").aggregate("first")
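For comparison, here is a self-contained sketch of the same `reset_index()` + `groupby(...).first()` approach on the example data. It assumes a recent pandas release, in which `GroupBy.first` keeps the `datetime64` dtype of the former index column; the float result reported in this question is presumably tied to an older pandas version (which one is not stated here). The DataFrame is rebuilt directly from the question's values instead of via the clipboard.

```python
import pandas as pd

# Rebuild the example DataFrame from the question without using the clipboard.
dt = pd.DataFrame(
    {
        "atn": [23.007462, 27.045666, 31.483825, 37.540651,
                43.564573, 48.589852, 55.289452],
        "file": [1, 1, 1, 2, 2, 2, 2],
    },
    index=pd.DatetimeIndex(
        ["2012-10-08 14:00:00", "2012-10-08 14:30:00", "2012-10-08 15:00:00",
         "2012-10-08 15:30:00", "2012-10-08 16:00:00", "2012-10-08 16:00:00",
         "2012-10-08 16:00:00"],
        name="datetime",
    ),
)

# Turn the DatetimeIndex into an ordinary "datetime" column, then take the
# first value of every remaining column within each "file" group.
first = dt.reset_index().groupby("file").first()
print(first)
print(first.dtypes)  # check whether "datetime" is still datetime64
```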
But now the problem is that the datetime column no longer contains datetimes, only floats:
          datetime        atn
file
1     1.349705e+18  23.007462
2     1.349710e+18  37.540651
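The floats look like the raw `datetime64[ns]` values, i.e. nanoseconds since 1970-01-01 00:00 UTC: 2012-10-08 14:00:00 is 1,349,704,800 seconds after the epoch, which multiplied by 10^9 gives the 1.349705e+18 shown (rounded). Assuming that interpretation, `pd.to_datetime(..., unit="ns")` should turn them back into timestamps. In this minimal sketch, `as_float` is a hypothetical stand-in for the aggregated column, holding the exact values for 14:00 and 15:30.

```python
import pandas as pd

# Hypothetical stand-in for the float "datetime" column after aggregation:
# nanoseconds since the Unix epoch for 2012-10-08 14:00 and 15:30.
as_float = pd.Series(
    [1.3497048e18, 1.3497102e18],
    index=pd.Index([1, 2], name="file"),
    name="datetime",
)

# unit="ns" makes pandas interpret each number as nanoseconds since the epoch.
restored = pd.to_datetime(as_float, unit="ns")
print(restored)
```

On the real aggregated table this would be an assignment such as `result["datetime"] = pd.to_datetime(result["datetime"], unit="ns")`, where `result` is a hypothetical name. At this magnitude a float64 resolves steps of about 256 ns, so whole-second timestamps like these survive the round trip exactly.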
Is there
- a way to convert the floats back to datetimes,
- OR a way to preserve the datetimes in the groupby/aggregate operation,
- OR a better way to arrive at the final table?
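For the third option, one alternative (not the approach tried in the question) is to skip aggregation entirely: `DataFrame.drop_duplicates(subset="file")` keeps the first row for each value of 'file' in row order, and since whole rows are selected, the DatetimeIndex is never converted. A minimal sketch on the example data:

```python
import pandas as pd

# Rebuild the example DataFrame from the question without using the clipboard.
dt = pd.DataFrame(
    {
        "atn": [23.007462, 27.045666, 31.483825, 37.540651,
                43.564573, 48.589852, 55.289452],
        "file": [1, 1, 1, 2, 2, 2, 2],
    },
    index=pd.DatetimeIndex(
        ["2012-10-08 14:00:00", "2012-10-08 14:30:00", "2012-10-08 15:00:00",
         "2012-10-08 15:30:00", "2012-10-08 16:00:00", "2012-10-08 16:00:00",
         "2012-10-08 16:00:00"],
        name="datetime",
    ),
)

# Keep only the first row in which each "file" value occurs; the selected rows
# keep their DatetimeIndex, so no dtype conversion takes place.
firsts = dt.drop_duplicates(subset="file", keep="first")

# Move the datetime index into a column and index by "file" instead,
# which gives the layout of the target table.
result = firsts.reset_index().set_index("file")
print(result)
```

One difference from `groupby(...).first()`: `first` returns the first non-null value of each column separately, whereas `drop_duplicates` returns whole rows. Without missing values, as here, the two agree. `dt[~dt["file"].duplicated()]` is an equivalent boolean-mask spelling of the selection step.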
To reproduce the example DataFrame, copy the following lines to the clipboard:
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
Then run:
import pandas
dt = pandas.read_clipboard(sep=",", parse_dates=True, index_col=0,
                           names=["datetime", "atn", "file"])
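If no system clipboard is available, the same text can be parsed with `read_csv` from an in-memory `io.StringIO`. This sketch passes the same options as the `read_clipboard` call, plus `skipinitialspace=True` to drop the space after each comma.

```python
import io

import pandas as pd

# The sample rows from the question, embedded as a string.
data = """\
2012-10-08 14:00:00, 23.007462, 1
2012-10-08 14:30:00, 27.045666, 1
2012-10-08 15:00:00, 31.483825, 1
2012-10-08 15:30:00, 37.540651, 2
2012-10-08 16:00:00, 43.564573, 2
2012-10-08 16:00:00, 48.589852, 2
2012-10-08 16:00:00, 55.289452, 2
"""

dt = pd.read_csv(
    io.StringIO(data),
    sep=",",
    skipinitialspace=True,  # drop the space that follows each comma
    parse_dates=True,       # parse the index column as datetimes
    index_col=0,
    names=["datetime", "atn", "file"],
)
print(dt)
```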