python - associating entries from one pandas data frame to a second one based on time

Question

I have two pandas data frames. One contains my usual measurements (time-indexed). A second frame from a different source contains system states. It is also time-indexed, but the times in the state data frame do not match the times of my data frame with the measurements. What I would like to achieve is that now each row in the measurements data frame also contains the last state that appeared in the state data frame before the time of the measurement.

As an example, I have a state frame like this:

                                          state
time                                           
2013-02-14 12:29:37.101000          SystemReset
2013-02-14 12:29:39.103000             WaitFace
2013-02-14 12:29:39.103000      NormalExecution
2013-02-14 12:29:39.166000        GreetVisitors
2013-02-14 12:29:46.879000  AskForParticipation
2013-02-14 12:29:56.807000  IntroduceVernissage
2013-02-14 12:30:07.275000      PictureQuestion

And my measurements are like this:

                            utime
time
2013-02-14 12:29:38.697038      0
2013-02-14 12:29:38.710432      1
2013-02-14 12:29:39.106475      2
2013-02-14 12:29:39.200701      3
2013-02-14 12:29:40.197014      0
2013-02-14 12:29:42.217976      5
2013-02-14 12:29:57.460601      7

I would like to end up with a data frame like this:

                            utime                 state
time
2013-02-14 12:29:38.697038      0           SystemReset
2013-02-14 12:29:38.710432      1           SystemReset
2013-02-14 12:29:39.106475      2       NormalExecution
2013-02-14 12:29:39.200701      3         GreetVisitors
2013-02-14 12:29:40.197014      0         GreetVisitors
2013-02-14 12:29:42.217976      5         GreetVisitors
2013-02-14 12:29:57.460601      7   IntroduceVernissage

I found a quite inefficient solution like this:

result = measurements.copy()
stateList = []
for timestamp, _ in measurements.iterrows():
    candidateStates = states.truncate(after=timestamp).tail(1)
    if len(candidateStates) > 0:
        stateList.append(candidateStates['state'].values[0])
    else:
        stateList.append("unknown")

result['state'] = stateList

Do you see any way to optimize this?

score 2 · Accepted Answer

Maybe something like

df = df1.join(df2, how='outer')
df['state'] = df['state'].ffill()
df = df.dropna()

would work? The join produces the following:

>>> df
                                          state  utime
time                                                  
2013-02-14 12:29:37.101000          SystemReset    NaN
2013-02-14 12:29:38.697038                  NaN      0
2013-02-14 12:29:38.710432                  NaN      1
2013-02-14 12:29:39.103000             WaitFace    NaN
2013-02-14 12:29:39.103000      NormalExecution    NaN
2013-02-14 12:29:39.106475                  NaN      2
2013-02-14 12:29:39.166000        GreetVisitors    NaN
2013-02-14 12:29:39.200701                  NaN      3
2013-02-14 12:29:40.197014                  NaN      0
2013-02-14 12:29:42.217976                  NaN      5
2013-02-14 12:29:46.879000  AskForParticipation    NaN
2013-02-14 12:29:56.807000  IntroduceVernissage    NaN
2013-02-14 12:29:57.460601                  NaN      7
2013-02-14 12:30:07.275000      PictureQuestion    NaN

Then you can forward-fill the state column:

>>> df['state'] = df['state'].ffill()
>>> df['state']
time
2013-02-14 12:29:37.101000            SystemReset
2013-02-14 12:29:38.697038            SystemReset
2013-02-14 12:29:38.710432            SystemReset
2013-02-14 12:29:39.103000               WaitFace
2013-02-14 12:29:39.103000        NormalExecution
2013-02-14 12:29:39.106475        NormalExecution
2013-02-14 12:29:39.166000          GreetVisitors
2013-02-14 12:29:39.200701          GreetVisitors
2013-02-14 12:29:40.197014          GreetVisitors
2013-02-14 12:29:42.217976          GreetVisitors
2013-02-14 12:29:46.879000    AskForParticipation
2013-02-14 12:29:56.807000    IntroduceVernissage
2013-02-14 12:29:57.460601    IntroduceVernissage
2013-02-14 12:30:07.275000        PictureQuestion
Name: state

Then drop the rows without a utime:

>>> df.dropna()
                                          state  utime
time                                                  
2013-02-14 12:29:38.697038          SystemReset      0
2013-02-14 12:29:38.710432          SystemReset      1
2013-02-14 12:29:39.106475      NormalExecution      2
2013-02-14 12:29:39.200701        GreetVisitors      3
2013-02-14 12:29:40.197014        GreetVisitors      0
2013-02-14 12:29:42.217976        GreetVisitors      5
2013-02-14 12:29:57.460601  IntroduceVernissage      7
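
For anyone who wants to try the three steps end to end, here is a minimal self-contained sketch built from the example data in the question. The names states, measurements, combined and result are only illustrative, and it uses Series.ffill() plus dropna(subset=['utime']) as assumptions about what reads best in current pandas, so a measurement taken before the first state is kept with a missing state instead of being dropped.

```python
import pandas as pd

# Example data copied from the question.
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:37.101000",
        "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.166000",
        "2013-02-14 12:29:46.879000",
        "2013-02-14 12:29:56.807000",
        "2013-02-14 12:30:07.275000",
    ], name="time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:38.697038",
        "2013-02-14 12:29:38.710432",
        "2013-02-14 12:29:39.106475",
        "2013-02-14 12:29:39.200701",
        "2013-02-14 12:29:40.197014",
        "2013-02-14 12:29:42.217976",
        "2013-02-14 12:29:57.460601",
    ], name="time"),
)

# 1. Outer join: every timestamp from either frame, in time order.
combined = states.join(measurements, how="outer")

# 2. Carry the most recent state forward onto the measurement rows.
combined["state"] = combined["state"].ffill()

# 3. Keep only rows that are actual measurements.
#    Note: utime becomes float here because the join introduced NaN.
result = combined.dropna(subset=["utime"])
print(result)
```

If the integer dtype matters, result['utime'].astype(int) restores it after the state-only rows are gone.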

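You might need to tweak that to handle the case where a utime occurs at the same time as (possibly multiple) states. Probably drop_duplicates with take_last=True (keep='last' in current pandas) would do it. I would also need to think a little harder than I can before my morning coffee about the < versus <= issue.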
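
As a different route from the join above (not what this answer proposes), newer pandas versions provide pd.merge_asof, which addresses both caveats directly: ties between states and the choice between < and <=. The following is a sketch under the assumption that both frames are indexed by time; the names states_last, inclusive and strict are hypothetical.

```python
import pandas as pd

# Example data copied from the question.
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:37.101000", "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.103000", "2013-02-14 12:29:39.166000",
        "2013-02-14 12:29:46.879000", "2013-02-14 12:29:56.807000",
        "2013-02-14 12:30:07.275000",
    ], name="time"),
)
measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:38.697038", "2013-02-14 12:29:38.710432",
        "2013-02-14 12:29:39.106475", "2013-02-14 12:29:39.200701",
        "2013-02-14 12:29:40.197014", "2013-02-14 12:29:42.217976",
        "2013-02-14 12:29:57.460601",
    ], name="time"),
)

# When several states share one timestamp, keep only the last one recorded.
states_last = states[~states.index.duplicated(keep="last")]

# merge_asof needs both sides sorted by the key (here: the time index).
left = measurements.sort_index()
right = states_last.sort_index()

# "<=": a state logged at exactly the measurement time counts.
inclusive = pd.merge_asof(left, right, left_index=True, right_index=True,
                          direction="backward", allow_exact_matches=True)

# "<": only states logged strictly before the measurement count.
strict = pd.merge_asof(left, right, left_index=True, right_index=True,
                       direction="backward", allow_exact_matches=False)
print(inclusive)
```

On the example data the two variants agree, because no measurement shares a timestamp with a state. Measurements taken before the first state get a missing state, which can be turned into the "unknown" marker from the question with inclusive['state'].fillna('unknown').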
