I have two pandas data frames. One contains my usual measurements and is time-indexed. The second comes from a different source and contains system states. It is also time-indexed, but its timestamps do not match those of the measurements frame. I would like each row of the measurements frame to also contain the last state that appeared in the state frame before the time of that measurement.
As an example, I have a state frame like this:
                                state
    time
    2013-02-14 12:29:37.101000  SystemReset
    2013-02-14 12:29:39.103000  WaitFace
    2013-02-14 12:29:39.103000  NormalExecution
    2013-02-14 12:29:39.166000  GreetVisitors
    2013-02-14 12:29:46.879000  AskForParticipation
    2013-02-14 12:29:56.807000  IntroduceVernissage
    2013-02-14 12:30:07.275000  PictureQuestion
And my measurements are like this:
                                utime
    time
    2013-02-14 12:29:38.697038  0
    2013-02-14 12:29:38.710432  1
    2013-02-14 12:29:39.106475  2
    2013-02-14 12:29:39.200701  3
    2013-02-14 12:29:40.197014  0
    2013-02-14 12:29:42.217976  5
    2013-02-14 12:29:57.460601  7
I would like to end up with a data frame like this:
                                utime  state
    time
    2013-02-14 12:29:38.697038  0      SystemReset
    2013-02-14 12:29:38.710432  1      SystemReset
    2013-02-14 12:29:39.106475  2      NormalExecution
    2013-02-14 12:29:39.200701  3      GreetVisitors
    2013-02-14 12:29:40.197014  0      GreetVisitors
    2013-02-14 12:29:42.217976  5      GreetVisitors
    2013-02-14 12:29:57.460601  7      IntroduceVernissage
I found a quite inefficient solution like this:
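    result = measurements.copy()
    stateList = []
    # One lookup per measurement row; this loop is the slow part.
    for timestamp, _ in measurements.iterrows():
        # Keep only states up to this timestamp and take the latest one.
        candidateStates = states.truncate(after=timestamp).tail(1)
        if len(candidateStates) > 0:
            stateList.append(candidateStates['state'].values[0])
        else:
            # No state has been recorded yet at this point in time.
            stateList.append("unknown")
    result['state'] = stateList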
Do you see any way to optimize this?
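For reference, here is a minimal sketch of one vectorized direction, assuming a pandas version that provides `pd.merge_asof` with the `direction` argument and that both frames are sorted by time. The frames are rebuilt from the example data above so the snippet runs on its own. A backward as-of join picks, for each measurement, the last state row whose time is less than or equal to the measurement time, which matches what `truncate(after=timestamp).tail(1)` does in the loop. I have not benchmarked it on real data.

```python
import pandas as pd

# Example frames rebuilt from the data shown in the question.
states = pd.DataFrame(
    {"state": ["SystemReset", "WaitFace", "NormalExecution", "GreetVisitors",
               "AskForParticipation", "IntroduceVernissage", "PictureQuestion"]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:37.101000",
        "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.103000",
        "2013-02-14 12:29:39.166000",
        "2013-02-14 12:29:46.879000",
        "2013-02-14 12:29:56.807000",
        "2013-02-14 12:30:07.275000",
    ], name="time"),
)

measurements = pd.DataFrame(
    {"utime": [0, 1, 2, 3, 0, 5, 7]},
    index=pd.DatetimeIndex([
        "2013-02-14 12:29:38.697038",
        "2013-02-14 12:29:38.710432",
        "2013-02-14 12:29:39.106475",
        "2013-02-14 12:29:39.200701",
        "2013-02-14 12:29:40.197014",
        "2013-02-14 12:29:42.217976",
        "2013-02-14 12:29:57.460601",
    ], name="time"),
)

# merge_asof joins on a column, so move the time index into a column first.
# Both inputs must already be sorted by the join key.
result = pd.merge_asof(
    measurements.reset_index(),
    states.reset_index(),
    on="time",
    direction="backward",  # last state at or before each measurement
).set_index("time")

# Measurements taken before the first known state get no match (NaN).
result["state"] = result["state"].fillna("unknown")

print(result)
```

If a state that carries exactly the same timestamp as a measurement should not count as "before" it, passing `allow_exact_matches=False` to `pd.merge_asof` would make the match strict.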