
I have a dataframe with several "action" columns. How can I find the last action that matches a pattern and return that column's index or label?

My data:

name    action_1    action_2    action_3
bill    referred    referred    
bob     introduced  referred    referred
mary    introduced      
june    introduced  referred    
dale    referred        
donna   introduced

What I want:

name    action_1    action_2    action_3    last_referred
bill    referred    referred                action_2
bob     introduced  referred    referred    action_3
mary    introduced                          NA
june    introduced  referred                action_2
dale    referred                            action_1
donna   introduced                          NA

4 Answers


Use a function with apply, passing axis=1 so that it runs row by row, and pass pattern to the function as an additional keyword argument.

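In [3]: def func(row, pattern):
            # Assumes numpy is imported as np; rows without a match stay NaN.
            referrer = np.nan
            for key in row.index:
                # Later matches overwrite earlier ones, so the last match wins.
                if row[key] == pattern:
                    referrer = key
            return referrer
        df['last_referred'] = df.apply(func, pattern='referred', axis=1)
        df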
Out[3]:     name    action_1  action_2  action_3 last_referred
        0   bill    referred  referred      None      action_2
        1    bob  introduced  referred  referred      action_3
        2   mary  introduced                               NaN
        3   june  introduced  referred                action_2
        4   dale    referred                          action_1
        5  donna  introduced                               NaN
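
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same row-wise apply idea. The DataFrame construction, the helper name last_match, and the action_cols list are my own assumptions for illustration (the question does not show how df was built), and empty cells are assumed to be missing values.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are assumed to be missing values.
df = pd.DataFrame({
    "name": ["bill", "bob", "mary", "june", "dale", "donna"],
    "action_1": ["referred", "introduced", "introduced",
                 "introduced", "referred", "introduced"],
    "action_2": ["referred", "referred", None, "referred", None, None],
    "action_3": [None, "referred", None, None, None, None],
})


def last_match(row, pattern):
    """Return the label of the last entry in `row` equal to `pattern`, else NaN.

    `last_match` is a hypothetical helper name, not part of pandas.
    """
    mask = (row == pattern).to_numpy()   # missing values compare as False
    matches = row.index[mask]
    return matches[-1] if len(matches) else np.nan


# Only scan the action columns so the name column can never match by accident.
action_cols = ["action_1", "action_2", "action_3"]
df["last_referred"] = df[action_cols].apply(
    last_match, pattern="referred", axis=1
)
print(df)
```

Restricting apply to the action columns is a design choice of this sketch. The loop version above scans every column of the row, which gives the same result here because no name equals the pattern.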
answered 2013-08-30T20:08:32.557

You can do this with pandas.melt and groupby:

In [123]: molten = pd.melt(df, id_vars='name', var_name='last_referred')

In [124]: molten
Out[124]:
     name last_referred       value
0    bill      action_1    referred
1     bob      action_1  introduced
2    mary      action_1  introduced
3    june      action_1  introduced
4    dale      action_1    referred
5   donna      action_1  introduced
6    bill      action_2    referred
7     bob      action_2    referred
8    mary      action_2         NaN
9    june      action_2    referred
10   dale      action_2         NaN
11  donna      action_2         NaN
12   bill      action_3         NaN
13    bob      action_3    referred
14   mary      action_3         NaN
15   june      action_3         NaN
16   dale      action_3         NaN
17  donna      action_3         NaN

In [125]: gb = molten.groupby('name')

In [126]: col = gb.apply(lambda x: x[x.value == 'referred'].tail(1)).last_referred

In [127]: col.index = col.index.droplevel(1)

In [128]: col
Out[128]:
name
bill    action_2
bob     action_3
dale    action_1
june    action_2
Name: last_referred, dtype: object

In [129]: newdf = df.join(col, on='name')

In [130]: newdf
Out[130]:
    name    action_1  action_2  action_3 last_referred
0   bill    referred  referred       NaN      action_2
1    bob  introduced  referred  referred      action_3
2   mary  introduced       NaN       NaN           NaN
3   june  introduced  referred       NaN      action_2
4   dale    referred       NaN       NaN      action_1
5  donna  introduced       NaN       NaN           NaN
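
A shorter variant of the same melt idea, as a rough sketch: instead of groupby(...).apply with tail(1), it filters the melted frame first and then takes the last label per name with GroupBy.last. This swaps in a different groupby call from the session above. The DataFrame construction and the intermediate name hits are my own assumptions, and it assumes each name appears only once in df.

```python
import pandas as pd

# Sample data from the question; empty cells are assumed to be missing values.
df = pd.DataFrame({
    "name": ["bill", "bob", "mary", "june", "dale", "donna"],
    "action_1": ["referred", "introduced", "introduced",
                 "introduced", "referred", "introduced"],
    "action_2": ["referred", "referred", None, "referred", None, None],
    "action_3": [None, "referred", None, None, None, None],
})

# Long format: one row per (name, action column) pair, in column order.
molten = pd.melt(df, id_vars="name", var_name="last_referred")

# Keep only the matching cells, then take the last column label per name.
hits = molten[molten["value"] == "referred"]
col = hits.groupby("name")["last_referred"].last()

# Names without any match get NaN from the left join.
newdf = df.join(col, on="name")
print(newdf)
```

This relies on melt stacking the value columns in their original order, so the last row per name corresponds to the right-most matching column.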
answered 2013-08-30T20:17:33.377

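You can also use idxmax, which returns the first index of the maximum value, or the first index if every value is the same (for example, a row that is all False). It is a little hacky because you need to add an 'NA' column.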

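# Reverse the column order so the last matching action is found first.
revcols = df.columns.values.tolist()
revcols.reverse()
tmpdf = df == 'referred'
# Sentinel column: idxmax falls back to it when no action matches.
tmpdf['NA'] = False
lastrefer = tmpdf[['NA'] + revcols].idxmax(axis=1)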
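
Here is a self-contained sketch of the same idxmax trick, in case it helps to see it run. The DataFrame construction, the action_cols list, and the final step that turns the 'NA' sentinel label into a real missing value are my own additions, not part of the snippet above.

```python
import pandas as pd

# Sample data from the question; empty cells are assumed to be missing values.
df = pd.DataFrame({
    "name": ["bill", "bob", "mary", "june", "dale", "donna"],
    "action_1": ["referred", "introduced", "introduced",
                 "introduced", "referred", "introduced"],
    "action_2": ["referred", "referred", None, "referred", None, None],
    "action_3": [None, "referred", None, None, None, None],
})

action_cols = ["action_1", "action_2", "action_3"]

# Boolean mask over the action columns in reverse order, so the first True
# in each row corresponds to the last matching action.
hits = df[action_cols[::-1]].eq("referred")

# Sentinel column placed first: in a row that is all False, idxmax returns
# the first column label, which is then this sentinel.
hits.insert(0, "NA", False)

lastrefer = hits.idxmax(axis=1)

# Turn the sentinel label into a real missing value.
lastrefer = lastrefer.where(lastrefer != "NA")
df["last_referred"] = lastrefer
print(df)
```

The sentinel has to come before the reversed action columns. Otherwise a row with no match would report the first action column instead of a missing value.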
answered 2013-08-30T20:20:45.827