python - 欠損値のグループ代入ごとのパンダ

Question

パンダの各指標について、このような国ごとの代入をどのように達成できますか?

グループごとに欠損値を代入したい

np.minインジケーターKPIごとに非A状態を取得する必要があります
no-ISO-statenp.meanは指標ごとの KPIを取得する必要があります
値が欠落している状態については、平均ごとに代入したいと思いindicatorKPIます。ここで、これはセルビアの欠損値を代入することを意味します

mydf = pd.DataFrame({'Country':['no-A-state','no-ISO-state','germany','serbia', 'austria', 'germany','serbia', 'austria' ',], 'indicatorKPI':[np.nan,np.nan,'SP.DYN.LE00.IN','NY.GDP.MKTP.CD','NY.GDP.MKTP.CD', 'SP. DYN.LE00.IN','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN'], '値':[np.nan,np.nan,0.9,np.nan,0.7, 0.2 、0.3、0.6]})

編集

目的の出力は次のようになります。

mydf = pd.DataFrame({'Country':['no-A-state','no-ISO-state', 'no-A-state','no-ISO-state',
                                'germany','serbia','serbia', 'austria', 
                                'germany','serbia', 'austria',],
                   'indicatorKPI':['SP.DYN.LE00.IN','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN',
                                   'SP.DYN.LE00.IN','NY.GDP.MKTP.CD','SP.DYN.LE00.IN','NY.GDP.MKTP.CD','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN','NY.GDP.MKTP.CD', 'SP.DYN.LE00.IN'],
                     'value':['MIN of all for this indicator', 'MEAN of all for this indicator','MIN of all for this indicator','MEAN of all for this indicator', 0.9,'MEAN of all for SP.DYN.LE00.IN indicator',0.7, 'MEAN of all for NY.GDP.MKTP.CD indicator',0.2, 0.3, 0.6]
                   })

score 3 · Accepted Answer

あなたの新しい例 df に基づいて、次のことがうまくいきます：

In [185]:
mydf.loc[mydf['Country'] == 'no-A-state', 'value'] = mydf['value'].min()
mydf.loc[mydf['Country'] == 'no-ISO-state', 'value'] = mydf['value'].mean()
mydf.loc[mydf['value'].isnull(), 'value'] = mydf['indicatorKPI'].map(mydf.groupby('indicatorKPI')['value'].mean())
mydf

Out[185]:
         Country    indicatorKPI     value
0     no-A-state  SP.DYN.LE00.IN  0.200000
1   no-ISO-state  NY.GDP.MKTP.CD  0.442857
2     no-A-state  SP.DYN.LE00.IN  0.200000
3   no-ISO-state  SP.DYN.LE00.IN  0.442857
4        germany  NY.GDP.MKTP.CD  0.900000
5         serbia  SP.DYN.LE00.IN  0.328571
6         serbia  NY.GDP.MKTP.CD  0.700000
7        austria  NY.GDP.MKTP.CD  0.585714
8        germany  SP.DYN.LE00.IN  0.200000
9         serbia  NY.GDP.MKTP.CD  0.300000
10       austria  SP.DYN.LE00.IN  0.600000

基本的にこれが行うことは、各条件の欠損値を埋めることです。そのため、「非 A 州」の国に最小値を設定し、次に「非 ISO 州」の国に平均値を設定します。次に、「indicatorKPI」でグループ化し、各グループの平均を計算し、null 値の行に再度割り当てます。それぞれの国の平均を使用mapしてルックアップを実行します。

分解された手順は次のとおりです。

In [187]:
mydf.groupby('indicatorKPI')['value'].mean()

Out[187]:

indicatorKPI
NY.GDP.MKTP.CD    0.633333
SP.DYN.LE00.IN    0.400000
Name: value, dtype: float64

In [188]:
mydf['indicatorKPI'].map(mydf.groupby('indicatorKPI')['value'].mean())

Out[188]:
0     0.400000
1     0.633333
2     0.400000
3     0.400000
4     0.633333
5     0.400000
6     0.633333
7     0.633333
8     0.400000
9     0.633333
10    0.400000
Name: indicatorKPI, dtype: float64

python - 欠損値のグループ代入ごとのパンダ

パンダの各指標について、このような国ごとの代入をどのように達成できますか?

編集

1 に答える 1

Related

Reference