pandas - pandas: Compute the row-wise maximum of categorical columns

Question

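I have a DataFrame containing two columns of ordered categorical data (with the same categories). I want to create another column containing the categorical maximum of the first two columns. I set up the following: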

import pandas as pd
from pandas.api.types import CategoricalDtype
import numpy as np

cats = CategoricalDtype(categories=['small', 'normal', 'large'], ordered=True)
data = {
    'A': ['normal', 'small', 'normal', 'large', np.nan],
    'B': ['small', 'normal', 'large', np.nan, 'small'],
    'desired max(A,B)': ['normal', 'normal', 'large', 'large', 'small']
}
df = pd.DataFrame(data).astype(cats)

Running the following code lets me compare the columns, although the np.nan entries cause problems:

df['A'] > df['B']

The documentation suggests that max() works on categorical data, so I tried to define the new column like this:

df[['A', 'B']].max(axis=1)

This produces a column of NaN. Why?

score 0 · Accepted Answer

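The values in columns A and B are string labels, and the row-wise max() does not know which of ['small', 'normal', 'large'] is the largest. So you first need to assign an integer value to each of these categories.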

# size string -> integer value mapping
size2int_map = {
    'small': 0, 
    'normal': 1, 
    'large': 2
}

# integer value -> size string mapping
int2size_map = {
    0: 'small', 
    1: 'normal', 
    2: 'large'
}

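# create columns containing the integer value for each size string;
# cast to object first so that mapping a categorical column yields
# plain numbers rather than another categorical
for c in ['A', 'B']:
    df['%s_int' % c] = df[c].astype(object).map(size2int_map)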

# apply the int2size map back to get the string sizes back
print(df[['A_int', 'B_int']].max(axis=1).map(int2size_map))

This should give you:

0    normal
1    normal
2     large
3     large
4     small
dtype: object
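
As an alternative, here is a minimal sketch that avoids the hand-written dictionaries. It assumes, as in the question, that both columns are ordered categoricals sharing the same dtype. It relies on the integer codes pandas already stores for each category (`.cat.codes`, where a missing value is coded as -1) and converts the row-wise maximum back with `pd.Categorical.from_codes`. The column name 'max(A,B)' is chosen here only for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.types import CategoricalDtype

cats = CategoricalDtype(categories=['small', 'normal', 'large'], ordered=True)
df = pd.DataFrame({
    'A': ['normal', 'small', 'normal', 'large', np.nan],
    'B': ['small', 'normal', 'large', np.nan, 'small'],
}).astype(cats)

# .cat.codes gives each value's position in the ordered categories;
# a missing value gets the code -1
codes = pd.DataFrame({c: df[c].cat.codes for c in ['A', 'B']})

# -1 is smaller than every real code, so a missing value never wins the max
max_codes = codes.max(axis=1)

# from_codes turns the positions back into labels (a code of -1 becomes NaN)
df['max(A,B)'] = pd.Categorical.from_codes(max_codes.to_numpy(), dtype=cats)
print(df)
```

Unlike the dictionary approach, the result keeps the ordered categorical dtype, so later comparisons and sorting still follow small < normal < large. A row where both A and B are missing would come out as NaN.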
