pandas - Constructing a pandas column using np.where()

Question

I am working on an assignment in pandas and am using np.where() to add a column to a pandas DataFrame that takes one of three values:

fips_df['geog_type'] = np.where(fips_df.fips.str[-3:] != '000', 'county', np.where(fips_df.fips.str[:] == '00000', 'country', 'state'))

After adding the column, the DataFrame looks like this:

print(fips_df[:5])

    fips         geog_entity fips_prefix geog_type
0  00000       UNITED STATES          00   country
1  01000             ALABAMA          01     state
2  01001  Autauga County, AL          01    county
3  01003  Baldwin County, AL          01    county
4  01005  Barbour County, AL          01    county
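For reference, the same three-way labelling can be reproduced on a small frame. This is a minimal sketch, not the full FIPS table: the five-row `sample` frame and the names `is_county` and `is_country` are made up for illustration, and `np.select` is swapped in as a flatter alternative to the nested `np.where`.

```python
import numpy as np
import pandas as pd

# Hypothetical five-row sample standing in for the full FIPS table.
sample = pd.DataFrame({"fips": ["00000", "01000", "01001", "01003", "02000"]})

# Boolean masks for the two special cases; everything else is a state.
is_county = sample["fips"].str[-3:] != "000"
is_country = sample["fips"] == "00000"

# Nested np.where, as in the question.
nested = np.where(is_county, "county", np.where(is_country, "country", "state"))

# np.select checks the conditions in order and falls back to the default.
sample["geog_type"] = np.select(
    [is_county, is_country], ["county", "country"], default="state"
)

print(sample)
```

Because `np.select` uses the first matching condition, it gives the same labels as the nested form here.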

The construction of this column is tested by two asserts. The first passes and the second fails.

## check the numbers of geog_type

assert set(fips_df['geog_type'].value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])

assert set(fips_df.geog_type.value_counts().iteritems()) == set([('state', 51), ('country', 1), ('county', 3143)])

What is the difference between referring to the column as fips_df.geog_type and as fips_df['geog_type'] that causes the second assert to fail?

score 4 · Accepted Answer

Just in case, you can create the new column with much less effort. For example:

In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame(np.random.uniform(size=10))

In [4]: df
Out[4]: 
          0
0  0.366489
1  0.697744
2  0.570066
3  0.756647
4  0.036149
5  0.817588
6  0.884244
7  0.741609
8  0.628303
9  0.642807

In [5]: categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]

In [6]: df["new_col"] = df[0].apply(categorize)

In [7]: df
Out[7]: 
          0 new_col
0  0.366489       B
1  0.697744       C
2  0.570066       B
3  0.756647       C
4  0.036149       A
5  0.817588       C
6  0.884244       C
7  0.741609       C
8  0.628303       C
9  0.642807       C
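The `categorize` lambda works because each comparison yields 0 or 1, so their sum is an index into "ABC". As a rough sketch, the same binning can also be written without `apply` by using `pd.cut`. That is a swapped-in alternative rather than what this answer uses, and the sample values and the names `applied` and `binned` are illustrative only.

```python
import numpy as np
import pandas as pd

# Illustrative values, including the two boundary cases 0.3 and 0.6.
values = pd.Series([0.036149, 0.366489, 0.697744, 0.3, 0.6])

# The answer's approach: two comparisons summed into an index 0, 1 or 2.
categorize = lambda value: "ABC"[int(value > 0.3) + int(value > 0.6)]
applied = values.apply(categorize)

# Vectorized alternative: right-closed bins (-inf, 0.3], (0.3, 0.6], (0.6, inf).
binned = pd.cut(values, bins=[-np.inf, 0.3, 0.6, np.inf], labels=list("ABC"))

print(pd.DataFrame({"value": values, "applied": applied, "binned": binned}))
```

Note that `pd.cut` returns a categorical column, so it may need `.astype(str)` if plain strings are wanted.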

score 2 · Accepted Answer

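It should be the same (and will be most of the time)...

One situation where it isn't is when an attribute or method with that name is already set on the DataFrame. In that case it is not overridden by the column, so the column cannot be accessed with dot notation: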

In [1]: df = pd.DataFrame([[1, 2], [3, 4]])

In [2]: df.A = 7

In [3]: df.B = lambda: 42

In [4]: df.columns = list('AB')

In [5]: df.A
Out[5]: 7

In [6]: df.B()
Out[6]: 42

In [7]: df['A']
Out[7]: 
0    1
1    3
Name: A

Interestingly, the dot notation for accessing columns isn't mentioned in the selection syntax documentation.
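One way to check whether this kind of name clash is behind a failing assert is to compare what the two spellings return. The following is only a sketch under assumptions: the tiny `df` frame is hypothetical, and `count` is used as the clashing name because it is an existing DataFrame method.

```python
import pandas as pd

# Hypothetical frame: "count" clashes with DataFrame.count, "geog_type" does not.
df = pd.DataFrame({"count": [10, 20], "geog_type": ["state", "county"]})

# Dot access finds the method first, so it is not the column.
dot_is_method = callable(df.count)
bracket_is_series = isinstance(df["count"], pd.Series)

# With no clash, both spellings return equal Series.
same_column = df.geog_type.equals(df["geog_type"])

# A plain attribute set before the column exists keeps shadowing the column.
shadowed = pd.DataFrame([[1, 2], [3, 4]])
shadowed.A = 7
shadowed.columns = list("AB")

print(dot_is_method, bracket_is_series, same_column)
print(shadowed.A, shadowed["A"].tolist())
```

If `isinstance(fips_df.geog_type, pd.Series)` turns out to be false, or the two spellings are not `equals`, an earlier assignment to `fips_df.geog_type` is a likely suspect. Bracket access always goes to the column, so it is the safer spelling in tests.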
