python - Processing a string of multiple values in a Pandas DataFrame column

Question

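I have a survey dataset in which one column (a question) allows multiple answers. The data in that column is a string representation of a list, holding one or more values from 0 to 5, e.g. '[1]' or '[1, 2, 3, 5]'.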

I am trying to process that column so that I can access the values individually, as follows:

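import re
import pandas as pd

def f(x):
    # Skip missing answers; those rows get None in the new column
    if pd.notnull(x):
        # Strip brackets, quotes and whitespace, then split on commas
        p = re.compile(r'[\[\]\'\s]')
        places = p.sub('', x).split(',')
        place_tally = {'1': 0, '2': 0, '3': 0, '4': 0, '5': 0}
        for place in places:
            place_tally[place] += 1
        return place_tally

df['places'] = df.where_buy.map(f)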

This creates a new column, 'places', in the DataFrame, holding a dictionary built from the values, e.g. {'1': 1, '3': 0, '2': 0, '5': 0, '4': 0} or {'1': 1, '3': 1, '2': 1, '5': 1, '4': 0}.

What is the most efficient/concise way to extract that data from the new column? I tried iterating over the DataFrame, but did not get good results:

    for row_index, row in df.iterrows():
         r = row['places']
         if r is not None:
             df.ix[row_index]['large_super'] = r['1']
             df.ix[row_index]['small_super'] = r['2']

This does not seem to work.

Thanks.

score 0 · Accepted Answer

Is this what you intend?

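# Assumes the raw string column from the question ('where_buy') with no missing values;
# str.count is a string method, so it does not apply to the dict column 'places'.
for i in range(1, 6):
    df['super_' + str(i)] = df['where_buy'].map(lambda x: x.count(str(i)))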
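
The loop above counts digit characters in the raw string, which works as long as every answer is a single digit and no cell is missing. As a minimal sketch of a variant that parses each cell first, the following uses `ast.literal_eval` and treats missing answers as empty lists. The sample DataFrame, the helper name `parse_answers`, and the `super_N` column names are illustrative assumptions, not part of the original data.

```python
import ast

import pandas as pd

# Hypothetical sample data shaped like the column described in the question
df = pd.DataFrame({"where_buy": ["[1]", "[1, 2, 3, 5]", None]})


def parse_answers(cell):
    """Turn a string such as '[1, 2, 3, 5]' into a list; missing cells become []."""
    if pd.isnull(cell):
        return []
    return list(ast.literal_eval(cell))


# Parse once, then reuse the parsed lists for every new column
answers = df["where_buy"].map(parse_answers)

for i in range(1, 6):
    # Count how often answer i appears in each row's list
    df["super_" + str(i)] = answers.map(lambda values, i=i: values.count(i))
```

Parsing into real lists means the counts compare integers rather than characters, so the same approach should still hold if the answer codes ever grow beyond one digit.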
