python - Filter pandas DataFrame by substring criteria

Question

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question)

which returns a boolean. I am familiar with the syntax df[df['A'] == "hello world"], but I can't seem to find a way to do the same with a partial string match, say 'hello'.

score 1140 · Accepted Answer

Based on GitHub issue #620, it looks like you'll soon be able to do the following:

df[df['A'].str.contains("hello")]

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and later.
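
A minimal sketch of the call above on a small made-up frame. The sample values are illustrative assumptions, not data from the question. Passing na=False makes missing values count as non-matches, so the mask stays strictly boolean, and case=False makes the match case-insensitive.

```python
import pandas as pd

# Illustrative data; the column name "A" follows the question.
df = pd.DataFrame({"A": ["hello world", "Hello there", "goodbye", None]})

# Substring match; na=False treats missing values as "no match".
mask = df["A"].str.contains("hello", na=False)
matched = df[mask]

# Case-insensitive variant of the same filter.
matched_ci = df[df["A"].str.contains("hello", case=False, na=False)]
```

Without na=False, a column containing missing values can produce a mask with missing entries, which boolean indexing will generally refuse.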

score 29 · Accepted Answer

Quick note: if you want to select rows based on a partial string contained in the index, try the following:

df['stridx']=df.index
df[df['stridx'].str.contains("Hello|Britain")]
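
As a possible variation, the helper column can be skipped, because an Index of strings exposes the same .str accessor. The sketch below assumes a string-valued index; the frame and its labels are made up for illustration.

```python
import pandas as pd

# Illustrative frame with string labels in the index.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=["Hello world", "Great Britain", "France"],
)

# Build the boolean mask straight from the index, no extra column needed.
index_mask = df.index.str.contains("Hello|Britain")
selected = df[index_mask]
```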

score 22 · Accepted Answer

Say you have the following DataFrame:

>>> df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a','b'])
>>> df
       a            b
0  hello  hello world
1   abcd         defg

You can always use the in operator in a lambda expression to create your filter.

>>> df.apply(lambda x: x['a'] in x['b'], axis=1)
0     True
1    False
dtype: bool

The trick here is to pass the axis=1 option to apply, so that the lambda function receives the data row by row rather than column by column.
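
To go from the boolean Series to the filtered rows, the mask can be used to index the frame. This is a short sketch that reuses the same example data:

```python
import pandas as pd

df = pd.DataFrame([['hello', 'hello world'], ['abcd', 'defg']], columns=['a', 'b'])

# Row-wise test: is the string in column 'a' a substring of column 'b'?
mask = df.apply(lambda x: x['a'] in x['b'], axis=1)

# Keep only the rows where the test is True.
filtered = df[mask]
```

Note that apply with axis=1 calls the lambda once per row in Python, so it tends to be slower than the vectorized .str methods on large frames.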

score 14 · Accepted Answer

You can try treating the values as strings:

df[df['A'].astype(str).str.contains("Hello|Britain")]
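
A small sketch of why the cast can help, assuming a column that mixes strings with numbers (the data is made up). On such a column the .str accessor generally returns missing values for the non-string entries, while astype(str) lets every entry be compared as text.

```python
import pandas as pd

# Illustrative column mixing strings and numbers.
df = pd.DataFrame({"A": ["Hello", 42, "Great Britain", 3.5]})

# Cast every value to its string form before matching, so numeric
# entries are compared as text instead of being skipped.
mask = df["A"].astype(str).str.contains("Hello|Britain")
result = df[mask]
```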

score 6 · Accepted Answer

Here is what I ended up doing for partial string matches. If anyone has a more efficient way of doing this, please let me know.

import re
import pandas as pd

def stringSearchColumn_DataFrame(df, colName, regex):
    # Build a boolean mask: True where the regex matches anywhere in the value.
    mask = df[colName].map(lambda record: bool(re.search(regex, record)))
    # Select the matching rows in one step and renumber them from 0.
    return df[mask].reset_index(drop=True)
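
Since this answer asks for a more efficient approach, one likely candidate is to let Series.str.contains perform the regex search instead of calling re.search per value in Python. The helper name string_search_column and the sample data below are hypothetical, chosen only for illustration.

```python
import pandas as pd

def string_search_column(df, col_name, regex):
    # Hypothetical helper: the pattern is interpreted as a regular
    # expression (regex=True); na=False treats missing values as non-matches.
    mask = df[col_name].str.contains(regex, regex=True, na=False)
    return df[mask].reset_index(drop=True)

# Illustrative data: keep rows whose name ends in one or more digits.
df = pd.DataFrame({"name": ["alpha1", "beta", "gamma22"], "n": [1, 2, 3]})
digits = string_search_column(df, "name", r"\d+$")
```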

score 5 · Accepted Answer

Using contains didn't work well for my string with special characters. find worked, though.

df[df['A'].str.find("hello") != -1]
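
The likely cause is that str.contains interprets its pattern as a regular expression by default, so characters such as ( ) + . have special meaning. The sketch below uses made-up data and shows two alternatives to str.find: passing regex=False, or escaping the pattern with re.escape.

```python
import re
import pandas as pd

# Illustrative values containing regex metacharacters.
df = pd.DataFrame({"A": ["price (USD)", "price USD", "a+b", "aab"]})

# regex=False treats the pattern as a literal substring.
literal = df[df["A"].str.contains("(USD)", regex=False)]

# Alternatively, escape the metacharacters and keep regex matching.
escaped = df[df["A"].str.contains(re.escape("a+b"))]

# str.find does a plain substring search and returns -1 when absent.
found = df[df["A"].str.find("a+b") != -1]
```

All three filters do literal matching here; as an unescaped regex, "a+b" would match "aab" but not the literal text "a+b".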
