python - How to filter rows in pandas by regex

Question

I would like to cleanly filter a DataFrame using a regex on one of its columns.

A contrived example:

In [210]: foo = pd.DataFrame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
In [211]: foo
Out[211]: 
   a    b
0  1   hi
1  2  foo
2  3  fat
3  4  cat

I want to use a regex to keep only the rows where b starts with f. My first attempt:

In [213]: foo.b.str.match('f.*')
Out[213]: 
0    []
1    ()
2    ()
3    []

That is not very useful. However, this does give me a boolean index:

In [226]: foo.b.str.match('(f.*)').str.len() > 0
Out[226]: 
0    False
1     True
2     True
3    False
Name: b

So I can then apply the restriction like this:

In [229]: foo[foo.b.str.match('(f.*)').str.len() > 0]
Out[229]: 
   a    b
1  2  foo
2  3  fat

That forces me to artificially put a group into the regex, though, and it does not seem like the clean way to go. Is there a better way to do this?

score 240 · Accepted Answer

Use contains instead:

In [10]: foo.b.str.contains('^f')
Out[10]: 
0    False
1     True
2     True
3    False
Name: b, dtype: bool
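
The boolean Series above can be passed straight back into the DataFrame as a mask. A minimal sketch, assuming the `foo` frame from the question (the names `mask` and `filtered` are only illustrative):

```python
import pandas as pd

# Same contrived frame as in the question
foo = pd.DataFrame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})

# str.contains interprets the pattern as a regex by default;
# '^f' anchors the search to the start of each string
mask = foo.b.str.contains('^f')

# Boolean indexing keeps only the rows where the mask is True
filtered = foo[mask]
print(filtered)
```

No capture group is needed here, since the method returns booleans directly.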

score 23 · Answer

It may be a bit late, but this is now easier to do in Pandas by calling Series.str.match. The docs explain the difference between match, fullmatch and contains.

Note that in order to use the results for indexing, you should set the na=False argument (or na=True if you want rows with NaN values to be included in the results).
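
To make the difference concrete, here is a small sketch. The Series `s` and its values are made up for illustration and are not from the question; `na=False` is passed so that the missing entry becomes False and the result can be used as a mask.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value
s = pd.Series(['foo', 'a fat cat', np.nan, 'f'])

# match: the pattern must match at the start of the string
starts_with_f = s.str.match('f', na=False)

# contains: the pattern may match anywhere in the string
contains_f = s.str.contains('f', na=False)

# fullmatch: the pattern must match the entire string
exactly_f = s.str.fullmatch('f', na=False)

# The boolean result can be used directly for indexing
result = s[starts_with_f]
print(result)
```

With this data, `match` keeps 'foo' and 'f', `contains` additionally keeps 'a fat cat', and `fullmatch` keeps only 'f'.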
