python - Pandas groupby: get only the first N groups

Question

I have a DataFrame that I want to group by an ID.

import pandas as pd
df = pd.DataFrame({'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'], 'user_id': [1,2,1,1,3,1,5]})
print(df)

This produces:

  item_id  user_id
0       a        1
1       a        2
2       b        1
3       b        1
4       b        3
5       c        1
6       d        5

[7 rows x 2 columns]

I can easily group by the ID:

grouped = df.groupby("item_id")

But how can I return only the first N group-by objects? For example, I only want the first 3 unique item_ids.

score 4 · Accepted Answer

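One way is to use Counter to get the top 3 unique items from the column, filter the DataFrame down to those items, and then run the groupby operation on the filtered DataFrame. Note that most_common(3) picks the three most frequent item_ids, which is not necessarily the first three in order of appearance; in the output shown here the two happen to coincide.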

from collections import Counter

c = Counter(df.item_id)
most_common = [item for item, _ in c.most_common(3)]

>>> df[df.item_id.isin(most_common)].groupby('item_id').sum()
         user_id
item_id         
a              3
b              5
c              1
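
If "first" is meant as order of appearance rather than frequency, a minimal sketch along the same filter-then-group lines could look like the following. It is not part of the accepted answer: the names n, first_ids, first_groups and first_pairs are made up for illustration, and it assumes the question's df with a single grouping column.

```python
import pandas as pd
from itertools import islice

df = pd.DataFrame({
    'item_id': ['a', 'a', 'b', 'b', 'b', 'c', 'd'],
    'user_id': [1, 2, 1, 1, 3, 1, 5],
})

n = 3  # number of groups to keep (illustrative name)

# Series.unique() keeps values in order of first appearance, so this takes
# the first n distinct item_ids rather than the n most frequent ones.
first_ids = df['item_id'].unique()[:n]

# Filter first, then group, so the groupby only ever contains those n groups.
first_groups = df[df['item_id'].isin(first_ids)].groupby('item_id')
result = first_groups.sum()

# Alternative: group everything, then lazily take the first n
# (key, sub-DataFrame) pairs. sort=False keeps the groups in order of
# first appearance instead of sorting the keys.
first_pairs = list(islice(df.groupby('item_id', sort=False), n))
```

The first variant returns an ordinary groupby object, so any aggregation can follow it. The islice variant is only useful when you want to loop over the individual sub-DataFrames.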
