I am using pandas qcut to prepare my data for a machine learning algorithm. I have products with prices, and I discretized the prices into ten equal-sized buckets with this code:
df['PriceBucket'] = pd.qcut(df['sell_prix'].sort_values(), 10, labels=False)
And this code gives me the interval each bucket label covers:
df['PriceBucketTitle'] = pd.qcut(df['sell_prix'].sort_values(), 10)
As shown below, I now have PriceBucket and PriceBucketTitle, which is perfect! Now I need the number of products that fall into each bucket. The following code returns NaN values (see below).
df['products_by_number'] = pd.qcut(df['sell_prix'], 10, labels=False).value_counts()
It might be possible with a groupby on PriceBucket, but I would like to keep the data in its current row-per-product format. This is the result:
sell_prix PriceBucket PriceBucketTitle products_by_number
4668 8.0 2 (6.5, 8.5] NaN
4669 8.0 2 (6.5, 8.5] NaN
4670 8.0 2 (6.5, 8.5] NaN
4671 8.0 2 (6.5, 8.5] NaN
4672 8.0 2 (6.5, 8.5] NaN
4673 8.0 2 (6.5, 8.5] NaN
4674 8.0 2 (6.5, 8.5] NaN
4675 8.0 2 (6.5, 8.5] NaN
4676 8.0 2 (6.5, 8.5] NaN
4677 8.0 2 (6.5, 8.5] NaN
11902 15.0 5 (12.9, 15] NaN
11903 15.0 5 (12.9, 15] NaN
11904 15.0 5 (12.9, 15] NaN
11905 15.0 5 (12.9, 15] NaN
11906 15.0 5 (12.9, 15] NaN
11907 15.0 5 (12.9, 15] NaN
11908 15.0 5 (12.9, 15] NaN
11909 15.0 5 (12.9, 15] NaN
11910 15.0 5 (12.9, 15] NaN
11911 15.0 5 (12.9, 15] NaN
12065 11.0 4 (10, 12.9] NaN
12066 11.0 4 (10, 12.9] NaN
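My guess is that the NaN values come from index alignment: value_counts() returns a Series indexed by bucket id (0 to 9), while the rows shown above have labels like 4668 or 11902, so nothing matches when the result is assigned as a column. A minimal sketch of that effect on made-up data (only the column name sell_prix is from my real DataFrame; the values, the index and the 3 buckets are assumptions for illustration):

```python
import pandas as pd

# Toy frame: the values and the index labels are invented for this sketch.
df = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=[100, 101, 102, 103, 104, 105],
)

buckets = pd.qcut(df["sell_prix"], 3, labels=False)
counts = buckets.value_counts()

# counts is indexed by bucket id, not by the row labels of df.
print(sorted(counts.index))

# Assigning a Series to a column aligns on index labels. None of the
# labels 100..105 exist in counts, so every row ends up as NaN.
df["products_by_number"] = counts
print(df["products_by_number"].isna().all())
```

If that is right, only rows whose index label happens to equal a bucket id would receive a value, and that value would not necessarily belong to the row's own bucket.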
For example, this is what I want:
sell_prix PriceBucket PriceBucketTitle products_by_number
4668 8.0 2 (6.5, 8.5] 984546.0
4669 8.0 2 (6.5, 8.5] 984546.0
4670 8.0 2 (6.5, 8.5] 984546.0
4671 8.0 2 (6.5, 8.5] 984546.0
4672 8.0 2 (6.5, 8.5] 984546.0
4673 8.0 2 (6.5, 8.5] 984546.0
4674 8.0 2 (6.5, 8.5] 984546.0
4675 8.0 2 (6.5, 8.5] 984546.0
4676 8.0 2 (6.5, 8.5] 984546.0
4677 8.0 2 (6.5, 8.5] 984546.0
11902 15.0 5 (12.9, 15] 1028141.0
11903 15.0 5 (12.9, 15] 1028141.0
11904 15.0 5 (12.9, 15] 1028141.0
11905 15.0 5 (12.9, 15] 1028141.0
11906 15.0 5 (12.9, 15] 1028141.0
11907 15.0 5 (12.9, 15] 1028141.0
11908 15.0 5 (12.9, 15] 1028141.0
11909 15.0 5 (12.9, 15] 1028141.0
11910 15.0 5 (12.9, 15] 1028141.0
11911 15.0 5 (12.9, 15] 1028141.0
12065 11.0 4 (10, 12.9] 48998.0
12066 11.0 4 (10, 12.9] 48998.0
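To make the target shape concrete, here is a rough sketch of two approaches that seem to broadcast the bucket size back onto every row without collapsing the frame: groupby(...).transform("size"), and mapping each bucket id through its value_counts(). The data, the index and the choice of 4 buckets are made up for the example; I have not checked either approach against my real dataset.

```python
import pandas as pd

# Toy frame: values and index labels are invented for this sketch.
df = pd.DataFrame(
    {"sell_prix": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]},
    index=[10, 11, 12, 13, 14, 15, 16, 17],
)

# 4 buckets here only to keep the toy example small (my real code uses 10).
df["PriceBucket"] = pd.qcut(df["sell_prix"], 4, labels=False)

# Option 1: transform("size") returns one value per original row,
# so the result keeps the same index as df.
df["products_by_number"] = (
    df.groupby("PriceBucket")["PriceBucket"].transform("size")
)

# Option 2: look up each row's bucket id in the per-bucket counts.
alt = df["PriceBucket"].map(df["PriceBucket"].value_counts())

print(df)
```

Both options keep one row per product, which is the format I am after; the second one reuses the value_counts() call from my original attempt.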
Any help? Thanks!