python - Reshaping a table in pandas

Question

Below is an excerpt of a dataframe I created by merging several query-log dataframes:

                keyword               hits         date         average time
1               the cat sat on        10           10-Jan       10
2               who is the sea        5            10-Jan       1.2
3               under the earth       30           1-Dec        2.5
4               what is this          100          1-Feb        9

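Is there a way to use pandas to pivot the data so that the rows are the daily dates (1-Jan, 2-Jan, etc.) and the single column for each date is that day's total hits (the sum of hits for that day, e.g. all hits on 1-Jan) divided by the total hits for its month (e.g. all of January)? In other words, the daily hit percentage, normalized by month.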

score 1 · Accepted Answer

Parse the dates so that the month can be extracted later.

In [99]: df.date = df.date.apply(pd.Timestamp)

In [100]: df
Out[100]: 
           keyword  hits                date  average time
1   the cat sat on    10 2013-01-10 00:00:00          10.0
2   who is the sea     5 2013-01-10 00:00:00           1.2
3  under the earth    30 2013-12-01 00:00:00           2.5
4     what is this   100 2013-02-01 00:00:00           9.0
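The date strings in the question ("10-Jan", "1-Dec") carry no year, so `pd.Timestamp` has to guess one; the 2013 above was presumably filled in from the date the answer was run. A minimal sketch that makes the year explicit instead, assuming every row belongs to 2013 (the year, the `ASSUMED_YEAR` name and the rebuilt sample frame are illustrative assumptions, not from the question):

```python
import pandas as pd

# Hypothetical rebuild of the question's sample data.
df = pd.DataFrame(
    {
        "keyword": ["the cat sat on", "who is the sea", "under the earth", "what is this"],
        "hits": [10, 5, 30, 100],
        "date": ["10-Jan", "10-Jan", "1-Dec", "1-Feb"],
        "average time": [10, 1.2, 2.5, 9],
    },
    index=[1, 2, 3, 4],
)

# The strings have no year, so append an assumed one and parse with an
# explicit format ("10-Jan-2013") instead of letting the parser guess.
ASSUMED_YEAR = "2013"
df["date"] = pd.to_datetime(df["date"] + "-" + ASSUMED_YEAR, format="%d-%b-%Y")
print(df)
```

With an explicit format, a string that does not match raises an error instead of being silently misread.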

Group by date and sum the hits for each day.

In [101]: daily_totals = df.groupby('date').hits.sum()

In [102]: daily_totals
Out[102]: 
date
2013-01-10     15
2013-02-01    100
2013-12-01     30
Name: hits, dtype: int64
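A self-contained sketch of the same daily aggregation, using bracket indexing (`df.groupby("date")["hits"]`), which behaves like the attribute form above but cannot collide with a DataFrame method name. The sample frame is a hypothetical rebuild of the already-parsed data:

```python
import pandas as pd

# Hypothetical sample: dates already parsed to Timestamps (year assumed).
df = pd.DataFrame(
    {
        "hits": [10, 5, 30, 100],
        "date": pd.to_datetime(["2013-01-10", "2013-01-10", "2013-12-01", "2013-02-01"]),
    }
)

# One row per calendar day; rows that share a date have their hits added.
# groupby sorts the keys, so the result is in date order.
daily_totals = df.groupby("date")["hits"].sum()
print(daily_totals)
```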

Group by month and divide each row (each daily total) by the sum of all the daily totals in that month.

In [103]: normalized_totals = daily_totals.groupby(lambda d: d.month).transform(lambda x: x / x.sum())

In [104]: normalized_totals
Out[104]: 
date
2013-01-10    1.0
2013-02-01    1.0
2013-12-01    1.0
Name: hits, dtype: float64

In your small example these are all 1, because each month contains only one day.
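To see the normalization produce real fractions, the sketch below uses hypothetical data with several days in January and February. It makes two assumed choices that differ from the answer: it groups by calendar month including the year (`to_period("M")`), since `d.month` alone would pool January 2013 with January 2014, and it divides by a transformed monthly sum instead of using a lambda. The column name `share_of_month` is illustrative.

```python
import pandas as pd

# Hypothetical data with several days per month, so the ratios are not all 1.
df = pd.DataFrame(
    {
        "hits": [10, 5, 30, 20, 100],
        "date": pd.to_datetime(
            ["2013-01-10", "2013-01-10", "2013-01-20", "2013-02-01", "2013-02-15"]
        ),
    }
)

daily_totals = df.groupby("date")["hits"].sum()

# Key each day by its calendar month (year included) and broadcast each
# month's total back onto every day of that month.
month = daily_totals.index.to_period("M")
monthly_totals = daily_totals.groupby(month).transform("sum")

# Daily share of the month's hits, as a one-column table indexed by date.
normalized = (daily_totals / monthly_totals).to_frame("share_of_month")
print(normalized)
```

Each month's shares sum to 1: for the data above, January's two days get 1/3 and 2/3, and February's get 1/6 and 5/6.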
