r - Graphing percent of total based on multiple criteria

Question

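Update: In case anyone is wondering, both answers work. Both build a summary table like the one you would create when emulating SUMIF in Excel, which is exactly what I was looking for. Thanks again to you both.

I have a data frame (df) like this, but with more products. df$Yr is based on a cutoff date of >= March 2012.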

Product      Classif         Yr     Revenue
a            paid_yes      TRUE     25
a            paid_yes      TRUE     20
a            paid_yes      TRUE     35
a            paid_yes      FALSE    20
a            paid_yes      FALSE    30
a            paid_yes      FALSE    30
a            paid_partial  TRUE     15
a            paid_partial  TRUE     15
a            paid_partial  FALSE    18
a            leased        TRUE     12
a            leased        TRUE     12
a            leased        FALSE    14
a            Other         TRUE     27
a            Other         FALSE    30
a            Other         TRUE     25
a            Other         FALSE    22
a            Other         TRUE     32
a            Other         FALSE    30
a            Other         TRUE     24
a            Other         FALSE    27
b            paid_yes      TRUE     45
b            paid_yes      FALSE    32
b            paid_yes      TRUE     35
b            paid_yes      FALSE    39
b            paid_partial  FALSE    42
b            paid_partial  FALSE    45
b            paid_partial  TRUE     47
b            paid_partial  FALSE    33
b            paid_partial  FALSE    28
b            leased        TRUE     48
b            leased        FALSE    46
b            leased        FALSE    45
b            leased        TRUE     37
b            leased        FALSE    33
b            leased        TRUE     46
b            leased        FALSE    44
b            Other         TRUE     49
b            Other         FALSE    45
b            Other         TRUE     43
b            Other         FALSE    39

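I am trying to create a scatter plot faceted by Product (a, b, c, etc.). I want the y axis to be df$Classif and the x axis to be the percentage of total Revenue within each Product and Yr. In other words, what percent of total revenue for a product in a given year does each classification make up?

I would like the summary frame to look like this...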

Product      Classif         Yr     perc.rev
a            paid_yes      TRUE     .332
a            paid_partial  TRUE     .123
a            leased        TRUE     .099
a            Other         TRUE     .446

where each perc.rev sums to 100% given Product, Classif, and Yr.

I tried to get the summary dataset/column with the following code:
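The arithmetic behind the desired table can be checked by hand for one group. The snippet below is only an illustrative sketch: it copies the Revenue values for product a with Yr equal to TRUE from the sample data, and the names rev_true and share are made up for the example.

```r
# Revenue for product "a" where Yr is TRUE, summed per classification
# (values copied from the sample data in the question)
rev_true <- c(paid_yes     = 25 + 20 + 35,
              paid_partial = 15 + 15,
              leased       = 12 + 12,
              Other        = 27 + 25 + 32 + 24)

# Each classification's share of the product-year total
share <- rev_true / sum(rev_true)
round(share, 3)
```

The shares add up to 1 within the product-year group, and they land close to the approximate figures in the desired table.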

df.perc <- ddply(df, .(Product, Classif, Yr), summarise,
               perc.rev = sum(Revenue)/count(Classif))

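The resulting data frame gives me the average revenue by Product, Classif, and Yr. What I need is the percentage of revenue generated by a given Classif within each Product and Year.

I am not sure whether the problem lies in the perc.rev formula, the .variables argument, or ddply itself. I am used to Excel, where I would normally use two SUMIFS formulas, but I don't know how to express what I need here with R functions.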

score 2 · Accepted Answer

I'm new to plyr, so there may be a more elegant solution. First, store the total revenue for each (Product, Yr) combination. Then run ddply:

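library(plyr)
# Total revenue for each (Product, Yr) combination
counts <- ddply(df, .(Product, Yr), summarise, count=sum(Revenue))
# Divide each (Product, Classif, Yr) sum by the matching (Product, Yr) total
ddply(df, .(Product, Classif, Yr), summarise,
  perc.rev=sum(Revenue)/counts$count[counts$Product==Product[1] & counts$Yr==Yr[1]])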

which gives

   Product      Classif    Yr   perc.rev
1        a       leased FALSE 0.06334842
2        a       leased  TRUE 0.09917355
3        a        Other FALSE 0.49321267
4        a        Other  TRUE 0.44628099
5        a paid_partial FALSE 0.08144796
6        a paid_partial  TRUE 0.12396694
7        a     paid_yes FALSE 0.36199095
8        a     paid_yes  TRUE 0.33057851
9        b       leased FALSE 0.35668790
10       b       leased  TRUE 0.37428571
11       b        Other FALSE 0.17834395
12       b        Other  TRUE 0.26285714
13       b paid_partial FALSE 0.31422505
14       b paid_partial  TRUE 0.13428571
15       b     paid_yes FALSE 0.15074310
16       b     paid_yes  TRUE 0.22857143
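The same two-total idea (the two SUMIFS formulas mentioned in the question) can also be sketched in base R without plyr. This is an illustrative alternative, not the method used in the answer above: it rebuilds only the product a rows of the question's df, and the names num, den, total, and res are made up for the example.

```r
# Product "a" rows copied from the question's sample data
df <- data.frame(
  Product = "a",
  Classif = c(rep("paid_yes", 6), rep("paid_partial", 3),
              rep("leased", 3), rep("Other", 8)),
  Yr = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE,
         TRUE, TRUE, FALSE,
         TRUE, TRUE, FALSE,
         TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  Revenue = c(25, 20, 35, 20, 30, 30,
              15, 15, 18,
              12, 12, 14,
              27, 30, 25, 22, 32, 30, 24, 27)
)

# Numerator: revenue per Product/Classif/Yr (the first "SUMIFS")
num <- aggregate(Revenue ~ Product + Classif + Yr, data = df, FUN = sum)

# Denominator: revenue per Product/Yr (the second "SUMIFS")
den <- aggregate(Revenue ~ Product + Yr, data = df, FUN = sum)
names(den)[names(den) == "Revenue"] <- "total"

# Attach the matching total to every numerator row, then divide
res <- merge(num, den, by = c("Product", "Yr"))
res$perc.rev <- res$Revenue / res$total
res
```

Joining on Product and Yr replaces the manual lookup into counts, at the cost of carrying the intermediate Revenue and total columns in the result.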

score 1 · Accepted Answer

Why not use ave(..., ..., FUN=sum) to add a by-product "total" and then do a two-pass process that adds the by-classification percentage?

<strike>apply( ..., ..., function(x) x["Classif"]/x["total"] )</strike>

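Edit: (I'm not sure how this deserved the check mark, but I'll try to fix it.) That second part was too cryptic and probably just plain wrong. Changing x["Classif"] to x["Revenue"] might have made it fixable, but I think apply was the wrong function altogether.

The request was for "what percent of total revenue for a product in a given year does each classification make up" ... and "where each perc.rev sums to 100% given Product, Classif, and Yr". The desired output clearly implied that at least the second part should have read "where each perc.rev sums to 100% given Product and Yr" (omitting Classif).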

dfrm$total <- ave(dfrm$Revenue, dfrm$Product, dfrm$Yr, FUN=sum)
dfrm$prod.yr.prop <- dfrm$Revenue/dfrm$total
aggregate(dfrm$prod.yr.prop, list(class=dfrm$Classif, Yr=dfrm$Yr, Prod=dfrm$Product), FUN=sum)
          class    Yr Prod          x
1        leased FALSE    a 0.06334842
2         Other FALSE    a 0.49321267
3  paid_partial FALSE    a 0.08144796
4      paid_yes FALSE    a 0.36199095
5        leased  TRUE    a 0.09917355
6         Other  TRUE    a 0.44628099
7  paid_partial  TRUE    a 0.12396694
8      paid_yes  TRUE    a 0.33057851
9        leased FALSE    b 0.35668790
10        Other FALSE    b 0.17834395
11 paid_partial FALSE    b 0.31422505
12     paid_yes FALSE    b 0.15074310
13       leased  TRUE    b 0.37428571
14        Other  TRUE    b 0.26285714
15 paid_partial  TRUE    b 0.13428571
16     paid_yes  TRUE    b 0.22857143

This computes the totals within each product-year and then calculates the Classif-specific proportions within those groups.
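Once a summary table like this exists, the faceted plot the question asks for is a short step. The sketch below is only one possible way to draw it: the thread does not say which plotting package is intended, so lattice is used here as an assumption because it ships with R, and the names smry and p are made up for the example. The values are copied from the aggregate() output above.

```r
library(lattice)

# Summary values copied from the aggregate() output above
smry <- data.frame(
  class = factor(rep(c("leased", "Other", "paid_partial", "paid_yes"),
                     times = 4)),
  Yr    = rep(c(FALSE, TRUE, FALSE, TRUE), each = 4),
  Prod  = factor(rep(c("a", "b"), each = 8)),
  x     = c(0.06334842, 0.49321267, 0.08144796, 0.36199095,
            0.09917355, 0.44628099, 0.12396694, 0.33057851,
            0.35668790, 0.17834395, 0.31422505, 0.15074310,
            0.37428571, 0.26285714, 0.13428571, 0.22857143)
)

# One panel per product, classification on the y axis,
# share of product-year revenue on the x axis, Yr shown as groups
p <- dotplot(class ~ x | Prod, data = smry, groups = Yr,
             auto.key = TRUE,
             xlab = "Share of product-year revenue")
```

Typing p at the console draws the plot; inside a script or function, print(p) is needed.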
