r - 生データと集計データから同じグラフを作成する

Question

生データと要約データからの同じプロット

次のデータ構造の場合

dsN<-data.frame(
  id=rep(1:100, each=4),
  yearF=factor(rep(2001:2004, 100)),
  attendF=sample(1:8, 400, T, c(.2,.2,.15,.10,.10, .20, .15, .02))
)
dsN[sample(which(dsN$yearF==2001), 5), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2002), 10), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2003), 15), "attendF"]<-NA
dsN[sample(which(dsN$yearF==2004), 20), "attendF"]<-NA

attcol8<-c("Never"="#4575b4",
           "Once or Twice"="#74add1",
           "Less than once/month"="#abd9e9",
           "About once/month"="#e0f3f8",
           "About twice/month"="#fee090",
           "About once/week"="#fdae61",
           "Several times/week"="#f46d43",
           "Everyday"="#d73027")
dsN$attendF<-factor(dsN$attendF, levels=1:8, labels=names(attcol8))
head(dsN,13)

   id yearF              attendF
1   1  2001      About once/week
2   1  2002     About once/month
3   1  2003      About once/week
4   1  2004                 <NA>
5   2  2001 Less than once/month
6   2  2002      About once/week
7   2  2003      About once/week
8   2  2004   Several times/week
9   3  2001        Once or Twice
10  3  2002      About once/week
11  3  2003                 <NA>
12  3  2004        Once or Twice
13  4  2001   Several times/week

一連の積み上げ棒グラフを取得できます

require(ggplot2)
# p<- ggplot( subset(dsN,!is.na(attendF)), aes(x=yearF, fill=attendF)) # leaving NA out of
p<- ggplot( dsN, aes(x=yearF, fill=attendF))  #  keeping NA in calculations
p<- p+ geom_bar(position="fill")
p<- p+ scale_fill_manual(values = attcol8,
                         name="Response category" )
p<- p+ scale_y_continuous("Prevalence: proportion of total",
                          limits=c(0, 1),
                          breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<- p+ scale_x_discrete("Waves of measurement",
                        limits=as.character(c(2000:2005)))
p<- p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
p

ここに画像の説明を入力

上のグラフは生データから作成されたものです。ただし、特に統計関数を制御する必要がある場合は、要約されたデータからグラフを作成すると便利な場合があります。以下は、上のグラフに実際にマッピングされた値のみを含む dsN から ds への変換です。

require(dplyr)
ds<- dsN %.%
  dplyr::filter(!is.na(attendF)) %.%
  dplyr::group_by(yearF,attendF) %.%
  dplyr::summarize(count = sum(attendF)) %.%
  dplyr::mutate(total = sum(count),
              percent= count/total)
head(ds,10)

    Source: local data frame [10 x 5]
    Groups: yearF

       yearF              attendF count total percent
    1   2001                Never    18   373 0.04826
    2   2001        Once or Twice    36   373 0.09651
    3   2001 Less than once/month    30   373 0.08043
    4   2001     About once/month    32   373 0.08579
    5   2001    About twice/month    40   373 0.10724
    6   2001      About once/week    90   373 0.24129
    7   2001   Several times/week   119   373 0.31903
    8   2001             Everyday     8   373 0.02145
    9   2002                Never    11   355 0.03099
    10  2002        Once or Twice    44   355 0.12394

# verify
summarize(filter(ds, yearF==2001), should.be.one=sum(percent))
```

    Source: local data frame [1 x 2]

      yearF should.be.one
    1  2001             1

質問：

この要約データセットを使用して、上記のグラフをどのように再作成します dsか?

score 0 · Accepted Answer

コメント 1 からの回答

@MrFlick さんの指摘の通り、summary() の計算式に誤りがありました。ただし、合計の計算に欠損値を残すかどうかは、意味のある研究上の決定です。NA合計の計算を残したい場合:

ds<- dsN %.%
#   dplyr::filter(!is.na(attendF)) %.%   # comment out to count NA in the total
  dplyr::group_by(yearF,attendF) %.%
  dplyr::summarize(count = length( attendF)) %.%
  dplyr::mutate(total = sum(count),
              percent= count/total)
head(ds,10)

     Source: local data frame [10 x 5]
    Groups: yearF

       yearF              attendF count total percent
    1   2001                Never    23   100    0.23
    2   2001        Once or Twice     9   100    0.09
    3   2001 Less than once/month    16   100    0.16
    4   2001     About once/month    11   100    0.11
    5   2001    About twice/month     3   100    0.03
    6   2001      About once/week    21   100    0.21
    7   2001   Several times/week     9   100    0.09
    8   2001             Everyday     3   100    0.03
    9   2001                   NA     5   100    0.05
    10  2002                Never    17   100    0.17

欠損値は、研究における自然な減少を示すために、合計応答の計算に使用されます。

p<- ggplot( ds, aes(x=yearF, y=percent, fill=attendF))  #  keeping NA in calculations
p<- p+ geom_bar(position="stack", stat="identity")
p<- p+ scale_fill_manual(values = attcol8,
                         name="Response category" )
p<- p+ scale_y_continuous("Prevalence: proportion of total",
                          limits=c(0, 1),
                          breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<- p+ scale_x_discrete("Waves of measurement",
                        limits=as.character(c(2000:2005)))
p<- p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
p

ここに画像の説明を入力

ただし、消耗が結果の測定値と有意に関連していないと仮定すると、応答の支持の相対的な有病率が時間の経過とともにどのように変化するか、またはおそらく平衡にとどまるかを確認することは興味深いでしょう. このために、応答の合計の計算から欠損値を削除する必要があります。

ds<- dsN %.%
  dplyr::filter(!is.na(attendF)) %.%   # comment out to count NA in the total
  dplyr::group_by(yearF,attendF) %.%
  dplyr::summarize(count = length( attendF)) %.%
  dplyr::mutate(total = sum(count),
              percent= count/total)
head(ds,10)

    Source: local data frame [10 x 5]
    Groups: yearF

       yearF              attendF count total percent
    1   2001                Never    23    95 0.24211
    2   2001        Once or Twice     9    95 0.09474
    3   2001 Less than once/month    16    95 0.16842
    4   2001     About once/month    11    95 0.11579
    5   2001    About twice/month     3    95 0.03158
    6   2001      About once/week    21    95 0.22105
    7   2001   Several times/week     9    95 0.09474
    8   2001             Everyday     3    95 0.03158
    9   2002                Never    17    90 0.18889
    10  2002        Once or Twice    23    90 0.25556

グラフはこれを反映しています。

p<- ggplot( ds, aes(x=yearF, y=percent, fill=attendF))  #  keeping NA in calculations
p<- p+ geom_bar(position="stack", stat="identity")
p<- p+ scale_fill_manual(values = attcol8,
                         name="Response category" )
p<- p+ scale_y_continuous("Prevalence: proportion of total",
                          limits=c(0, 1),
                          breaks=c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1))
p<- p+ scale_x_discrete("Waves of measurement",
                        limits=as.character(c(2000:2005)))
p<- p+ labs(title=paste0("In the past year, how often have you attended a worship service?"))
p

ここに画像の説明を入力

ありがとう、@MrFlick!

r - 生データと集計データから同じグラフを作成する

生データと要約データからの同じプロット

質問：

2 に答える 2

コメント 1 からの回答

Related

Reference