r - What is a good way to calculate error bars and add them to a ggplot2 histogram?

Question

The following command produces a simple histogram:

g<- ggplot(data = mtcars, aes(x = factor(carb) )) + geom_histogram()

Normally, I would add error bars to a plot like this:

g+stat_summary(fun.data="mean_cl_boot",geom="errorbar",conf.int=.95)

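But that does not work with the histogram ("Error: geom_errorbar requires the following missing aesthetics: ymin, ymax"). I think this is because the y variable is not explicit: the counts are computed automatically by geom_histogram, so no y variable is ever specified.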

Does this mean I cannot use geom_histogram, and must instead first compute the y quantity (the counts) myself and then pass it as the y variable in a call to geom_bar?

score 2 · Accepted Answer

Indeed, it seems you cannot use geom_histogram; instead you need to compute the counts (bar heights) and the confidence interval limits manually. First, to compute the counts:

library(plyr)
mtcars_counts <- ddply(mtcars, .(carb), function(x) data.frame(count=nrow(x)))

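The remaining problem is to compute a confidence interval for a binomial proportion, which here is the count divided by the total number of cases in the dataset. Various formulas have been proposed in the literature. Here we use the Agresti & Coull (1998) method as implemented in the PropCIs library.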

library(PropCIs)
numTotTrials <- sum(mtcars_counts$count)

# Create a CI function for use with ddply and based on our total number of cases.
makeAdd4CIforThisHist <- function(totNumCases,conf.int) {
  add4CIforThisHist <- function(df) {
     CIstuff<- add4ci(df$count,totNumCases,conf.int)
     data.frame( ymin= totNumCases*CIstuff$conf.int[1], ymax = totNumCases*CIstuff$conf.int[2] ) 
  }
  return (add4CIforThisHist)
}

calcCI <- makeAdd4CIforThisHist(numTotTrials,.95)

limits<- ddply(mtcars_counts,.(carb),calcCI) #calculate the CI min,max for each bar

mtcars_counts <- merge(mtcars_counts,limits) #combine the counts dataframe with the CIs

g<-ggplot(data =mtcars_counts, aes(x=carb,y=count,ymin=ymin,ymax=ymax)) + geom_bar(stat="identity",fill="grey")
g+geom_errorbar()

Resulting graph
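To make the interval calculation easier to inspect, here is a minimal base-R sketch of an add-4 (Agresti-Coull style) interval, scaled back to counts. The helper name `add4_interval` and the column names are made up for this illustration. The formula is my understanding of what an add-4 interval does, so treat it as an assumption and do not expect it to match the PropCIs output digit for digit.

```r
# Counts per carb level, using only base R (mtcars ships with R).
counts <- as.data.frame(table(carb = mtcars$carb), responseName = "count")
n_total <- sum(counts$count)

# Hypothetical helper (not part of PropCIs): add-4 interval for x successes
# out of n trials. Adds 2 successes and 2 failures before applying the usual
# normal-approximation interval, then clips the result to [0, 1].
add4_interval <- function(x, n, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  p_tilde <- (x + 2) / (n + 4)
  se <- sqrt(p_tilde * (1 - p_tilde) / (n + 4))
  cbind(lower = pmax(0, p_tilde - z * se),
        upper = pmin(1, p_tilde + z * se))
}

# Convert the proportion limits back to the count scale for plotting.
ci <- add4_interval(counts$count, n_total)
counts$ymin <- n_total * ci[, "lower"]
counts$ymax <- n_total * ci[, "upper"]
counts
```

The resulting data frame has the same columns (count, ymin, ymax) that the ggplot call above expects, although here carb is a factor rather than a number.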

score 1 · Accepted Answer

I am not sure that what you want to do is statistically valid.

For example, if you perform the aggregation (bin/compute) manually, you get NA for Upper and Lower:

mtcars$carb_bin <- factor(cut(mtcars$cyl,8,labels=FALSE))
library(plyr)
library(Hmisc)  # provides smean.cl.boot
mtcars_sum <- ddply(mtcars, "carb_bin", 
                 function(x)smean.cl.boot(length(x$carb)))
mtcars_sum
  carb_bin Mean Lower Upper
1        1   11    NA    NA
2        4    7    NA    NA
3        8   14    NA    NA
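One way to see where the NA values come from: each bin contributes a single number (its count), and a single number has no sample variance, so no interval around its mean can be formed. The sketch below uses a made-up helper, `mean_ci`, that computes a t-based interval. It is not the Hmisc implementation and only mirrors the behaviour shown in the output above.

```r
# The count for one bin is a single number.
one_count <- sum(mtcars$cyl == 4)  # 11
sd(one_count)                      # NA: no variance from one observation

# Hypothetical helper: t-based confidence interval for the mean of x.
mean_ci <- function(x, conf_level = 0.95) {
  n <- length(x)
  if (n < 2) {
    # With fewer than two observations the limits are undefined.
    return(c(Mean = mean(x), Lower = NA_real_, Upper = NA_real_))
  }
  half <- qt(1 - (1 - conf_level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(Mean = mean(x), Lower = mean(x) - half, Upper = mean(x) + half)
}

mean_ci(one_count)   # Mean is 11, Lower and Upper are NA
mean_ci(mtcars$mpg)  # many observations, so the limits are defined
```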

Also, if you compute only y and pass it to ggplot2 to plot with geom_bar and error_bar, you get no error bar, because the upper and lower limits are not well defined.

mtcars_sum <- ddply(mtcars, "carb_bin", summarise,
                    y = length(carb))

ggplot(data = mtcars_sum, aes(x=carb_bin,y=y)) + 
  geom_bar(stat='identity',alpha=0.2)+
  stat_summary(fun.data="mean_cl_normal",col='red',
               conf.int=.95,geom='pointrange')

