r - How can I better create stacked bar graphs with multiple variables from ggplot2?

Question

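I often need to make stacked bar plots to compare variables. Because I do all my statistics in R, I prefer to do all my graphics in R with ggplot2 as well. I would like to learn how to do two things:

First, I would like to add proper percentage tick marks for each variable rather than tick marks by count. Counts would be confusing, which is why I currently remove the axis labels completely.

Second, there must be a simpler way to reorganize my data to make this happen. It seems like the sort of thing I should be able to do natively in ggplot2 with plyR, but the plyR documentation is not very clear (I have read both the ggplot2 book and the online plyR documentation).

My best graph so far looks like this; the code that creates it follows.

[Image: example graph]

The R code I use to produce it is: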

library(epicalc)  

### recode the variables to factors ###
recode(c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ), c(1,2,3,4,5,6,7,8,9, NA), 
c('Very Interested','Somewhat Interested','Not Very Interested','Not At All interested',NA,NA,NA,NA,NA,NA))

### Combine recoded variables to a common vector
Interest1<-c(int_newcoun, int_newneigh, int_neweur, int_newusa, int_neweco, int_newit, int_newen, int_newsp, int_newhr, int_newlit, int_newent, int_newrel, int_newhth, int_bapo, int_wopo, int_eupo, int_educ)


### Create a second vector to label the first vector by original variable ###  
a1<-rep("News about Bangladesh", length(int_newcoun))
a2<-rep("Neighboring Countries", length(int_newneigh))
[...]
a17<-rep("Education", length(int_educ))


Interest2<-c(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15, a16, a17)

### Create a Weighting vector of the proper length ###
Interest.weight<-rep(weight, 17)

### Make and save a new data frame from the three vectors ###
Interest.df<-cbind(Interest1, Interest2, Interest.weight)
Interest.df<-as.data.frame(Interest.df)

write.csv(Interest.df, 'C:\\Documents and Settings\\[name]\\Desktop\\Sweave\\InterestBangladesh.csv')

### Sort the factor levels to display properly ###

Interest.df$Interest1<-relevel(Interest$Interest1, ref='Not Very Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Somewhat Interested')
Interest.df$Interest1<-relevel(Interest$Interest1, ref='Very Interested')

Interest.df$Interest2<-relevel(Interest$Interest2, ref='News about Bangladesh')
Interest.df$Interest2<-relevel(Interest$Interest2, ref='Education')
[...]
Interest.df$Interest2<-relevel(Interest$Interest2, ref='European Politics')

detach(Interest)
attach(Interest)

### Finally create the graph in ggplot2 ###

library(ggplot2)
p<-ggplot(Interest, aes(Interest2, ..count..))
p<-p+geom_bar((aes(weight=Interest.weight, fill=Interest1)))
p<-p+coord_flip()
p<-p+scale_y_continuous("", breaks=NA)
p<-p+scale_fill_manual(value = rev(brewer.pal(5, "Purples")))
p
update_labels(p, list(fill='', x='', y=''))

Any tips, tricks, or hints would be greatly appreciated.

score 2 · Accepted Answer

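Your second problem can be solved with melt and cast from the reshape package.

Once the elements of your data.frame (your.df in the code below) have been converted to factors, you can use something like: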

install.packages("reshape")
library(reshape)

x <- melt(your.df, c()) ## Assume you have some kind of data.frame of all factors
x <- na.omit(x) ## Be careful, sometimes removing NA can mess with your frequency calculations

x <- cast(x, variable + value ~., length)
colnames(x) <- c("variable","value","freq")
## Presto!
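ggplot(x, aes(variable, freq, fill = value)) +
  geom_bar(stat = "identity", position = "fill") + ## freq is already a count, so plot it as-is
  coord_flip() +
  scale_y_continuous("", labels = scales::percent) ## percent axis labels (needs the scales package)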
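For anyone who would rather not install reshape, the same variable / value / freq table can be sketched in base R. This is not the melt/cast code above but a stand-in for it under stated assumptions: the toy data frame and its column names (int_a, int_b) are made up for illustration.

```r
# Hypothetical toy data: two survey items already stored as factors
lv <- c("Very Interested", "Somewhat Interested", "Not Very Interested")
your.df <- data.frame(
  int_a = factor(c("Very Interested", "Somewhat Interested", "Very Interested", NA), levels = lv),
  int_b = factor(c("Not Very Interested", "Very Interested", NA, "Very Interested"), levels = lv)
)

# Stack the columns into long form: one row per (variable, value) answer
long <- data.frame(
  variable = rep(names(your.df), each = nrow(your.df)),
  value    = unlist(lapply(your.df, as.character), use.names = FALSE)
)
long <- na.omit(long)  # drop missing answers before counting

# Count every (variable, value) pair; responseName names the count column
x <- as.data.frame(table(variable = long$variable, value = long$value),
                   responseName = "freq")
```

The resulting x has the same three columns as the cast() output, so it should be usable in the same ggplot() call.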

As an aside, I like to use grep to pull columns out of a messy import. For example:

x <- your.df[, grep("^int_", colnames(your.df))] ## pulls all columns starting with "int_"
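A small sketch of why the pattern is worth anchoring. The data frame and its columns are hypothetical; the point is only that an unanchored "int." also matches names that merely contain those characters.

```r
# Hypothetical import with one column that should NOT be picked up
your.df <- data.frame(
  id          = 1:3,
  int_newcoun = c(1, 2, 3),
  int_educ    = c(4, 1, 2),
  interval    = c(9, 9, 9)
)

# "^int_" matches only names that begin with the literal prefix int_
keep <- grep("^int_", names(your.df))
x <- your.df[, keep, drop = FALSE]  # drop = FALSE keeps a data frame even for one column

# For comparison: the unanchored pattern also catches "interval"
loose <- grep("int.", names(your.df), value = TRUE)
```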

Also, converting to factors is easier if you don't have to type c(' ', ...) a million times:

## Loop over the columns of df by index; the labels are read one per line
for(i in seq_len(ncol(df))) {
df[,i] <- factor(df[,i], labels = strsplit('
Very Interested
Somewhat Interested
Not Very Interested
Not At All interested
NA
NA
NA
NA
NA
NA
', '\n')[[1]][-1])
}
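An alternative sketch of the same recoding, assuming (as in the question's recode() call) that codes 1 to 4 carry the four answers and every other code should become missing. Passing explicit levels means any code outside 1:4 turns into NA automatically, so the NA labels do not have to be listed. The example data are made up.

```r
labs <- c("Very Interested", "Somewhat Interested",
          "Not Very Interested", "Not At All interested")

# Hypothetical raw codes; 8 and 9 stand for "don't know" style answers
df <- data.frame(
  int_newcoun = c(1, 4, 9, 2),
  int_educ    = c(3, 3, NA, 8)
)

# Apply the same levels/labels to every column; df[] keeps the data.frame shape
df[] <- lapply(df, factor, levels = 1:4, labels = labs)
```

Because the labels are given in display order, the factor levels come out already sorted, which may remove the need for the repeated relevel() calls.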

score 2 · Accepted Answer

You don't need prop.table or counts to create a 100% stacked bar chart. You just need + geom_bar(position="fill")
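In current ggplot2 it is position = "fill" that rescales every bar to 100%; position = "stack" stacks the raw counts. A minimal sketch, assuming long-format data with one row per answer. The data frame d and its columns item and answer are hypothetical, and scales::percent needs the scales package.

```r
library(ggplot2)

# Hypothetical long-format data: one row per respondent answer
d <- data.frame(
  item   = rep(c("Education", "European Politics"), each = 4),
  answer = c("Very", "Very", "Somewhat", "Not very",
             "Very", "Somewhat", "Somewhat", "Not very")
)

p <- ggplot(d, aes(item, fill = answer)) +
  geom_bar(position = "fill") +                     # each bar rescaled to sum to 1
  coord_flip() +                                    # horizontal bars, as in the question
  scale_y_continuous("", labels = scales::percent)  # axis ticks shown as percentages
p
```

For weighted survey data, adding weight = <your weight column> inside aes() should make the bars use weighted counts, as the question's own code already does.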

score 1 · Accepted Answer

First question: does this help?

geom_bar(aes(y=..count../sum(..count..)))
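With this approach it is worth checking which total the counts are divided by, because a share of all answers and a share within one bar are different numbers. The base R sketch below only illustrates that difference and makes no claim about how ggplot2 scopes sum(..count..); the toy table is made up.

```r
# Hypothetical answers for two items
tab <- table(
  item   = c("Education", "Education", "Education", "Politics"),
  answer = c("Very", "Very", "Somewhat", "Very")
)

share_of_all  <- prop.table(tab)     # each cell divided by the grand total
share_per_bar <- prop.table(tab, 1)  # each cell divided by its row (item) total

share_of_all["Education", "Very"]    # 2 of 4 answers overall
share_per_bar["Education", "Very"]   # 2 of 3 answers within Education
```

For a chart where every bar should reach 100%, the per-row version is the one that matches.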

Second question: could you use reorder to sort the bars? Something like

aes(reorder(Interest, Value, mean), Value)

(Just got back from a 7-hour drive and I'm tired, but I think it should work.)
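A quick base R check of what reorder() does to the factor levels, which is what determines bar order in ggplot2. The data frame d and its values are hypothetical; the column names follow the snippet above.

```r
# Hypothetical data: three categories with two values each
d <- data.frame(
  Interest = c("a", "b", "c", "a", "b", "c"),
  Value    = c(5, 1, 3, 7, 1, 2)
)

# Levels are sorted by the mean of Value within each category (ascending)
d$Interest <- reorder(d$Interest, d$Value, FUN = mean)
levels(d$Interest)
```

Here the group means are a = 6, b = 1 and c = 2.5, so the levels come out as b, c, a.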

score 1 · Accepted Answer

For percentages of ..count.., try:

ggplot(mtcars, aes(factor(cyl), prop.table(..count..) * 100)) + geom_bar()

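However, since cramming a function into aes() is not a good idea, you could instead write a custom function that converts ..count.. to percentages and rounds them to n decimal places.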
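Such a helper could look roughly like the sketch below. The name pct and its argument names are made up for illustration; it is shown on plain counts rather than inside aes().

```r
# Hypothetical helper: turn counts into percentages rounded to n decimals
pct <- function(counts, n = 1) {
  round(100 * counts / sum(counts), n)
}

# Cylinder counts from the built-in mtcars data
cyl_counts <- table(mtcars$cyl)
pct(cyl_counts, 0)
```

Keeping the arithmetic in a named function makes it easy to test on its own before wiring it into a plot.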

You tagged this post plyr, but nothing here actually uses plyr (ddply(), for example). The online plyr documentation should be sufficient.
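As a rough idea of what a plyr-based version might look like, here is a sketch that computes weighted frequencies and per-variable percentages with ddply(). The data frame long and its columns (variable, value, w) are hypothetical stand-ins for the question's stacked answers and survey weights.

```r
library(plyr)

# Hypothetical long-format data with a survey weight per row
long <- data.frame(
  variable = c("Education", "Education", "Education", "Politics", "Politics"),
  value    = c("Very", "Very", "Somewhat", "Very", "Somewhat"),
  w        = c(1.0, 0.5, 2.0, 1.5, 1.5)
)

# Weighted count for each (variable, value) pair
freq <- ddply(long, c("variable", "value"), summarise, freq = sum(w))

# Percentage within each variable, so every bar adds up to 100
freq <- ddply(freq, "variable", transform, pct = 100 * freq / sum(freq))
```

The pct column could then be mapped to y with stat = "identity" if explicit percentages are preferred over position = "fill".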

score 1 · Accepted Answer

If I understand correctly, to fix the axis labeling problem, make the following change:

# p<-ggplot(Interest, aes(Interest2, ..count..))
p<-ggplot(Interest, aes(Interest2, ..density..))

For the second, I think you would be better off using the reshape package. You can use it to aggregate data into groups very easily.

In reference to aL3xa's comment below...

library(ggplot2)
r<-rnorm(1000)
d<-as.data.frame(cbind(r,1:1000))
ggplot(d,aes(r,..density..))+geom_histogram() ## bins the continuous r and plots the density of each bin

Returns...

Image: http://www.drewconway.com/zia/wp-content/uploads/2010/04/density.png

The bins are now densities...
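Note that a density scale is not the same thing as a percentage scale: the bar heights are scaled so that the total area is 1, not so that the heights add up to 100%. A base R sketch of that property, using hist() rather than ggplot2 (the seed and the number of breaks are arbitrary choices):

```r
set.seed(1)
r <- rnorm(1000)

# Compute the histogram without drawing it
h <- hist(r, breaks = 30, plot = FALSE)

# Density times bin width, summed over bins, gives the total area
area <- sum(h$density * diff(h$breaks))
area
```

For categorical bars like the ones in the question, percentages per category are therefore usually the more natural scale.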
