r - How to plot two histograms together in R?

Question

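I am using R and I have two data frames: carrots and cucumbers. Each data frame has a single numeric column that lists the length of every measured carrot (total: 100k carrots) and cucumber (total: 50k cucumbers).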

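I want to plot two histograms, carrot length and cucumber length, on the same plot. They overlap, so I guess I also need some transparency. I also need to use relative frequencies rather than absolute numbers, because the number of instances in each group is different.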

Something like this would be nice, but I don't understand how to create it from my two tables:

[Image: overlapping density curves]

score 297 · Accepted Answer

Here is an even simpler solution using base graphics and alpha blending (which does not work on all graphics devices):

set.seed(42)
p1 <- hist(rnorm(500,4))                     # centered at 4
p2 <- hist(rnorm(500,6))                     # centered at 6
plot( p1, col=rgb(0,0,1,1/4), xlim=c(0,10))  # first histogram
plot( p2, col=rgb(1,0,0,1/4), xlim=c(0,10), add=TRUE)  # second

The key is that the colors are semi-transparent.

Edit, more than two years later: since this just got an upvote, I figure I may as well add a visual of what the code produces, as alpha blending is so useful:

[Image: resulting plot]
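To illustrate why this works: with four arguments, rgb() takes the fourth as the alpha (opacity) channel and encodes it as the last two hex digits of the color string, so 1/4 becomes 40, roughly 25% opacity. adjustcolor() from grDevices is another way to get the same color from a named one. A minimal sketch; the variable names are only for illustration:

```r
# Alpha is the 4th argument of rgb(); with the default maxColorValue = 1
# it is scaled to 0..255 and appended as two hex digits.
blue_quarter <- rgb(0, 0, 1, 1/4)
red_quarter  <- rgb(1, 0, 0, 1/4)
print(c(blue_quarter, red_quarter))   # "#0000FF40" "#FF000040"

# adjustcolor() multiplies the alpha of an existing color by alpha.f
blue_adj <- adjustcolor("blue", alpha.f = 1/4)
print(blue_adj)
```

Either string can be passed as col to plot() or hist().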

score 212 · Accepted Answer

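The image you linked to was of density curves, not histograms.

If you've been reading up on ggplot, then maybe the only thing you're missing is combining your two data frames into one long one.

So, let's start with something like what you have, two separate sets of data, and combine them.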

carrots <- data.frame(length = rnorm(100000, 6, 2))
cukes <- data.frame(length = rnorm(50000, 7, 2.5))

# Now, combine your two dataframes into one.  
# First make a new column in each that will be 
# a variable to identify where they came from later.
carrots$veg <- 'carrot'
cukes$veg <- 'cuke'

# and combine into your new data frame vegLengths
vegLengths <- rbind(carrots, cukes)
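If the lengths start out as plain numeric vectors rather than data frames, the same long format can be built in one step with rep() and a vector of group sizes. A sketch; carrot_len, cuke_len and vegLengths2 are illustrative names, not from the answer:

```r
# Hypothetical alternative to rbind(): build the long data frame directly.
set.seed(1)
carrot_len <- rnorm(100000, 6, 2)
cuke_len   <- rnorm(50000, 7, 2.5)

vegLengths2 <- data.frame(
  length = c(carrot_len, cuke_len),
  # rep() with a vector of counts labels each block of values with its group
  veg    = rep(c("carrot", "cuke"), times = c(length(carrot_len), length(cuke_len)))
)

# One row per measurement; the group sizes are preserved
table(vegLengths2$veg)
```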

After that, which is unnecessary if your data is already in long format, you only need one line (plus loading ggplot2) to make your plot:

library(ggplot2)
ggplot(vegLengths, aes(length, fill = veg)) + geom_density(alpha = 0.2)

[Image: resulting plot]

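Now, if you really do want histograms, the following will work. Note that you must change position from its default of "stack"; you might miss that if you don't have a clear idea of what your data should look like. A higher alpha looks better there. Also note that I made these density histograms: removing y = after_stat(density) (written y = ..density.. in older ggplot2 versions) gets you back to counts.

ggplot(vegLengths, aes(length, fill = veg)) + 
   geom_histogram(alpha = 0.5, aes(y = after_stat(density)), position = 'identity')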

[Image: resulting plot]
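A density scale handles the unequal group sizes (100k carrots vs. 50k cucumbers) because each group's bar areas are normalized to sum to 1. The sketch below checks that property in base R with hist(..., plot = FALSE) on simulated data; the 0.5 bin width is an arbitrary choice, and ggplot2's density statistic is assumed to normalize per fill group in the same way:

```r
set.seed(42)
carrots <- rnorm(100000, 6, 2)
cukes   <- rnorm(50000, 7, 2.5)

# Shared breaks (width 0.5) so that bar widths match between the two groups
brks <- seq(floor(min(carrots, cukes)), ceiling(max(carrots, cukes)), by = 0.5)
hc <- hist(carrots, breaks = brks, plot = FALSE)
hk <- hist(cukes,   breaks = brks, plot = FALSE)

# Raw counts differ roughly 2:1 because the sample sizes do...
sum(hc$counts); sum(hk$counts)
# ...but density * bin width sums to 1 within each group
sum(hc$density * diff(hc$breaks))
sum(hk$density * diff(hk$breaks))
```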

score 45 · Accepted Answer

Here's a function I wrote that uses pseudo-transparency to represent overlapping histograms:

plotOverlappingHist <- function(a, b, colors=c("white","gray20","gray50"),
                                breaks=NULL, xlim=NULL, ylim=NULL){

  if(is.null(breaks)){
    # Let hist() pick breaks for each sample, then build one common set of
    # breaks spanning both, using the bin width chosen for 'a'
    ahist = hist(a, plot=FALSE)
    bhist = hist(b, plot=FALSE)
    dist = ahist$breaks[2] - ahist$breaks[1]
    lo = min(ahist$breaks, bhist$breaks)
    hi = max(ahist$breaks, bhist$breaks)
    # length.out (rather than 'to') guarantees the last break reaches 'hi'
    breaks = seq(lo, by=dist, length.out=ceiling((hi - lo)/dist) + 1)
  }

  # Both histograms use the same breaks, so bin i is the same interval in each
  ahist = hist(a, breaks=breaks, plot=FALSE)
  bhist = hist(b, breaks=breaks, plot=FALSE)

  if(is.null(xlim)){
    xlim = range(ahist$breaks, bhist$breaks)
  }
  if(is.null(ylim)){
    ylim = c(0, max(ahist$counts, bhist$counts))
  }

  # Overlap bar = the smaller of the two counts in each bin (0 if either is 0)
  overlap = ahist
  for(i in seq_along(overlap$counts)){
    overlap$counts[i] = min(ahist$counts[i], bhist$counts[i])
  }

  # Draw a, then b over it, then the overlap in a third shade
  plot(ahist, xlim=xlim, ylim=ylim, col=colors[1])
  plot(bhist, xlim=xlim, ylim=ylim, col=colors[2], add=TRUE)
  plot(overlap, xlim=xlim, ylim=ylim, col=colors[3], add=TRUE)
}
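The loop that builds overlap in plotOverlappingHist() takes the smaller of the two counts in each bin. Assuming both histograms share the same breaks, as the function arranges, that step can also be written with the vectorized pmin(). A sketch on simulated data, using pretty() for the shared breaks (an assumption, not the function's own break logic):

```r
set.seed(1)
a <- rnorm(1000, 3, 1)
b <- rnorm(1000, 5, 1)

# Common breaks so that bin i means the same interval in both histograms
brks  <- pretty(range(a, b), n = 30)
ahist <- hist(a, breaks = brks, plot = FALSE)
bhist <- hist(b, breaks = brks, plot = FALSE)

# Element-wise minimum replaces the per-bin loop
overlap <- ahist
overlap$counts <- pmin(ahist$counts, bhist$counts)

# Same result as the loop-based version
loop_counts <- integer(length(ahist$counts))
for (i in seq_along(loop_counts)) {
  if (ahist$counts[i] > 0 && bhist$counts[i] > 0) {
    loop_counts[i] <- min(ahist$counts[i], bhist$counts[i])
  }
}
all(overlap$counts == loop_counts)
```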

Here's another way to do it, using R's support for transparent colors:

a=rnorm(1000, 3, 1)
b=rnorm(1000, 6, 1)
hist(a, xlim=c(0,10), col="red")
hist(b, add=TRUE, col=rgb(0, 1, 0, 0.5) )

The result looks something like this: [Image: resulting plot]
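Because semi-transparency is not supported by every graphics device, one option is to query the open device with dev.capabilities() from grDevices and fall back to hatching when transparency is unavailable. A sketch; the hatched fallback and the off-screen pdf(NULL) device are assumptions made for the example, not part of the answer:

```r
a <- rnorm(1000, 3, 1)
b <- rnorm(1000, 6, 1)

pdf(NULL)  # off-screen PDF device, used here only so the example runs anywhere
caps <- dev.capabilities("semiTransparency")
has_alpha <- isTRUE(caps$semiTransparency)

hist(a, xlim = c(0, 10), col = "red")
if (has_alpha) {
  hist(b, add = TRUE, col = rgb(0, 1, 0, 0.5))      # semi-transparent fill
} else {
  hist(b, add = TRUE, col = "green", density = 20)  # assumed fallback: hatching
}
dev.off()
```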

score 25 · Accepted Answer

Here's an example of how you can do it with "classic" R graphics:

## generate some random data
carrotLengths <- rnorm(1000,15,5)
cucumberLengths <- rnorm(200,20,7)
## calculate the histograms - don't plot yet
histCarrot <- hist(carrotLengths,plot = FALSE)
histCucumber <- hist(cucumberLengths,plot = FALSE)
## calculate the range of the graph
xlim <- range(histCucumber$breaks,histCarrot$breaks)
ylim <- range(0,histCucumber$density,
              histCarrot$density)
## plot the first graph
plot(histCarrot,xlim = xlim, ylim = ylim,
     col = rgb(1,0,0,0.4),xlab = 'Lengths',
     freq = FALSE, ## relative, not absolute frequency
     main = 'Distribution of carrots and cucumbers')
## plot the second graph on top of this
opar <- par(new = FALSE)
plot(histCucumber,xlim = xlim, ylim = ylim,
     xaxt = 'n', yaxt = 'n', ## don't add axes
     col = rgb(0,0,1,0.4), add = TRUE,
     freq = FALSE) ## relative, not absolute frequency
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
       fill = rgb(1:0,0,0:1,0.4), bty = 'n',
       border = NA)
par(opar)

The only problem with this is that it looks much better if the histogram breaks are aligned, which may have to be done manually (through the arguments passed to hist).
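One way to align the breaks, sketched here on the same simulated lengths: compute a single set of breaks from the combined range with pretty() and pass it to both hist() calls. The target of 30 bins is an arbitrary choice:

```r
set.seed(42)
carrotLengths   <- rnorm(1000, 15, 5)
cucumberLengths <- rnorm(200, 20, 7)

# One shared set of breaks covering both samples
brks <- pretty(range(carrotLengths, cucumberLengths), n = 30)

histCarrot   <- hist(carrotLengths,   breaks = brks, plot = FALSE)
histCucumber <- hist(cucumberLengths, breaks = brks, plot = FALSE)

ylim <- range(0, histCarrot$density, histCucumber$density)
plot(histCarrot, col = rgb(1, 0, 0, 0.4), freq = FALSE, ylim = ylim,
     xlab = "Lengths", main = "Distribution of carrots and cucumbers")
plot(histCucumber, col = rgb(0, 0, 1, 0.4), freq = FALSE, add = TRUE)
```

With identical breaks the bars of the two histograms sit exactly on top of each other, so the overlap reads cleanly.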

score 23 · Accepted Answer

Here's a version like the ggplot2 one, using only base R. I copied some of it from @nullglob.

Generate the data:

carrots <- rnorm(100000,5,2)
cukes <- rnorm(50000,7,2.5)

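You don't need to put the data into a data frame as you do with ggplot2. The drawback of this method is that you have to write out many more of the plot's details; the advantage is that you have control over those details.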

## calculate the density - don't plot yet
densCarrot <- density(carrots)
densCuke <- density(cukes)
## calculate the range of the graph
xlim <- range(densCuke$x,densCarrot$x)
ylim <- range(0,densCuke$y, densCarrot$y)
#pick the colours
carrotCol <- rgb(1,0,0,0.2)
cukeCol <- rgb(0,0,1,0.2)
## plot the carrots and set up most of the plot parameters
plot(densCarrot, xlim = xlim, ylim = ylim, xlab = 'Lengths',
     main = 'Distribution of carrots and cucumbers', 
     panel.first = grid())
#put our density plots in
polygon(densCarrot, density = -1, col = carrotCol)
polygon(densCuke, density = -1, col = cukeCol)
## add a legend in the corner
legend('topleft',c('Carrots','Cucumbers'),
       fill = c(carrotCol, cukeCol), bty = 'n',
       border = NA)

[Image: resulting plot]
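density() also accepts from, to and n, so both curves can be evaluated on the same grid of x values. That makes the two estimates directly comparable point by point, for example to see where one curve lies above the other. A minimal sketch with the same simulated data; the comparison at the end is only an illustration:

```r
set.seed(1)
carrots <- rnorm(100000, 5, 2)
cukes   <- rnorm(50000, 7, 2.5)

# Evaluate both kernel density estimates on one common grid
lo <- min(carrots, cukes)
hi <- max(carrots, cukes)
densCarrot <- density(carrots, from = lo, to = hi, n = 512)
densCuke   <- density(cukes,   from = lo, to = hi, n = 512)

# Same x values, so the y values can be compared directly
stopifnot(isTRUE(all.equal(densCarrot$x, densCuke$x)))
where_cuke_higher <- densCuke$x[densCuke$y > densCarrot$y]
range(where_cuke_higher)
```

Both objects can then be drawn with plot() and polygon() exactly as in the code above.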

score 12 · Accepted Answer

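@Dirk Eddelbuettel: The basic idea is excellent, but the code as shown can be improved. [Posted as a separate answer rather than a comment because it takes a while to explain.]

The hist() function draws a plot by default, so you need to add the plot=FALSE option. Moreover, it is clearer to establish the plot area with a plot(0,0,type="n",...) call, in which you can add axis labels, a plot title, etc. Finally, you can also use shading to distinguish between the two histograms. Here is the code: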

set.seed(42)
p1 <- hist(rnorm(500,4),plot=FALSE)
p2 <- hist(rnorm(500,6),plot=FALSE)
plot(0,0,type="n",xlim=c(0,10),ylim=c(0,100),xlab="x",ylab="freq",main="Two histograms")
plot(p1,col="green",density=10,angle=135,add=TRUE)
plot(p2,col="blue",density=10,angle=45,add=TRUE)

And here is the result (a bit too wide because of RStudio :-) ):

[Image: resulting plot]
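When the histograms are distinguished by hatching rather than color, legend() can reproduce the same pattern through its density and angle arguments. A sketch extending the code above; the legend labels "p1" and "p2" are placeholders:

```r
set.seed(42)
p1 <- hist(rnorm(500, 4), plot = FALSE)
p2 <- hist(rnorm(500, 6), plot = FALSE)
plot(0, 0, type = "n", xlim = c(0, 10), ylim = c(0, 100),
     xlab = "x", ylab = "freq", main = "Two histograms")
plot(p1, col = "green", density = 10, angle = 135, add = TRUE)
plot(p2, col = "blue",  density = 10, angle = 45,  add = TRUE)

# Hatched legend keys matching the two histograms
lg <- legend("topright", legend = c("p1", "p2"),
             fill = c("green", "blue"),
             density = c(10, 10), angle = c(135, 45), bty = "n")
```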
