2

だから私はこの論文を読んでいて、xをyに回帰してもy〜xを回帰するのと同等の線が得られないことについて(長い話、それぞれのケースで異なる残差を最小化しています)、私のデータに違いをもたらしました。残念ながら、ggplot2 の stat_smooth で x を y に回帰させることはできません。

ここに私のデータからの100行のサンプルがあります:

> dput(df)
structure(list(Sample1 = c(4.98960685567417, 3.47204999768371, 
-1.02623042506767, -3.32192809488736, 4.92376518190495, 5.18057364383907, 
3.37344268231229, 7.17221940674516, -1.53797027495265, 4.59078735278587, 
-3.32192809488736, 4.13323387842297, -3.32192809488736, 1.55320685650568, 
3.45928530433997, 6.52941436696241, 0.291721646202104, 1.34692630168789, 
2.588263571484, 7.93984265473926, -1.02623042506767, -3.32192809488736, 
5.97393529545249, 4.96272548511184, -3.32192809488736, 3.19245860152084, 
-0.181177668537811, 1.1702151401323, 7.19741957077731, -3.32192809488736, 
3.09679962523049, 6.89612648849508, 4.49710160528899, 2.90303096540262, 
2.44022334122376, 4.53409217014818, 4.7590708357255, 4.72481633684295, 
0.691615956272056, 3.49724548154186, 3.31795986458613, 5.83755830779617, 
4.36304299927157, -0.0295430166578704, 5.02251917128349, 7.38120720286448, 
2.00357119131309, 3.28216732938001, 4.78231786359764, -0.764299371525642, 
3.92619319211129, 0.0406883788967327, 2.85511116340049, 5.66764040870341, 
1.5770354481397, 6.94329764296121, 3.9308239558015, 5.74724483501107, 
5.88247371232242, 3.60547757169518, -3.32192809488736, -0.542695575918393, 
5.28191680995643, 2.77496647475118, -1.34650507208309, -2.74783703944563, 
6.41591061993452, 3.7539739823111, 2.49127670210767, 6.65016370221803, 
6.32032425410637, -3.32192809488736, 0.402556327534611, 2.34628748804301, 
0.89488462084739, 1.1702151401323, -3.32192809488736, 4.72748036766425, 
5.61681844161155, 2.03831731682591, -0.443475473403791, -2.74783703944563, 
-3.32192809488736, 4.10482914538061, 5.3663866285164, 2.15373368651261, 
-3.32192809488736, 5.45745992658215, -0.764299371525642, 1.69060972560112, 
0.171658206055067, 4.46553470954628, 4.93993527938277, 2.23086351589855, 
0.2329385503416, 6.14110973873979, -0.443475473403791, -3.32192809488736, 
4.16502606126426, 5.45906373012262), Sample2 = c(4.87035389289972, 
3.05631305243025, 0.82323632916493, -2.4296633454796, 4.91811337313532, 
5.03007694504678, 3.33859900165761, 6.54820168349871, -2.4296633454796, 
4.86188407042978, -3.32192809488736, 4.62717002820498, -1.67106299528127, 
-3.32192809488736, 2.98790308889454, 6.05226752694174, 0.596977520519267, 
1.99826710332906, 2.23966464285132, 8.01717854320643, -0.00159952300352469, 
-1.48670630366969, 5.43594909921552, 4.95640818811881, -3.32192809488736, 
4.90168009665292, 0.555561013584398, 2.27836717662796, 6.47511546712932, 
-3.32192809488736, 3.01870866264235, 2.08814992134377, 3.65522335205242, 
3.36281234142216, 1.68917367193269, 3.90649842473784, 4.95242469001903, 
4.78780929337046, 3.21768687367553, 3.85621590296538, 3.41005295130021, 
5.53250156756501, 3.7175760684724, 0.676405696816667, 5.14971539956163, 
6.98398467911331, 3.3567969981478, 3.88157621603906, 4.90168009665292, 
-0.0647853872737311, 4.55018561922281, 1.21774415322831, 2.60706104004096, 
5.87421631057342, 2.22652958465421, 7.02865467720292, 3.39838702576746, 
5.19424993242169, 5.32786325515541, 2.67632233586628, -2.80788789937207, 
0.172877949731557, 4.84264318258294, 2.7144567398541, -0.00159952300352469, 
-1.17646253367489, 6.33756586506636, 3.68433673855881, 2.62719159116047, 
6.51076754571708, 6.31214443849056, -1.88247779017497, 0.714537926272974, 
3.17045554987033, 0.32851270527504, 9.05953630113146, -2.80788789937207, 
4.36480327392371, 5.88366056411048, 1.63054822649657, 2.01364185790907, 
-1.04323142745032, -2.80788789937207, 2.88298417549747, 5.60661313436485, 
1.9024101424183, -0.516096487844633, 5.42017137768702, -0.921270583844273, 
2.36482913745982, 2.46888455306976, 4.55807157102875, 5.01105219152827, 
1.19094552584428, 0.987970258058337, 5.55106017924769, 2.15899824583894, 
-3.32192809488736, 3.92287721859118, 5.06185416274169)), .Names = c("Sample1", 
"Sample2"), class = "data.frame", row.names = c(NA, -100L))

ggplot2 を使用してプロットしようとすると、プロットの x に回帰 y を簡単に配置できます。

> ggplot(df, aes(x=Sample1,y=Sample2)) + geom_point() + stat_smooth(formula = y~x, method = "lm")

しかし、数式で他のものを使用しようとすると、オブジェクトが見つからないというエラーが発生します。

> ggplot(df, aes(x=Sample1,y=Sample2)) + geom_point() + stat_smooth(formula = y~x, method = "lm") +  stat_smooth(formula = x~y, method = "lm")
Error in eval(expr, envir, enclos) : object 'y' not found

x と y の代わりに元の列名 (Sample1 と Sample2) を使用しstat_smooth、独自の aes を指定してみましたが、すべて同じエラーが発生しました。

これは名前空間のエラーだと思います。数式が何らかの形で間違った環境で評価されていますが、一方が機能するのに他方が機能しない理由、または問題の修正方法がわかりません。

どんなアイデアでも大歓迎です。

4

2 に答える 2

2

A work-around is to use another aes() call in the smoother call to switch around x and y:

ggplot(df, aes(x=Sample1,y=Sample2)) + 
    geom_point() + 
    stat_smooth(method="lm") + 
    stat_smooth(aes(x=Sample2,y=Sample1), method="lm")
于 2013-03-20T15:45:39.337 に答える
2

You need to think about what stat_smooth(formula = x~y, method = "lm") is doing in the context of the plot you are trying to draw.

The error is most likely arising because when formula is evaluated in the formula parsing code, the variable named on the right hand side of ~ (y in the error case) is not in the set of data defined internally by ggplot as corresponding to the x-axis of the plot. ggplot must isolate the x variable data and only look up variables from that data object.

Now back to thinking about your plot. You drew a plot with x on the x-axis and y on the y-axis. The linear model of x ~ y has the data vectors on the opposite "axes". So from the grammar point of view, the definition you gave doesn't make sense. I think what I am trying to say is that the right hand side of the formula can only contain x or functions of x, as that is all that is available internally.

If you want to do this, add the regression line for x ~ y yourself, something like:

mod <- lm(Sample1 ~ Sample2, data = df)
pdat <- with(df, data.frame(Sample2= seq(min(Sample2), max(Sample2), 
                                         length = 100)))
pdat <- transform(pdat, Sample1 = predict(mod, newdata = pdat))

ggplot(df, aes(x=Sample1,y=Sample2)) + 
  geom_point() + 
  stat_smooth(formula = y~x, method = "lm") +
  geom_line(data = pdat, aes(x = Sample1, y = Sample2))

You could add the confidence interval using geom_ribbon() and compute it by using the interval = "confidence" argument to predict.lm. You'll need to modify the

pdat <- transform(pdat, Sample1 = predict(mod, newdata = pdat))

to accommodate this, but that is left as an exercise for the reader.

于 2013-03-20T15:45:45.493 に答える