r - 'predict' gives different results than manually using the coefficients from 'summary'

Question

Let me explain my confusion with an example.

#making datasets
x1<-iris[,1]
x2<-iris[,2]
x3<-iris[,3]
x4<-iris[,4]
dat<-data.frame(x1,x2,x3)
dat2<-dat[1:120,]
dat3<-dat[121:150,]

#Using a linear model to fit x4 using x1, x2 and x3 where training set is first 120 obs.
model<-lm(x4[1:120]~x1[1:120]+x2[1:120]+x3[1:120])

#Using the coefficient values from summary(model), prediction is done for next 30 obs.
-.17947-.18538*x1[121:150]+.18243*x2[121:150]+.49998*x3[121:150]

#Same prediction is done using the function "predict"
predict(model,dat3)

My confusion is this: the two sets of predictions for the last 30 values differ. The difference may be slight, but they do differ. Why is that? Shouldn't they be exactly the same?

score 4 · Accepted Answer

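The difference is very small, and I think it comes down to the precision of the coefficients you are using (for example, the actual value of the intercept is -0.17947075338464965610..., not simply -.17947).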
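To see the digits that the printed summary leaves out, you can print the stored coefficients with more precision. This is only a sketch: it refits the same model using the `iris` column names and a `data =` argument (my choice, not the question's `x1`..`x4` vectors), and the names `train` and `rounding_error` are made up for illustration.

```r
# Refit the question's model: Petal.Width on the other three numeric
# columns, using the first 120 rows of iris as the training set.
train <- iris[1:120, 1:4]
model <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
            data = train)

# Default printing shows only a few significant digits.
print(coef(model))

# Asking for more digits shows the values that are actually stored.
print(coef(model), digits = 15)

# What is lost when each coefficient is rounded to five decimal places.
rounding_error <- coef(model) - round(coef(model), 5)
print(rounding_error)
```

Each entry of `rounding_error` is tiny, but it gets multiplied by the predictor values, which is where the small discrepancy in the predictions comes from.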

In fact, if you take the coefficient values from the model object and apply the formula, the result equals the output of predict.

intercept <- model$coefficients[1]
x1Coeff <- model$coefficients[2]
x2Coeff <- model$coefficients[3]
x3Coeff <- model$coefficients[4]

intercept + x1Coeff*x1[121:150] + x2Coeff*x2[121:150] + x3Coeff*x3[121:150]
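If you would rather check the equality programmatically than by eye, something along these lines should work. It is a sketch under my own assumptions: the model is refit with a `data =` argument so that `predict()` can match the columns by name, and `train`, `test`, `manual`, `auto` and `same` are illustrative names.

```r
# Split iris the same way as the question: 120 training rows, 30 test rows.
train <- iris[1:120, 1:4]
test  <- iris[121:150, 1:4]

fit <- lm(Petal.Width ~ Sepal.Length + Sepal.Width + Petal.Length,
          data = train)

# Build the design matrix for the test rows (intercept column included)
# and multiply it by the full-precision coefficients.
X <- model.matrix(delete.response(terms(fit)), data = test)
manual <- drop(X %*% coef(fit))

# The same predictions via predict().
auto <- predict(fit, newdata = test)

# Compare the values only, up to floating-point tolerance.
same <- isTRUE(all.equal(unname(manual), unname(auto)))
print(same)
```

`all.equal()` is used instead of `==` because two floating-point computations of the same quantity can differ in the last bits.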

score 2 · Accepted Answer

You can clean up your code a bit. To create the training and test datasets, you can use the following code.

# create training and test datasets
train.df <- iris[1:120, 1:4] 
test.df <- iris[-(1:120), 1:4]

# fit a linear model to predict Petal.Width using all predictors
fit <- lm(Petal.Width ~ ., data = train.df)
summary(fit)

# predict Petal.Width in the test set using the linear model
predictions <- predict(fit, test.df)

# create a function mse() to calculate the Mean Squared Error
mse <- function(predictions, obs) {
  sum((obs - predictions) ^ 2) / length(predictions)
}

# measure the quality of fit
mse(predictions, test.df$Petal.Width)

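The predictions differ because predict() uses the coefficients at full precision, whereas your "manual" calculation uses only five decimal places. summary() does not display the full values of the coefficients; it rounds them (to five decimal places in this output) to keep the printout readable.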
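To get a feel for how large the effect is, you can redo the manual calculation with coefficients rounded to five decimals and compare it with predict(). This is a rough sketch; `b5`, `manual5` and `max_gap` are names I made up, and `round(coef(fit), 5)` stands in for copying the numbers off the summary printout by hand.

```r
train.df <- iris[1:120, 1:4]
test.df  <- iris[-(1:120), 1:4]

fit <- lm(Petal.Width ~ ., data = train.df)

# Coefficients as you would copy them from the printed summary
# (assumption: rounded to five decimal places).
b5 <- round(coef(fit), 5)

# Manual prediction with the rounded coefficients.
manual5 <- b5["(Intercept)"] +
  b5["Sepal.Length"] * test.df$Sepal.Length +
  b5["Sepal.Width"]  * test.df$Sepal.Width +
  b5["Petal.Length"] * test.df$Petal.Length

# Largest absolute gap between the rounded-coefficient predictions
# and the full-precision predictions from predict().
max_gap <- max(abs(unname(manual5) - unname(predict(fit, test.df))))
print(max_gap)
```

The gap is nonzero but very small compared with the predicted values themselves, which matches what the question observed.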
