r - Forcing a reference category in a logistic model in R

Question

I am running a logistic model in R and need to include an interaction term as follows, where A is categorical and B is continuous:

Y ~ A + B + normalized(B):A

My problem is that when I do this, the reference category is not the same as in

Y ~ A + B + A:B

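This makes the models difficult to compare. I am sure there is a way to force the reference category to always be the same, but I can't seem to find a straightforward answer.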

To illustrate, my data look like this:

income                      ndvi        sga
30,000$ - 49,999$        -0,141177617        0
30,000$ - 49,999$        -0,170513257        0
>80,000$                 -0,054939323        1
>80,000$                 -0,14724104         0
>80,000$                 -0,207678157        0
missing                  -0,229890869        1
50,000$ - 79,999$         0,245063253        0
50,000$ - 79,999$         0,127565529        0
15,000$ - 29,999$        -0,145778357        0
15,000$ - 29,999$        -0,170944338        0
30,000$ - 49,999$        -0,121060635        0
30,000$ - 49,999$        -0,245407291        0
missing                  -0,156427532        0
>80,000$                  0,033541238        0

The output is reproduced below. The first set of results is from the model of the form Y ~ A*B; the second is from Y ~ A + B + A:normalized(B).

                                   Estimate Std. Error z value Pr(>|z|)    
(Intercept)                            -2.72175    0.29806  -9.132   <2e-16 ***
ndvi                                    2.78106    2.16531   1.284   0.1990    
income15,000$ - 29,999$                -0.53539    0.46211  -1.159   0.2466    
income30,000$ - 49,999$                -0.68254    0.39479  -1.729   0.0838 .  
income50,000$ - 79,999$                -0.13429    0.33097  -0.406   0.6849    
income>80,000$                         -0.56692    0.35144  -1.613   0.1067    
incomemissing                          -0.85257    0.47230  -1.805   0.0711 .  
ndvi:income15,000$ - 29,999$           -2.27703    3.25433  -0.700   0.4841    
ndvi:income30,000$ - 49,999$           -3.76892    2.86099  -1.317   0.1877    
ndvi:income50,000$ - 79,999$           -0.07278    2.46483  -0.030   0.9764    
ndvi:income>80,000$                    -3.32489    2.62000  -1.269   0.2044    
ndvi:incomemissing                     -3.98098    3.35447  -1.187   0.2353 

                                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)                              -3.07421    0.30680 -10.020   <2e-16 ***
ndvi                                     -1.19992    2.56201  -0.468    0.640    
income15,000$ - 29,999$                  -0.33379    0.29920  -1.116    0.265    
income30,000$ - 49,999$                  -0.34885    0.26666  -1.308    0.191    
income50,000$ - 79,999$                  -0.12784    0.25124  -0.509    0.611    
income>80,000$                           -0.27255    0.27288  -0.999    0.318    
incomemissing                            -0.50010    0.31299  -1.598    0.110    
income<15,000$:normalize(ndvi)            0.40515    0.34139   1.187    0.235    
income15,000$ - 29,999$:normalize(ndvi)   0.17341    0.35933   0.483    0.629    
income30,000$ - 49,999$:normalize(ndvi)   0.02158    0.32280   0.067    0.947    
income50,000$ - 79,999$:normalize(ndvi)   0.39774    0.28697   1.386    0.166    
income>80,000$:normalize(ndvi)            0.06677    0.30087   0.222    0.824    
incomemissing:normalize(ndvi)                  NA         NA      NA       NA

So in the first model the category "income <15,000$" is the reference category, whereas in the second model something else happens that I have not yet figured out.

score 0 · Accepted Answer

Let's say I want to run a regression with this equation:

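I tried to implement it using model.matrix. However, there are some automation issues, shown in the results below. Is there a better way to implement it? More specifically, let's say X_1 is a continuous variable and X_2 is a dummy.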

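Basically, the interpretation of the interaction term is the same, but the main term for X_2 is evaluated when X_1 is at its mean. (See an early draft of this paper.)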
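To make that statement concrete, here is a short derivation. It assumes the model has the form on the first line; the symbols \beta_0 to \beta_3 and \bar{X}_1 (the sample mean of X_1) are notation introduced here for illustration. For a logistic model, read Y as the linear predictor (the log-odds).

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (X_1 - \bar{X}_1) X_2 \\
  &= \beta_0 + \beta_1 X_1 + (\beta_2 - \beta_3 \bar{X}_1) X_2 + \beta_3 X_1 X_2
\end{align*}
```

The interaction coefficient \beta_3 is the same in both lines. The X_2 coefficient of the uncentered form is \beta_2 - \beta_3 \bar{X}_1, so \beta_2 in the centered form is the effect of X_2 when X_1 = \bar{X}_1.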

Here are some data to illustrate my point (this is not a glm, but the same method can be applied to a glm):

library(car)
str(Prestige)
# some data cleaning
Prestige <- Prestige[!is.na(Prestige$type),] 

# interaction the usual way.
lm1 <- lm(income ~ education+ type + education:type, data = Prestige); summary(lm1)

# interacting with demeaned education
Prestige$education_ <- Prestige$education-mean(Prestige$education)

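# The regular formula does not do what I want, because it does not
# set one level aside as the reference within the interaction term.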

lm2 <- lm(income ~ education+ type + education_:type, data = Prestige); summary(lm2)

# Using model.matrix to shape the interaction
cusInt <- model.matrix(~-1+education_:type,data=Prestige)[,-1];colnames(cusInt)
lm3 <- lm(income ~ education+ type + cusInt, data = Prestige); summary(lm3)


compareCoefs(lm1,lm3,lm2)

The results are as follows:

                         Est. 1  SE 1 Est. 2  SE 2 Est. 3  SE 3
(Intercept)                -1865  3682  -1865  3682   4280  8392
education                    866   436    866   436    297   770
typeprof                   -3068  7192   -542  1950   -542  1950
typewc                      3646  9274  -2498  1377  -2498  1377
education:typeprof           234   617                          
education:typewc            -569   885                          
cusInteducation_:typeprof                 234   617             
cusInteducation_:typewc                  -569   885             
typebc:education_                                      569   885
typeprof:education_                                    803   885
typewc:education_

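So basically, if you use model.matrix you have to step in and set the reference level yourself. In addition, cusInt appears in front of the variable names, so formatting the results is very tedious when you have many tables to compare.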
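One possible way around both issues, sketched here for the logistic case, is to store the centered variable as its own column and let it enter the formula as a main effect too (grp * x_c). As far as I can tell, R uses a full set of indicators in an interaction whose continuous part has no main effect in the formula, which would explain the extra level and the NA row above. This is an alternative parameterization rather than the exact model above: only the intercept changes relative to keeping the uncentered main effect. The data are simulated, and the names dat, grp, x, x_c and y are hypothetical.

```r
# Hypothetical data: grp plays the role of A, x of B, y of the binary outcome.
set.seed(1)
n   <- 600
dat <- data.frame(
  grp = factor(sample(c("low", "mid", "high"), n, replace = TRUE),
               levels = c("low", "mid", "high")),
  x   = rnorm(n, mean = 2, sd = 1)
)
eta   <- -0.5 + 0.4 * dat$x + 0.3 * (dat$grp == "mid") -
  0.2 * (dat$grp == "high") + 0.5 * dat$x * (dat$grp == "mid")
dat$y <- rbinom(n, 1, plogis(eta))

# Fix the reference level explicitly so every fit shares it.
dat$grp <- relevel(dat$grp, ref = "low")

# Center the continuous predictor as an ordinary column.
dat$x_c <- dat$x - mean(dat$x)

fit_raw <- glm(y ~ grp * x,   family = binomial, data = dat)
fit_ctr <- glm(y ~ grp * x_c, family = binomial, data = dat)

# Interaction coefficients agree between the two fits.
int_raw <- coef(fit_raw)[c("grpmid:x", "grphigh:x")]
int_ctr <- coef(fit_ctr)[c("grpmid:x_c", "grphigh:x_c")]
stopifnot(isTRUE(all.equal(unname(int_raw), unname(int_ctr), tolerance = 1e-6)))

# The centered main effect is the group effect evaluated at mean(x).
shifted <- coef(fit_raw)["grpmid"] + coef(fit_raw)["grpmid:x"] * mean(dat$x)
stopifnot(isTRUE(all.equal(unname(coef(fit_ctr)["grpmid"]), unname(shifted),
                           tolerance = 1e-6)))

# Both parameterizations describe the same fitted model.
stopifnot(isTRUE(all.equal(fitted(fit_raw), fitted(fit_ctr), tolerance = 1e-6)))

round(cbind(raw = coef(fit_raw), centered = coef(fit_ctr)), 3)
```

Because x_c is a plain column, the coefficient names carry no matrix prefix, and no interaction level is dropped as NA.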
