Let's say I have a factor variable with numerous levels and I am trying to group them into several groups.

> levels(dat$years_continuously_insured_order2)
 [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"  
[19] "19"   "20" 

> levels(dat$age_of_oldest_driver)
 [1] "-16" "1"   "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[22] "34"  "35"  "36"  "37"  "38"  "39"  "40

I have a script that runs through these variables and groups them into several categories. However, the number of levels can be (and usually is) different each time my script runs. So if my original grouping code was the following (see below), it would be of no use if my script ran again an hour later and the levels were different. Instead of 15 levels, I could now have 25 levels with different values, but I would still need to group them into specific categories.

dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)

How can I find a more elegant way to group variables into segments? Are there better ways to do this in R?

Thanks!

2 Answers

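You can convert the factor levels of the continuously-insured variable to numeric values, then cut them into categories and turn the result back into a factor. The first step is described in the R FAQ (doing it correctly is a two-step process: factor to character, then character to numeric).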

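 # Factor -> character -> numeric, then bin with cut().
 # right=FALSE gives the intervals [0,2), [2,3), [3,Inf).
 # Passing labels to cut() keeps all three levels even if a bin is empty.
 dat$years_cont <- cut( as.numeric(as.character(
                                     dat$years_continuously_insured_order2)),
                        breaks=c(0,2,3, Inf), right=FALSE,
                        labels=c( "1 or less", "2", "3 +")
                        )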
#-----------------
> str(dat)
'data.frame':   100 obs. of  2 variables:
 $ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
 $ years_cont                       : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
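
To illustrate why the two-step conversion matters, here is a minimal sketch on a made-up factor (the vector `f` and the names `codes` and `values` are hypothetical and not part of the original data). Calling `as.numeric()` directly on a factor returns its internal level codes, whereas going through `as.character()` first recovers the numbers that the levels spell out.

```r
# Hypothetical factor whose levels are numbers stored as text.
# Levels are sorted as strings, so they come out as "10", "2", "3".
f <- factor(c("10", "2", "3", "10"))

# Direct conversion returns the internal level codes, not the values.
codes <- as.numeric(f)

# Converting to character first recovers the actual numbers.
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Only the second form is safe to feed into `cut()`, since the level codes depend on how many levels happen to exist in a given run.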
answered 2012-09-10T17:35:24.540

If the original column is numeric, treat it as a number rather than as a factor. A much simpler way to do what you are doing is the following:

bin.value = function(x) {
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

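# as.integer() on a factor returns the level codes, so convert via as.character() first
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))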
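
Since the levels change from run to run, it may help to wrap the grouping in a small function that depends only on the numeric values and not on level positions. The following is a rough sketch under that assumption; the helper `group_levels`, the example vector `years`, and the choice of `-Inf` as the lowest break are all illustrative and not taken from the original post.

```r
# Hypothetical helper: bin a factor whose levels are numbers stored as text.
group_levels <- function(f, breaks, labels) {
  # Recover the real values rather than the internal level codes.
  x <- as.numeric(as.character(f))
  # right = FALSE makes each interval closed on the left: [a, b).
  cut(x, breaks = breaks, labels = labels, right = FALSE)
}

# Made-up data for illustration; the levels could differ on every run.
years <- factor(c("0", "1", "2", "3", "7", "20"))

grouped <- group_levels(years,
                        breaks = c(-Inf, 2, 3, Inf),
                        labels = c("1 or less", "2", "3 +"))

print(table(grouped))
```

Because the breaks are expressed as values, the same call should keep working whether the factor has 15 levels or 25, and values that cannot be parsed as numbers simply become `NA`.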
answered 2012-09-10T17:09:52.710