r - Grouping a variable with numerous levels

Question

Let's say I have a factor variable with numerous levels and I am trying to group them into several groups.

> levels(dat$years_continuously_insured_order2)
 [1] "1"    "2"    "3"    "4"    "5"    "6"    "7"    "8"    "9"    "10"   "11"   "12"   "13"   "14"   "15"   "16"   "17"   "18"  
[19] "19"   "20" 

> levels(dat$age_of_oldest_driver)
 [1] "-16" "1"   "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"  "27"  "28"  "29"  "30"  "31"  "32"  "33" 
[22] "34"  "35"  "36"  "37"  "38"  "39"  "40

I have a script which runs through these variables and groups them into several categories. However, the number of levels could be (and usually is) different each time my script runs. So if my original code to group the variables was the following (see below), it would be of no use if my script ran an hour later and the levels were different. Instead of 15 levels, I could now have 25 levels with different values, but I would still need to group them into specific categories.

dat$years_continuously_insured2 <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[1]] <- NA
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[2:3]] <- "1 or less"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[4]] <- "2"
dat$years_continuously_insured2[dat$years_continuously_insured %in% levels(dat$years_continuously_insured)[5:7]] <- "3 +"
dat$years_continuously_insured2 <- factor(dat$years_continuously_insured2)

How can I find a more elegant way to group variables into segments? Are there better ways to do this in R?

Thanks!

score 2 · Accepted Answer

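You can convert the factor levels of the years-continuously-insured variable to numeric, then cut them into categories and re-factor(). The first step is explained in the R-FAQ (doing it correctly is a two-step process).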

 dat$years_cont <-  factor( cut(  as.numeric(as.character( 
                                     dat$years_continuously_insured_order2)),
                                 breaks=c(0,2,3, Inf), right=FALSE  ),
                           labels=c( "1 or less", "2", "3 +")
                           )
#-----------------
> str(dat)
'data.frame':   100 obs. of  2 variables:
 $ years_continuously_insured_order2: Factor w/ 20 levels "1","10","11",..: 4 15 19 5 8 4 16 12 12 18 ...
 $ years_cont                       : Factor w/ 3 levels "1 or less","2",..: 3 3 3 3 3 3 3 2 2 3 ...
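
To see why the conversion takes two steps, here is a minimal sketch on made-up data (the vector `f` and its values are assumptions for illustration, not the asker's data). Calling as.numeric() directly on a factor returns the internal level codes, so the labels have to go through as.character() first. The labels can also be passed straight to cut(), which should give the same grouping as the factor(cut(...)) call above.

```r
# Toy factor with numeric-looking labels (illustrative values only).
f <- factor(c("1", "10", "2", "3", "20"))

# Levels are sorted as text: "1" "10" "2" "20" "3"
codes  <- as.numeric(f)                 # internal level codes, not the values
values <- as.numeric(as.character(f))   # the actual numbers: 1 10 2 3 20

# Bin by value; right = FALSE makes the intervals [0,2), [2,3), [3,Inf).
grp <- cut(values, breaks = c(0, 2, 3, Inf), right = FALSE,
           labels = c("1 or less", "2", "3 +"))

print(data.frame(f, codes, values, grp))
```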

score 0 · Accepted Answer

If the original column is numeric, treat it as a number rather than as a factor. Here is a much simpler way to do what you are doing:

bin.value = function(x) {
    ifelse(x <= 1, "1 or less", ifelse(x == 2, "2", "3+"))
}

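# Go through as.character() first: as.integer() on a factor returns the
# internal level codes, not the values shown as labels.
dat$years_continuously_insured2 = as.factor(bin.value(as.integer(as.character(dat$years_continuously_insured))))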
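
Since the set of levels changes between runs, binning by value rather than by level position may be the more robust approach. The following is a rough sketch under that assumption: `bin_factor` is a hypothetical helper name, and `run1` and `run2` are made-up inputs standing in for two runs of the script with different levels.

```r
# Hypothetical helper: bin a factor with numeric-looking levels by value,
# so the result does not depend on how many levels happen to be present.
bin_factor <- function(x, breaks, labels) {
  # Convert the labels to numbers; labels that are not numeric become NA.
  values <- suppressWarnings(as.numeric(as.character(x)))
  cut(values, breaks = breaks, labels = labels, right = FALSE)
}

# Two made-up runs with different sets of levels.
run1 <- factor(c("0", "1", "2", "5"))
run2 <- factor(c("1", "2", "3", "14", "25", "unknown"))

# Intervals: [-Inf,2), [2,3), [3,Inf)
breaks <- c(-Inf, 2, 3, Inf)
labels <- c("1 or less", "2", "3 +")

res1 <- bin_factor(run1, breaks, labels)
res2 <- bin_factor(run2, breaks, labels)
print(res1)
print(res2)
```

The same breaks and labels apply to both runs, and any level that cannot be read as a number ends up as NA instead of being assigned to a group by position.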
