r - Generate a dummy variable

Question

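I have trouble generating the following dummy variables in R.

I'm analyzing yearly time series data (time period 1948 to 2009). I have two questions:

How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?
How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 onwards to 2009?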

score 117 · Accepted Answer

Another option that can work well if you have many variables is factor and model.matrix.

year.f = factor(year)
dummies = model.matrix(~year.f)

This will include an intercept column (all ones) and one column for each of the years in your data set except one, which will be the "default" or intercept value.

You can change how the "default" is chosen by adjusting contrasts.arg in model.matrix.
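As a minimal sketch of the contrasts.arg idea, the snippet below makes the last year the baseline instead of the first. The example vector is an assumption for illustration; contr.treatment() and its base argument are from the stats package.

```r
# Illustrative data (an assumption, not from the original answer)
year.f <- factor(c(1956, 1957, 1957, 1958, 1958, 1959))

# Treatment contrasts with the LAST level (1959) as the baseline
last_as_base <- contr.treatment(levels(year.f), base = nlevels(year.f))

m <- model.matrix(~ year.f, contrasts.arg = list(year.f = last_as_base))

# The 1959 row now has zeros in every dummy column (only the intercept is 1)
m
```

With the default contrasts, the 1956 column would be the one left out.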

Also, if you want to omit the intercept, you can just drop the first column or add +0 to the end of the formula.
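A small sketch of the difference, assuming a short example vector of years (not the asker's data):

```r
# Illustrative data (an assumption)
year <- c(1956, 1957, 1957, 1958, 1958, 1959)
year.f <- factor(year)

# Default: an intercept column plus K-1 indicator columns
with_intercept <- model.matrix(~ year.f)

# "+ 0" drops the intercept, leaving one indicator column per year
no_intercept <- model.matrix(~ year.f + 0)

colnames(with_intercept)
colnames(no_intercept)
```

Without the intercept, every row has exactly one 1 across the indicator columns.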

Hope this is useful.

score 61 · Accepted Answer

The simplest way to produce these dummy variables is something like the following:

> print(year)
[1] 1956 1957 1957 1958 1958 1959
> dummy <- as.numeric(year == 1957)
> print(dummy)
[1] 0 1 1 0 0 0
> dummy2 <- as.numeric(year >= 1957)
> print(dummy2)
[1] 0 1 1 1 1 1

More generally, you can use ifelse to choose between two values depending on a condition. So if, instead of a 0-1 dummy variable, you wanted for some reason to use, say, 4 and 7, you could use ifelse(year == 1957, 4, 7).
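Applied to the period in the question, this could look as follows. This is a sketch that assumes year is simply the sequence 1948 to 2009 with one observation per year; the variable names are made up for illustration.

```r
# Assumed: one observation per year, 1948 through 2009
year <- 1948:2009

# 1 only in 1957, 0 otherwise
d_1957 <- as.integer(year == 1957)

# 0 before 1957, 1 from 1957 onwards
d_from_1957 <- as.integer(year >= 1957)

which(d_1957 == 1)   # position of the 1957 observation
sum(d_from_1957)     # number of years from 1957 through 2009
```

Under that assumption the single 1 in d_1957 falls on observation 10, matching the question.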

score 53 · Accepted Answer

Using dummies::dummy():

library(dummies)

# example data
df1 <- data.frame(id = 1:4, year = 1991:1994)

df1 <- cbind(df1, dummy(df1$year, sep = "_"))

df1
#   id year df1_1991 df1_1992 df1_1993 df1_1994
# 1  1 1991        1        0        0        0
# 2  2 1992        0        1        0        0
# 3  3 1993        0        0        1        0
# 4  4 1994        0        0        0        1
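If the dummies package is not available in your environment, a similar table can be sketched in base R with model.matrix. The column naming below (year_1991 and so on) is a choice made for this sketch and differs from what dummy() produces.

```r
# Same example data as above
df1 <- data.frame(id = 1:4, year = 1991:1994)

# One indicator column per year, no intercept
ind <- model.matrix(~ factor(year) + 0, data = df1)

# Rename the columns (naming scheme is an assumption of this sketch)
colnames(ind) <- paste("year", levels(factor(df1$year)), sep = "_")

df1 <- cbind(df1, ind)
df1
```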

score 20 · Accepted Answer

The mlr package includes createDummyFeatures for this purpose:

library(mlr)
df <- data.frame(var = sample(c("A", "B", "C"), 10, replace = TRUE))
df

#    var
# 1    B
# 2    A
# 3    C
# 4    B
# 5    C
# 6    A
# 7    C
# 8    A
# 9    B
# 10   C

createDummyFeatures(df, cols = "var")

#    var.A var.B var.C
# 1      0     1     0
# 2      1     0     0
# 3      0     0     1
# 4      0     1     0
# 5      0     0     1
# 6      1     0     0
# 7      0     0     1
# 8      1     0     0
# 9      0     1     0
# 10     0     0     1

createDummyFeatures drops the original variable.

https://www.rdocumentation.org/packages/mlr/versions/2.9/topics/createDummyFeatures

score 11 · Accepted Answer

What I normally do to work with this kind of dummy variable is:

(1) How do I generate a dummy variable for observation #10, i.e. for year 1957 (value = 1 at 1957 and zero otherwise)?

data$factor_year_1 <- factor ( with ( data, ifelse ( ( year == 1957 ), 1 , 0 ) ) )

(2) How do I generate a dummy variable which is zero before 1957 and takes the value 1 from 1957 onwards to 2009?

data$factor_year_2 <- factor ( with ( data, ifelse ( ( year < 1957 ), 0 , 1 ) ) )

Then, you can introduce this factor as a dummy variable in your models. For example, to see whether there is a long-term trend in a variable y:

summary ( lm ( y ~ t,  data = data ) )
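The call above regresses y on t only. As a hedged sketch of how the factor itself could enter a model, the snippet below adds factor_year_2 as a regressor. The data are simulated here purely for illustration; t, y, and the coefficients used to generate them are assumptions, not part of the original answer.

```r
set.seed(1)

# Simulated stand-in for the asker's data (all values are assumptions)
data <- data.frame(year = 1948:2009)
data$t <- seq_len(nrow(data))
data$factor_year_2 <- factor(ifelse(data$year < 1957, 0, 1))
data$y <- 0.1 * data$t + 2 * (data$year >= 1957) + rnorm(nrow(data))

# Linear trend plus a level shift from 1957 onwards
fit <- lm(y ~ t + factor_year_2, data = data)
coef(fit)
```

The coefficient named factor_year_21 is the estimated shift for the years from 1957 on, relative to the earlier years.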

Hope this helps!

score 7 · Accepted Answer

I read this on the kaggle forum:

#Generate example dataframe with character column
example <- as.data.frame(c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"))
names(example) <- "strcol"

#For every unique value in the string column, create a new 1/0 column
#This is what Factors do "under-the-hood" automatically when passed to function requiring numeric data
for(level in unique(example$strcol)){
  example[paste("dummy", level, sep = "_")] <- ifelse(example$strcol == level, 1, 0)
}
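The same result can be sketched without an explicit loop by using sapply. This is an alternative written for illustration, not the code from the forum post; the names lv and dummies are made up.

```r
# Same example data as above
example <- data.frame(
  strcol = c("A", "A", "B", "F", "C", "G", "C", "D", "E", "F"),
  stringsAsFactors = FALSE
)

# One 1/0 column per distinct value, built in a single call
lv <- unique(example$strcol)
dummies <- sapply(lv, function(l) as.integer(example$strcol == l))
colnames(dummies) <- paste("dummy", lv, sep = "_")

example <- cbind(example, dummies)
head(example)
```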

score 7 · Accepted Answer

If you want to get K dummy variables, instead of K-1, try:

dummies = table(1:length(year),as.factor(year))
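To see what this produces, here is a short sketch on an assumed example vector. The result is a table object with one row per observation and one column per distinct year; as.data.frame.matrix is one way to turn it into an ordinary data frame.

```r
# Illustrative data (an assumption)
year <- c(1956, 1957, 1957, 1958, 1958, 1959)

# Cross-tabulate the observation index against the year
dummies <- table(seq_along(year), as.factor(year))

# Convert the table to a regular data frame of 0/1 columns
wide <- as.data.frame.matrix(dummies)
wide
```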

Best,

score 5 · Accepted Answer

The ifelse function is best for simple logic like this.

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, 1, 0)
    ifelse(x <= 1957, 1, 0)

>  [1] 0 0 0 0 0 0 0 1 0 0 0
>  [1] 1 1 1 1 1 1 1 1 0 0 0

Also, if you want it to return character data, you can do so.

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, "foo", "bar")
    ifelse(x <= 1957, "foo", "bar")

>  [1] "bar" "bar" "bar" "bar" "bar" "bar" "bar" "foo" "bar" "bar" "bar"
>  [1] "foo" "foo" "foo" "foo" "foo" "foo" "foo" "foo" "bar" "bar" "bar"

Categorical variables with nesting...

> x <- seq(1950, 1960, 1)

    ifelse(x == 1957, "foo", ifelse(x == 1958, "bar","baz"))

>  [1] "baz" "baz" "baz" "baz" "baz" "baz" "baz" "foo" "bar" "baz" "baz"

This is the most straightforward option.

score 5 · Accepted Answer

Another way is to use mtabulate from the qdapTools package, i.e.

df <- data.frame(var = sample(c("A", "B", "C"), 5, replace = TRUE))
#  var
#1   C
#2   A
#3   C
#4   B
#5   B

library(qdapTools)
mtabulate(df$var)

gives,

score 1 · Accepted Answer

I use a function like this (for data.table):

# For a data.table object and a factor variable var.name, this function creates dummy variables named "var.name: (level1)"
factorToDummy <- function(dtable, var.name){
  stopifnot(is.data.table(dtable))
  stopifnot(var.name %in% names(dtable))
  stopifnot(is.factor(dtable[, get(var.name)]))

  dtable[, paste0(var.name,": ",levels(get(var.name)))] -> new.names
  dtable[, (new.names) := transpose(lapply(get(var.name), FUN = function(x){x == levels(get(var.name))})) ]

  cat(paste("\nAdded dummy variables: ", paste0(new.names, collapse = ", ")))
}

Usage:

data <- data.table(data)
data[, x:= droplevels(x)]
factorToDummy(data, "x")

score 0 · Accepted Answer

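Hi, I wrote this general function to generate a dummy variable, which essentially replicates the replace function in Stata.

If x is the data frame and I want a dummy variable called a which takes the value 1 when x$b takes the value c: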

introducedummy<-function(x,a,b,c){
   g<-c(a,b,c)
  n<-nrow(x)
  newcol<-g[1]
  p<-colnames(x)
  p2<-c(p,newcol)
  new1<-numeric(n)
  state<-x[,g[2]]
  interest<-g[3]
  for(i in 1:n){
    if(state[i]==interest){
      new1[i]=1
    }
    else{
      new1[i]=0
    }
  }
    x$added<-new1
    colnames(x)<-p2
    x
  }
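For comparison, the same behavior can be sketched in a vectorized form. The function name introduce_dummy_vec and the example data frame are hypothetical, written only to illustrate the idea; the arguments a, b, and c keep the meaning described above.

```r
# Hypothetical vectorized variant: a = new column name,
# b = existing column name, c = value that should map to 1
introduce_dummy_vec <- function(x, a, b, c) {
  x[[a]] <- as.numeric(x[[b]] == c)
  x
}

# Made-up example data
df <- data.frame(state = c("NY", "CA", "NY"), stringsAsFactors = FALSE)
out <- introduce_dummy_vec(df, "is_ny", "state", "NY")
out
```

The comparison x[[b]] == c works on the whole column at once, so no loop over rows is needed.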
