r - Convert combinations of factors into a wide-format presence/absence table in R

Question

First off, I'm fairly sure this has been answered before, but the search terms are hard to pin down. Apologies if this is a duplicate.

Suppose I have a factor vector:

all <- factor(letters)

As part of a modelling pipeline, I have used every 5-element combination of these factor levels:

combos <- t(combn(as.character(all), 5))
head(combos)
#     [,1] [,2] [,3] [,4] [,5]
# [1,] "a"  "b"  "c"  "d"  "e" 
# [2,] "a"  "b"  "c"  "d"  "f" 
# [3,] "a"  "b"  "c"  "d"  "g" 
# ...

My question is: how can I convert this second matrix into a matrix that shows the presence or absence of every level, like this?

      a   b   c   d   e   f   g  ...
[1,]  1   1   1   1   1   0   0  ...
[2,]  1   1   1   1   0   1   0  ...
[3,]  1   1   1   1   0   0   1  ...
...

As for what I have tried: my first thought was a row-wise apply using ifelse, but I couldn't put together anything that worked. Is there a clever way to do this?

score 3 · Accepted Answer

Update: an even better solution

Matrix indexing can speed things up further. Here is a substantially improved solution that does not use a for loop.

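all <- factor(letters)
combos <- t(combn(as.character(all), 5))
# Column index of each element, reading combos row by row
# (matching against letters assumes the columns of x are in that order, which holds here)
A <- match(c(t(combos)), letters)
# Row index: every 5 consecutive elements belong to the same combination
B <- 0:(length(A)-1) %/% 5 + 1
a <- unique(as.vector(combos))
x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
            dimnames = list(NULL, a))
# Set every (row, column) pair to 1 in a single vectorized assignment
x[cbind(B, A)] <- 1L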
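
To see why a two-column index matrix does the job, here is a minimal sketch on a much smaller input: the first 4 letters taken 2 at a time. The names `lv`, `small`, `rows`, `cols`, and `m` are illustrative and not part of the answer's code. Each row of `cbind(rows, cols)` names one cell to set to 1.

```r
# Illustrative small case: all 2-element combinations of the first 4 letters.
lv <- letters[1:4]
small <- t(combn(lv, 2))            # 6 x 2 character matrix

# Column index of every element, read row by row (a, b, a, c, a, d, ...).
cols <- match(c(t(small)), lv)
# Row index: each combination contributes ncol(small) consecutive elements.
rows <- rep(seq_len(nrow(small)), each = ncol(small))

m <- matrix(0, nrow = nrow(small), ncol = length(lv),
            dimnames = list(NULL, lv))
# A two-column matrix used as an index addresses individual (row, column) cells.
m[cbind(rows, cols)] <- 1
m
#      a b c d
# [1,] 1 1 0 0
# [2,] 1 0 1 0
# [3,] 1 0 0 1
# [4,] 0 1 1 0
# [5,] 0 1 0 1
# [6,] 0 0 1 1
```

Here `rep(..., each = ncol(small))` yields the same row numbers as the integer-division expression used for `B`, without hard-coding the combination size.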

Benchmark

orig <- function() {
  a <- unique(as.vector(combos))
  x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
              dimnames = list(NULL, a))
  for (i in 1:nrow(combos)) {
    x[i, combos[i, ]] <- 1
  }
  x
}

new <- function() {
  A <- match(c(t(combos)), letters)
  B <- 0:(length(A)-1) %/% 5 + 1
  a <- unique(as.vector(combos))
  x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
              dimnames = list(NULL, a))
  x[cbind(B, A)] <- 1L
  x
}

identical(orig(), new())
# [1] TRUE

library(microbenchmark)
microbenchmark(orig(), new(), times = 20)
# Unit: milliseconds
#    expr       min        lq    median       uq      max neval
#  orig() 476.85206 486.11091 497.48429 512.4333 579.2695    20
#   new()  87.02026  91.17021  96.88463 111.6414 175.6339    20

Original answer

For a problem like this, a for loop works fine, and the result matrix is easy to preallocate.

a <- unique(as.vector(combos))
x <- matrix(0, ncol = length(a), nrow = nrow(combos), 
            dimnames = list(NULL, a))

for (i in 1:nrow(combos)) {
  x[i, combos[i, ]] <- 1
}

head(x)
#      a b c d e f g h i j k l m n o p q r s t u v w x y z
# [1,] 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [2,] 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [3,] 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [4,] 1 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [5,] 1 1 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
# [6,] 1 1 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
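
The loop works because `combos[i, ]` is a character vector, and a matrix with column names can be indexed by those names. As a minimal sketch, the same idea can be wrapped in a function. `presence_loop` and its `lv` argument are hypothetical names introduced here for illustration. Passing the levels explicitly is an assumption that keeps one column per level, in a known order, even if a level never appears in `combos`.

```r
# Hypothetical helper, not part of the original answer.
presence_loop <- function(combos, lv = unique(as.vector(combos))) {
  # Preallocate the full result once, then fill it in place.
  x <- matrix(0, nrow = nrow(combos), ncol = length(lv),
              dimnames = list(NULL, lv))
  # seq_len() is safe when combos has zero rows (1:0 would iterate twice).
  for (i in seq_len(nrow(combos))) {
    x[i, combos[i, ]] <- 1   # character index selects columns by name
  }
  x
}

combos <- t(combn(letters, 5))
x <- presence_loop(combos, lv = letters)
dim(x)
# [1] 65780    26
```

A value in `combos` that is missing from `lv` makes the name lookup fail with a subscript error, which may be preferable to silently dropping it.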
