r - combn-like task for combinations as elements in R

Question

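I am not at all sure that I picked a good title, nor that I am using the right terminology, which might otherwise have let me find a solution to this problem with the right search terms.

I have a list of strings for which I want to get all sets of "exclusive" combinations of 3.

Example: with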

require(utils)
mylist<-c("strA","strB","strC","strD","strE","strF")
t(combn(mylist,3))

I get a table listing all possible combinations of 3 out of these 6 strings (so each row represents one combination of 3).

        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strA" "strB" "strD"
   [3,] "strA" "strB" "strE"
   [4,] "strA" "strB" "strF"
   [5,] "strA" "strC" "strD"
   [6,] "strA" "strC" "strE"
   [7,] "strA" "strC" "strF"
   [8,] "strA" "strD" "strE"
   [9,] "strA" "strD" "strF"
  [10,] "strA" "strE" "strF"
  [11,] "strB" "strC" "strD"
  [12,] "strB" "strC" "strE"
  [13,] "strB" "strC" "strF"
  [14,] "strB" "strD" "strE"
  [15,] "strB" "strD" "strF"
  [16,] "strB" "strE" "strF"
  [17,] "strC" "strD" "strE"
  [18,] "strC" "strD" "strF"
  [19,] "strC" "strE" "strF"
  [20,] "strD" "strE" "strF"

However, I want all sets of combinations of 3 in which each string occurs only once. So my desired output would look like this:

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strD"
   [2,] "strC" "strE" "strF"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strE"
   [2,] "strC" "strD" "strF"
...

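So here each sub-element ($1, $2, $3, etc.) contains 2 combinations of 3 strings (since 2*3=6, with 6 strings). Within each set, no string may occur more than once.

Of course, it would be great if this also worked for lengths of mylist that are not a multiple of n=3. For example, with 10 strings (adding "strG", "strH", "strI", "strJ"), I would like one string to be left out of each set. The desired result would then look like this: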

$1
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strI"
$2
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strH" "strJ"
$3
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strD" "strE" "strF"
   [3,] "strG" "strI" "strJ"
$4
        [,1]   [,2]   [,3]  
   [1,] "strA" "strB" "strC"
   [2,] "strE" "strF" "strG"
   [3,] "strH" "strI" "strJ"
...

Does anyone have a solution for this? If my explanation is unclear, please let me know.

Cheers

score 1 · Accepted Answer

Assuming the transposed combn matrix is named mat, you can check whether two rows overlap by applying length to the result of the intersect function.

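 mat <- t(combn(mylist, 3))  # transposed combn matrix from the question
 res <- list()
 for (i in seq_len(nrow(mat))) {
   for (j in seq_len(nrow(mat))) {
     # keep the pair only if rows i and j have no string in common
     if (!length(intersect(mat[i, ], mat[j, ])))
       res[[paste(i, j, sep = "_")]] <- rbind(mat[i, ], mat[j, ])
   }
 }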


> res
$`1_20`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"

$`2_19`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strD"
[2,] "strC" "strE" "strF"

$`3_18`
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strE"
[2,] "strC" "strD" "strF"

.... snipped

Depending on your definition of "unique", you might decide to take only the first 10 items, because the other half are the same pairs with the two rows swapped.

> res[[1]]
     [,1]   [,2]   [,3]  
[1,] "strA" "strB" "strC"
[2,] "strD" "strE" "strF"
> res[[20]]
     [,1]   [,2]   [,3]  
[1,] "strD" "strE" "strF"
[2,] "strA" "strB" "strC"
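
If the swapped duplicates are not wanted in the first place, one option is to let the inner loop start after i, so each pair is visited only once. This is a minimal sketch of that variation, assuming the same 6 strings as in the question; it is not part of the original loop above.

```r
mylist <- c("strA", "strB", "strC", "strD", "strE", "strF")
mat <- t(combn(mylist, 3))

res <- list()
for (i in seq_len(nrow(mat) - 1)) {
  # only look at rows after i, so the pair (i, j) is never revisited as (j, i)
  for (j in (i + 1):nrow(mat)) {
    if (!length(intersect(mat[i, ], mat[j, ])))
      res[[paste(i, j, sep = "_")]] <- rbind(mat[i, ], mat[j, ])
  }
}
length(res)
```

With 6 strings this leaves 10 pairs instead of 20. Like the loop above, it only handles sets made of two triples.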

score 1 · Accepted Answer

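Based on 42's help (thank you very much!) I found a way that is not elegant but does the job (slowly...). However, this approach was only feasible because I was able to eliminate some of the possible combinations before running the following steps. In my original problem I had 49 strings, which results in a very large vector, so be careful when applying the following procedure to more than 15 strings. There is certainly a way to calculate how many combinations have to be processed...

Here is a complete example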
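
As a rough way to estimate those numbers, the count of triples comes from choose(), and the count of complete sets follows from a standard counting argument: pick the r leftover strings, then split the rest into m unordered groups of k, which gives n! / ((k!)^m * m! * r!). The sketch below assumes that formula; count_disjoint_sets is a hypothetical helper name, not a function from any package.

```r
# number of rows in the combn table for 49 strings
choose(49, 3)

# number of distinct sets of disjoint groups of size k,
# with r = n %% k strings left out of each set (hypothetical helper)
count_disjoint_sets <- function(n, k = 3) {
  m <- n %/% k   # groups per set
  r <- n %% k    # strings omitted from each set
  # work on the log scale so large factorials do not overflow
  exp(lfactorial(n) - m * lfactorial(k) - lfactorial(m) - lfactorial(r))
}

round(count_disjoint_sets(6))    # 6 strings, 2 triples per set
round(count_disjoint_sets(10))   # 10 strings, 3 triples per set, 1 omitted
count_disjoint_sets(49)          # far too many to enumerate
```

For 6 strings this gives 10, which matches the 10 unique pairs in the answer above; for 10 strings it gives 2800.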

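require(utils)
mylist <- paste("str", LETTERS[1:10], sep = "")
# all combinations of 3 strings, one per row, stored as character columns
mat <- as.data.frame(t(combn(mylist, 3, simplify = TRUE)))
mat[] <- lapply(mat, as.character)

mat.subset <- list()
for (i in seq(nrow(mat)))
{
  mat.temp <- mat
  j <- 1
  # start set i with combination i
  mat.subset[[i]] <- mat[i, ]
  # drop every combination that shares a string with combination i
  rem.row <- sort(unique(c(which(mat.temp[,1] %in% unlist(mat[i,1:3])), which(mat.temp[,2] %in% unlist(mat[i,1:3])), which(mat.temp[,3] %in% unlist(mat[i,1:3])))))
  mat.temp <- mat.temp[-rem.row, ]
  while (j <= nrow(mat.temp))
  {
    # add candidate j if it shares no string with the set built so far
    if (!length(intersect(unlist(mat.temp[j,1:3]), unlist(mat.subset[[i]]))))
    {
      mat.subset[[i]] <- rbind(mat.subset[[i]], mat.temp[j, ])
      # drop every remaining combination that shares a string with candidate j
      rem.row <- sort(unique(c(which(mat.temp[,1] %in% unlist(mat.temp[j,1:3])), which(mat.temp[,2] %in% unlist(mat.temp[j,1:3])), which(mat.temp[,3] %in% unlist(mat.temp[j,1:3])))))
      mat.temp <- mat.temp[-rem.row, ]
    }
    j <- j + 1
  }
}
# keep only the sets that contain the maximum number of combinations
mat.subset.lengths <- unlist(lapply(mat.subset, function(x) nrow(x)))
mat.subset <- mat.subset[which(mat.subset.lengths == max(mat.subset.lengths))]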

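As mentioned above, I excluded some combinations before the time-consuming for loop, so only a certain number of starting points produce a complete solution (or, in the worst case, something close to one). That is why the last two steps were necessary in my case.

If you have a hint that there are more sets than this procedure covers, or if there is a more elegant way, please let me know.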
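
One possibly more direct route is to build the sets recursively: always put the first remaining string into the next triple, choose its partners with combn, and recurse on what is left. The sketch below is an assumption about how that could look, not a tested solution from this thread; split_into_groups and all_disjoint_sets are hypothetical names. It enumerates every set, so it is only practical for short lists.

```r
# All ways to split `items` into disjoint groups of size k.
# Assumes length(items) is a multiple of k.
split_into_groups <- function(items, k = 3) {
  stopifnot(length(items) %% k == 0)
  if (length(items) == 0) return(list(NULL))  # nothing left to place
  first <- items[1]
  rest <- items[-1]
  out <- list()
  # the first item always goes into the next group, so no set is built twice
  for (idx in combn(length(rest), k - 1, simplify = FALSE)) {
    group <- c(first, rest[idx])
    for (rest_groups in split_into_groups(rest[-idx], k)) {
      out[[length(out) + 1]] <- rbind(group, rest_groups, deparse.level = 0)
    }
  }
  out
}

# Same, but leaves out length(items) %% k strings from each set.
all_disjoint_sets <- function(items, k = 3) {
  r <- length(items) %% k
  if (r == 0) return(split_into_groups(items, k))
  out <- list()
  for (omit in combn(length(items), r, simplify = FALSE)) {
    out <- c(out, split_into_groups(items[-omit], k))
  }
  out
}

sets <- all_disjoint_sets(paste0("str", LETTERS[1:6]))
length(sets)
sets[[1]]
```

For the 6 strings this returns 10 sets, each a 2 x 3 character matrix. Unlike the greedy loop above, the order of the sets follows the recursion, not the row order of the combn table.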
