7

I have a matrix with 41 rows and 6 columns. This is how the first part looks like.

      X13  X15  X17  X19  X21  X23 
 [1,] "7"  "6"  "5"  "8"  "1"  "8" 
 [2,] "7"  "6"  "5"  "8"  "14" "3" 
 [3,] "7"  "6"  "1"  "3"  "12" "3" 
 [4,] "7"  "6"  "1"  "5"  "6"  "14"
 [5,] "2"  "6"  "1"  "5"  "16" "3" 
 [6,] "2"  "3"  "5"  "5"  "2"  "3" 
 [7,] "7"  "5"  "5"  "17" "7"  "3" 
 [8,] "7"  "2"  "5"  "2"  "2"  "14"
 [9,] "2"  "2"  "10" "10" "2"  "3" 
[10,] "2"  "2"  "10" "5"  "2"  "6" 

My goal is, to compare all the columns with each other, and see, how many of the numbers are equal in the 2 columns. I tried to do it like this:

s <- sum(matrix[,1]==matrix[,2])

But since I need to compare all the possible pairs, it is not effective. It would be good to put this in a loop, but I have no idea how.

And I would like to get my result in a form of a 6x6 similarity matrix. Something like this:

      X13 X15 X17 X19 X21 X23
 X13   0   0   3   2   2   3
 X15   0   0   9  11   4   6
 X17   3   9   0   5   1   3
 X19   2  11   5   0   9  10
 X21   2   4   1   9   0   9
 X23   3   6   3  10   9   0

As you see, I would like to put zeros to the matrix when a column is compared to iteslf.

Since I am a beginner R user, this task semms really complicated to me. I need to use this comparison to 50 matrixes, so I would be glad if you could help me. I would appreciate any tips/suggestions. My english is not so good either, but I hope I could explain my problem well enough. :)

4

3 に答える 3

2

v_outerqdap パッケージから使用するアプローチ:

library(qdapTools) #Using Simon's data

x <- v_outer(m, function(x, y) sum(x==y))
diag(x) <- 0

##    V1 V2 V3 V4 V5 V6
## V1  0  1  1  0  2  0
## V2  1  0  0  1  0  0
## V3  1  0  0  0  1  0
## V4  0  1  0  0  0  0
## V5  2  0  1  0  0  1
## V6  0  0  0  0  1  0

編集ベンチマークを追加しました:

set.seed(1)
matrix <- m <- matrix( sample(10,36,repl=TRUE) , ncol = 6 )

MATRIX <- function(){
    n <- seq_len( ncol(m) )
    id <- expand.grid( n , n )
    out <- matrix( colSums( m[ , id[,1] ] == m[ , id[,2] ] ) , ncol = length(n) )
    diag(out) <- 0
    out
}

V_OUTER <- function(){
    x <- v_outer(m, function(x, y) sum(x==y))
    diag(x) <- 0
    x
}

APPLY <- function(){
    similarity.matrix<-apply(matrix,2,function(x)colSums(x==matrix))
    diag(similarity.matrix)<-0
    similarity.matrix
}

library(microbenchmark)
(op <- microbenchmark( 
    MATRIX(),
    V_OUTER(),
    APPLY() ,  
times=1000L))

Unit: microseconds
      expr     min      lq  median      uq      max neval
  MATRIX() 243.980 264.972 277.101 286.898 1719.519  1000
 V_OUTER() 203.861 223.921 234.650 243.280 1579.570  1000
   APPLY()  96.566 108.228 112.893 118.025 1470.409  1000
于 2013-11-12T16:52:24.667 に答える