r - R を使用して相関ペアを見つける

Question

           VZ.Close CBOU.Close SBUX.Close   T.Close
VZ.Close   1.0000000  0.5804478  0.8872978 0.9480894
CBOU.Close 0.5804478  1.0000000  0.7876277 0.4988890
SBUX.Close 0.8872978  0.7876277  1.0000000 0.8143305
T.Close    0.9480894  0.4988890  0.8143305 1.0000000

では、株価の間にこれらの相関関係があるとしましょう。最初の行を見て、相関が最も高いペアを見つけたいと思います。それは VZ と T です。その後、これら 2 つの株式をオプションとして削除したいと思います。次に、残りの銘柄の中から相関が最も高いペアを見つけます。というように、すべての株がペアになるまで続きます。この例では、CBOU と SBUX が 2 つしか残っていないため、明らかに CBOU と SBUX になりますが、任意の数のペアに対応できるコードが必要です。

score 4 · Accepted Answer

各ステップで最大の相関を調べたい場合の解決策を次に示します。したがって、最初のステップでは、最初の行だけではなく、マトリックス全体を調べます。

サンプルデータ：

d <- matrix(runif(36),ncol=6,nrow=6)
rownames(d) <- colnames(d) <- LETTERS[1:6]
diag(d) <- 1
d
           A          B         C          D         E          F
A 1.00000000 0.65209204 0.8520392 0.26980214 0.5844000 0.69335143
B 0.73531603 1.00000000 0.5499431 0.60511580 0.7483990 0.14788134
C 0.56433218 0.27242769 1.0000000 0.07952776 0.2147628 0.03711562
D 0.91756919 0.04853523 0.5554490 1.00000000 0.4344089 0.23381447
E 0.06897889 0.80740821 0.7974340 0.87425643 1.0000000 0.74546072
F 0.19961474 0.61665231 0.2829632 0.58110694 0.7433924 1.00000000

そしてコード：

results <- data.frame(v1=character(0), v2=character(0), cor=numeric(0), stringsAsFactors=FALSE)
diag(d) <- 0
while (sum(d>0)>1) {
  maxval <- max(d)
  max <- which(d==maxval, arr.ind=TRUE)[1,]
  results <- rbind(results, data.frame(v1=rownames(d)[max[1]], v2=colnames(d)[max[2]], cor=maxval))
  d[max[1],] <- 0
  d[,max[1]] <- 0
  d[max[2],] <- 0
  d[,max[2]] <- 0
}

与える：

  v1 v2       cor
1  D  A 0.9175692
2  E  B 0.8074082
3  F  C 0.2829632

score 0 · Accepted Answer

これはあなたの質問に答えていると思いますが、元の質問があいまいであるため、確信が持てません...

# Construct toy example of symmentrical matrix
# nc is number of rows/columns in matrix, in the problem above it was 4, but let's try with 6
nc <- 6
mat <- diag( 1 , nc )
# Create toy correlation data for matrix
dat <- runif( ( (nc^2-nc)/2 ) )
# Fill both triangles of matrix so it is symmetric
mat[lower.tri( mat ) ] <- dat 
mat[upper.tri( mat ) ] <- dat

# Create vector of random string names for row/column names
names <- replicate( nc , expr = paste( sample( c( letters , LETTERS ) , 3 , replace = TRUE ) , collapse = "" ) )
dimnames(mat) <- list( names , names )

# Sanity check
mat
    SXK   llq   xFL   RVW   oYQ   Seb
SXK 1.000 0.973 0.499 0.585 0.813 0.751
llq 0.973 1.000 0.075 0.533 0.794 0.826
xFL 0.499 0.099 1.000 0.099 0.481 0.968
RVW 0.075 0.813 0.620 1.000 0.620 0.307
oYQ 0.585 0.794 0.751 0.968 1.000 0.682
Seb 0.533 0.481 0.826 0.307 0.682 1.000

# Ok - to problem at hand , you can just substitute your matrix into these lines:
# Clearly the diagonal in a correlation matrix will be 1 so this is excluded as per your problem
diag( mat ) <- NA
# Now find the next highest correlation in each row and set this to NA
mat <- t( apply( mat , 1 , function(x) { x[ which.max(x) ] <- NA ; return(x) } ) ) 

# Another sanity check...!
mat

      SXK   llq   xFL   RVW   oYQ   Seb
SXK    NA    NA 0.499 0.585 0.813 0.751
llq    NA    NA 0.075 0.533 0.794 0.826
xFL 0.499 0.099    NA 0.099 0.481    NA
RVW 0.075    NA 0.620    NA 0.620 0.307
oYQ 0.585 0.794 0.751    NA    NA 0.682
Seb 0.533 0.481    NA 0.307 0.682    NA


# Now return the two remaining columns with greatest correlation in that row
res <- t( apply( mat , 1 , function(x) { y <- names( sort(x , TRUE ) )[1:2] ; return( y ) } ) )

res


[,1]  [,2] 
SXK "oYQ" "Seb"
llq "Seb" "oYQ"
xFL "SXK" "oYQ"
RVW "xFL" "oYQ"
oYQ "llq" "xFL"
Seb "oYQ" "SXK"

これはあなたの質問に答えていますか？

r - R を使用して相関ペアを見つける

2 に答える 2

Related

Reference