r - Applying the pvclust R function to a precomputed dist object

Question

I am running hierarchical clustering in R. As a first approach, I ran hclust using the following steps:

I imported the distance matrix
I converted it to a dist object with the as.dist function
I ran hclust on that dist object

The R code is as follows:

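distm <- read.csv("distMatrix.csv")   # square dissimilarity matrix
d <- as.dist(distm)                   # keep the lower triangle as a "dist" object
hclust(d, method = "ward.D")          # "ward" was renamed "ward.D" in R 3.1.0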

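At this point I would like to do the same thing with the pvclust function, but I can't, because pvclust does not accept a precomputed dist object. Given that I am using a distance that is not among those provided by R's dist function, how can I proceed?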

score 3 · Accepted Answer

I tested Vincent's suggestion. You can do the following (my dataset is a dissimilarity matrix):

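library(pvclust)

# Import your data (a square dissimilarity matrix)
distm <- read.csv("distMatrix.csv")
d <- as.dist(distm)

# Compute the eigenvalues (eig = TRUE returns all n of them, whatever k is)
x <- cmdscale(d, k = 1, eig = TRUE)

# Plot the eigenvalues and choose the number of dimensions
# (drop the dimensions whose eigenvalues are close to 0)
plot(x$eig,
     type = "h", lwd = 5, las = 1,
     xlab = "Number of dimensions",
     ylab = "Eigenvalues")

# One possible rule: keep eigenvalues above 1% of the largest;
# adjust it after inspecting the plot
nb_dimensions <- sum(x$eig > 0.01 * max(x$eig))

# Recover coordinates that (approximately) give the same distance matrix
x <- cmdscale(d, k = nb_dimensions)

# As mentioned by Stéphane, pvclust() clusters columns, hence t(x).
# method.dist = "euclidean" is needed because the default is "correlation".
pvclust(t(x), method.dist = "euclidean")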
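Before running pvclust, it can help to check how faithfully the truncated coordinates reproduce the original dissimilarities. This is only a sketch: it uses a simulated Canberra distance as a stand-in for distMatrix.csv, and the 95% cut-off on the cumulative positive eigenvalues is an arbitrary choice, not part of the answer above.

```r
# Hypothetical stand-in for the imported matrix: 30 objects, 5 variables,
# Canberra distance (any dist object would do)
set.seed(1)
d <- dist(matrix(runif(30 * 5), ncol = 5), method = "canberra")

fit <- cmdscale(d, k = 1, eig = TRUE)

# Share of the total positive eigenvalue mass captured by the first j dimensions
pos <- pmax(fit$eig, 0)
share <- cumsum(pos) / sum(pos)

# Smallest number of dimensions reaching 95% (arbitrary threshold)
nb_dimensions <- which(share >= 0.95)[1]

x <- cmdscale(d, k = nb_dimensions)

# How closely the Euclidean distances between the coordinates track d
fidelity <- cor(as.vector(dist(x)), as.vector(d))
cat("dimensions:", nb_dimensions, " correlation with d:", round(fidelity, 3), "\n")
```

A correlation close to 1 suggests that clustering t(x) with Euclidean distance will behave much like clustering d directly.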

score 2 · Accepted Answer

If your dataset is not too large, you can embed the n points in a space of dimension n-1 so that they have the same distance matrix:

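library(pvclust)

# Sample distance matrix (n points in k dimensions, Manhattan distance)
n <- 100
k <- 1000
d <- dist(matrix(rnorm(k * n), ncol = k), method = "manhattan")

# Recover some coordinates that give the same distance matrix
x <- cmdscale(d, k = n - 1)
stopifnot(isTRUE(all.equal(as.vector(dist(x)), as.vector(d))))

# You can then use x or d interchangeably
r1 <- hclust(d)
r2 <- hclust(dist(x))  # same tree as r1, up to rounding
# pvclust() clusters columns, so transpose x to cluster the n points
r3 <- pvclust(t(x), method.dist = "euclidean", method.hclust = "complete")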
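An exact embedding in n-1 dimensions exists only when the dissimilarities are Euclidean, i.e. when cmdscale() finds no negative eigenvalues; a general custom distance need not satisfy this. A minimal counter-example, using only base R: the Manhattan distances between the four corners of a unit square cannot be reproduced exactly by any Euclidean configuration.

```r
# Four corners of a unit square with Manhattan (L1) distances:
# the sides have length 1, the diagonals have length 2
pts <- rbind(c(0, 0), c(1, 0), c(0, 1), c(1, 1))
d <- dist(pts, method = "manhattan")

fit <- cmdscale(d, k = 3, eig = TRUE)
print(round(fit$eig, 6))   # eigenvalues 2, 2, 0, -1 (up to rounding)

# The negative eigenvalue cannot be represented, so the Euclidean distances
# between the recovered points no longer match d
recovered <- dist(fit$points)
print(max(abs(as.vector(recovered) - as.vector(d))))   # sqrt(2) - 1, about 0.414
```

In such cases the coordinates only approximate d, so it is worth looking at the negative eigenvalues in fit$eig before relying on the embedding.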

If your dataset is too large for this, you will have to look at how pvclust is implemented.

score 1 · Accepted Answer

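I don't know whether you only have a distance matrix or whether you computed it yourself beforehand. In the former case, as @Vincent already suggested, it is not too difficult to tweak the R code of pvclust itself (using fix() or whatever; I gave some hints in another question on CrossValidated). In the latter case, the author of pvclust provides an example of how to use a custom distance function, although this means you have to install the "unofficial version".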
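For the "latter case", a sketch of what a custom distance can look like. It assumes a pvclust release in which method.dist may also be a function that takes the data matrix and returns a "dist" object between its columns (recent versions document this; older ones needed the unofficial version mentioned above). The Bray-Curtis function bray_curtis_cols and the toy data are hypothetical stand-ins for your own distance and dataset.

```r
library(pvclust)

# Hypothetical custom distance: Bray-Curtis dissimilarity between the
# columns of x (the objects pvclust clusters). Assumes non-negative data.
bray_curtis_cols <- function(x) {
  x <- as.matrix(x)
  n <- ncol(x)
  m <- matrix(0, n, n, dimnames = list(colnames(x), colnames(x)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- sum(abs(x[, i] - x[, j])) / sum(x[, i] + x[, j])
    }
  }
  as.dist(m)
}

# Toy data: 50 variables (rows) measured on 8 objects (columns)
set.seed(42)
dat <- matrix(rpois(50 * 8, lambda = 5), nrow = 50,
              dimnames = list(NULL, paste0("obj", 1:8)))

# The function is re-applied to every bootstrap resample of the rows;
# nboot is kept small here only to make the example fast
res <- pvclust(dat, method.dist = bray_curtis_cols,
               method.hclust = "average", nboot = 100)
print(res)
```

Because pvclust resamples rows, the distance has to be computable from the raw data; a fixed precomputed matrix cannot be bootstrapped this way.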
