r - Meaning of the k-means return values in R

Question

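I am using the kmeans() function in R, and I am curious about the difference between the totss and tot.withinss attributes of the returned object. From the documentation they seem to return the same thing, but on my dataset totss is 66213.63 and tot.withinss is 6893.50. Please let me know if you are familiar with more details. Thank you!

Marius.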

score 20 · Accepted Answer

Given the between-cluster sum of squares betweenss and the vector of within-cluster sums of squares withinss (one element per cluster), the formulas are:

totss = tot.withinss + betweenss
tot.withinss = sum(withinss)

For example, if there were only one cluster, then betweenss would be 0, withinss would have only one component, and totss = tot.withinss = withinss.
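The single-cluster case can be checked directly. This is only a minimal sketch: x1 and cl1 are hypothetical names and the data are random toy values, not the dataset from the question.

```r
# Hypothetical toy data; any numeric matrix would do.
set.seed(1)
x1 <- matrix(rnorm(60), ncol = 2)

# Fit a single cluster.
cl1 <- kmeans(x1, centers = 1)

length(cl1$withinss)                            # one component only
isTRUE(all.equal(cl1$totss, cl1$tot.withinss))  # totss equals tot.withinss
abs(cl1$betweenss) < 1e-8                       # betweenss is 0 up to rounding
```

The comparisons use a tolerance rather than == because the quantities are floating-point sums.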

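To clarify further, we can compute these quantities ourselves from the cluster assignments, which may help make their meanings clear. Consider the data x and the cluster assignments cl$cluster from the example in help(kmeans). Define the sum of squares function as follows: it subtracts the mean of each column of x from that column and then sums the squares of all elements of the resulting matrix.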

# or ss <- function(x) sum(apply(x, 2, function(x) x - mean(x))^2)
ss <- function(x) sum(scale(x, scale = FALSE)^2)
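As a quick sanity check, the two ways of writing the function can be compared on a small hand-made matrix. The names ss_scale, ss_apply and m are made up for this illustration.

```r
# Both variants of the sum of squares function, under hypothetical names.
ss_scale <- function(x) sum(scale(x, scale = FALSE)^2)
ss_apply <- function(x) sum(apply(x, 2, function(col) col - mean(col))^2)

# Small matrix whose column deviations are easy to work out by hand:
# column 1 contributes 5, column 2 contributes 500.
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), ncol = 2)

ss_scale(m)  # 505
ss_apply(m)  # 505
```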

Then we have the following. Note that cl$centers[cl$cluster, ] are the fitted values, i.e. a matrix with one row per point such that the i-th row is the center of the cluster that the i-th point belongs to.

example(kmeans) # create x and cl

betweenss <- ss(cl$centers[cl$cluster,]) # or ss(fitted(cl))

withinss <- sapply(split(as.data.frame(x), cl$cluster), ss)
tot.withinss <- sum(withinss) # or  resid <- x - fitted(cl); ss(resid)

totss <- ss(x) # or tot.withinss + betweenss

cat("totss:", totss, "tot.withinss:", tot.withinss, 
  "betweenss:", betweenss, "\n")

# compare above to:

str(cl)
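Instead of comparing the printed numbers by eye, the same check can be done with all.equal. The following is a self-contained sketch on hypothetical toy data (x2 and cl2 are made-up names, and the data are only loosely modelled on a two-group example), so the actual values will differ from those above.

```r
# Hypothetical two-group toy data.
set.seed(123)
x2 <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl2 <- kmeans(x2, 2)

ss <- function(x) sum(scale(x, scale = FALSE)^2)

# Quantities computed by hand from the cluster assignments.
manual_between <- ss(cl2$centers[cl2$cluster, ])
manual_within  <- sapply(split(as.data.frame(x2), cl2$cluster), ss)
manual_total   <- ss(x2)

# Each comparison is expected to be TRUE.
all.equal(manual_total, cl2$totss)
all.equal(unname(manual_within), cl2$withinss)
all.equal(manual_between, cl2$betweenss)
all.equal(cl2$totss, cl2$tot.withinss + cl2$betweenss)
```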

EDIT:

Since this question was answered, R has added a similar example to example(kmeans) and a new fitted.kmeans method. The comments trailing the code lines now show how the fitted method fits into the above.
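For what it is worth, here is a small sketch of how fitted() relates to the quantities above. It assumes a version of R that ships fitted.kmeans; x3 and cl3 are hypothetical names for toy data.

```r
# Hypothetical two-group toy data.
set.seed(7)
x3 <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
cl3 <- kmeans(x3, 2)

ss <- function(x) sum(scale(x, scale = FALSE)^2)

# The default method, "centers", returns one cluster center per observation.
fit_centers <- fitted(cl3)
all.equal(fit_centers, cl3$centers[cl3$cluster, ], check.attributes = FALSE)

# method = "classes" returns the cluster assignments themselves.
fit_classes <- fitted(cl3, method = "classes")
all.equal(fit_classes, cl3$cluster)

# Residuals from the fitted centers give the total within-cluster sum of squares.
resid3 <- x3 - fit_centers
all.equal(ss(resid3), cl3$tot.withinss)
all.equal(ss(fit_centers), cl3$betweenss)
```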

score 0 · Accepted Answer

I think you have spotted an error in the documentation, which says:

withinss     The within-cluster sum of squares for each cluster.
totss        The total within-cluster sum of squares.
tot.withinss     Total within-cluster sum of squares, i.e., sum(withinss).

If I use the sample dataset from the example on the help page:

> kmeans(x,2)$tot.withinss
[1] 15.49669
> kmeans(x,2)$totss
[1] 65.92628
> kmeans(x,2)$withinss
[1] 7.450607 8.046079

I think somebody should write to the r-devel mailing list asking that the help page be revised. I'm willing to do so if you don't want to.
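Note that each kmeans(x, 2) call above is a separate run with its own random start, so comparing fields is safer on one stored fit. A rough sketch, assuming hypothetical toy data x4 in place of the help page data:

```r
# Hypothetical two-group toy data; fit once and reuse the result.
set.seed(2024)
x4 <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
fit4 <- kmeans(x4, 2)

# tot.withinss matches its documented definition, sum(withinss).
within_matches <- isTRUE(all.equal(fit4$tot.withinss, sum(fit4$withinss)))

# totss does not equal sum(withinss) whenever betweenss is positive,
# so it cannot be a "total within-cluster" quantity.
total_matches <- isTRUE(all.equal(fit4$totss, sum(fit4$withinss)))

within_matches  # TRUE
total_matches   # FALSE
```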
