r - Selecting the most dissimilar individuals using cluster analysis

Question

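I want to cluster my data into 5 clusters and then select the 50 individuals that are most dissimilar to the rest of the data, taking them from each cluster in proportion to its size. That is, if cluster 1 has 100 individuals, cluster 2 has 200, cluster 3 has 400, cluster 4 has 200 and cluster 5 has 100, I have to select 5 from the first cluster + 10 from the second + 20 from the third + 10 from the fourth + 5 from the fifth.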
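The per-cluster counts above are the total of 50 split in proportion to cluster size. A minimal sketch of that arithmetic, where the names `sizes`, `n_select` and `quota` are only illustrative and `round()` is an assumption for sizes that do not divide evenly:

```r
# Cluster sizes from the example in the question
sizes <- c(100, 200, 400, 200, 100)
n_select <- 50  # total number of individuals to keep

# Proportional share per cluster. round() is an assumption here: with
# uneven sizes the rounded quotas may not add up to exactly n_select.
quota <- round(n_select * sizes / sum(sizes))
quota  # 5 10 20 10 5
```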

Example data:

     mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1))

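What I have been doing so far is clustering the data, ranking the individuals within each cluster, exporting the result to Excel and continuing from there... Now that my data has become very large, this has turned into a problem.

I would appreciate any help or suggestions on how to do the above in R.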

score 2 · Accepted Answer

I'm not sure this is exactly what you are looking for, but maybe it helps:

     mydata <- matrix(rnorm(1000, mean = 0, sd = 1), nrow = 100, ncol = 10)
     rownames(mydata) <- paste0("id", 1:100)  # ids for identification

     # calculate the dissimilarity (distance) matrix, then cluster the objects
     sim <- dist(mydata, diag = TRUE, upper = TRUE)
     cl <- cutree(hclust(sim), k = 5)

     # combine results; the row sum aggregates each object's dissimilarity to all others
     res <- data.frame(id = rownames(mydata),
                       cluster = cl, dis_sim = rowSums(as.matrix(sim)))
     # order ascending, so the lowest overall dissimilarity comes first
     res <- res[order(res$dis_sim), ]

     # split by cluster
     reslist <- split(res, f = res$cluster)

     # last three rows per cluster, i.e. the three ids with the highest overall dissimilarity
     lapply(reslist, tail, n = 3)

     # ids with the highest overall dissimilarity, top 20% of each cluster
     lapply(reslist, function(x, p) tail(x, round(nrow(x) * p)), p = 0.2)
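
If a single table is more convenient than a list (for example to replace the Excel step), the per-cluster picks can be stacked into one data frame. The following is only a sketch under assumptions: `select_dissimilar` is a hypothetical helper name, the seed is fixed purely for reproducibility, and the share `p = 0.2` is taken from the last line above (the question's 50 out of 1000 would correspond to `p = 0.05`).

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
mydata <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(mydata) <- paste0("id", 1:100)

# Same steps as above: distances, 5 clusters, summed dissimilarity per id
sim <- dist(mydata)
cl <- cutree(hclust(sim), k = 5)
res <- data.frame(id = rownames(mydata), cluster = cl,
                  dis_sim = rowSums(as.matrix(sim)))

# Hypothetical helper: keep the share p of each cluster with the largest
# summed dissimilarity and stack the picks into one data frame
select_dissimilar <- function(res, p) {
  picks <- lapply(split(res, res$cluster), function(x) {
    k <- round(nrow(x) * p)                        # picks for this cluster
    x <- x[order(x$dis_sim, decreasing = TRUE), ]  # most dissimilar first
    head(x, k)
  })
  do.call(rbind, picks)
}

selected <- select_dissimilar(res, p = 0.2)
table(selected$cluster)  # number of picks per cluster
```

Because each cluster's count is rounded separately, the total number of selected rows can differ slightly from `p` times the number of rows. Note also that `as.matrix(sim)` holds all pairwise distances, so memory use grows with the square of the number of rows.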
