python - How do I correlate my original data with clustered data?

Question

My distance matrix holds the pairwise distances between elements, like this:

    A B C D .....
A   n1 n2 n3
B n1    
C n2 n4
D n3 n5 ....... 
E.........

I feed it to the clustering as an array like this:

 arry=  [ 0 n1, n2, n3..
   n1.......
   n2 n4
   n3 n5 ]


import scipy.cluster.hierarchy as sch

Y=sch.linkage(arry,'single')
cutoff=1e-6
T=sch.fcluster(Y, cutoff,'distance')
print(T)

Z=sch.dendrogram(Y, color_threshold=cutoff)

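My fcluster output looks something like [4 10 12 1 5 13 2 11 1 7 8 3 14 6 10 16 9 15 1 7], which is taken from an earlier poster's question: "Clustering with scipy - clusters via distance matrix, how to get back the original objects".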

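I understand that the output T[i] only shows the number of elements in a cluster. How do I link my original elements A, B, C, D, E ..... to the clustering results and the dendrogram, and label them properly in my figure?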

score 2 · Accepted Answer

"I understand that the output T[i] only shows the number of elements in a cluster..."

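Not quite: T[j] is the "cluster number" of the j-th data point. In other words, fcluster gives you the assignment of data points to clusters. For example, if there are five data points and fcluster puts the first, second and last in cluster 1 and the others in cluster 2, the return value of fcluster will be array([1, 1, 2, 2, 1]).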
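To make that mapping concrete, here is a minimal sketch that pairs each label with its cluster number. The labels and the 1-D sample points are invented for illustration (they are not from the question); the only idea taken from above is that T[j] belongs to the j-th data point.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical labels and 1-D observations, chosen so the grouping is obvious:
# A, B and E sit near 0, while C and D sit near 5.
labels = ["A", "B", "C", "D", "E"]
points = np.array([[0.0], [0.1], [5.0], [5.1], [0.2]])

Y = linkage(points, "single")
T = fcluster(Y, 1.0, criterion="distance")

# T[j] is the cluster number of the j-th point, so zipping the labels with T
# lines each label up with its cluster.
clusters = {}
for label, number in zip(labels, T):
    clusters.setdefault(int(number), []).append(label)

for number in sorted(clusters):
    print("cluster", number, "contains", clusters[number])
```

The same zip works for any sequence of labels, as long as it is in the same order as the rows that were passed to linkage.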

Here is a demo that shows how to pick that data apart. For convenience, I used fclusterdata instead of the combination of linkage and fcluster. fclusterdata returns the same thing as fcluster.

import numpy as np

def cluster_indices(cluster_assignments):
    n = cluster_assignments.max()
    indices = []
    for cluster_number in range(1, n + 1):
        indices.append(np.where(cluster_assignments == cluster_number)[0])
    return indices

if __name__ == "__main__":
    from scipy.cluster.hierarchy import fclusterdata

    # Make some test data.
    data = np.random.rand(15,2)

    # Compute the clusters.
    cutoff = 1.0
    cluster_assignments = fclusterdata(data, cutoff)

    # Print the indices of the data points in each cluster.
    num_clusters = cluster_assignments.max()
    print("%d clusters" % num_clusters)
    indices = cluster_indices(cluster_assignments)
    for k, ind in enumerate(indices):
        print("cluster", k + 1, "is", ind)

Typical output:

4 clusters
cluster 1 is [ 0  1  6  8 10 13 14]
cluster 2 is [ 3  4  5  7 11 12]
cluster 3 is [9]
cluster 4 is [2]
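For the dendrogram part of the question, one option is the labels argument of dendrogram. The sketch below is only an illustration under some assumptions: the names and the 4x4 distance matrix are made up, and the matrix is assumed to be square and symmetric with a zero diagonal. Note that linkage expects a distance matrix in condensed form, so the square matrix goes through squareform first; a square array passed directly would be treated as a set of observation vectors instead.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical labels and a symmetric distance matrix with a zero diagonal.
names = ["A", "B", "C", "D"]
dist = np.array([
    [0.0, 1.0, 6.0, 7.0],
    [1.0, 0.0, 5.0, 6.0],
    [6.0, 5.0, 0.0, 2.0],
    [7.0, 6.0, 2.0, 0.0],
])
cutoff = 3.0  # arbitrary threshold for this toy matrix

# Convert the square matrix to the condensed form that linkage expects.
Y = linkage(squareform(dist), "single")
T = fcluster(Y, cutoff, criterion="distance")

# Row j of dist belongs to names[j], and so does T[j].
for name, number in zip(names, T):
    print(name, "-> cluster", number)

# labels= puts the names on the leaves. no_plot=True only computes the layout,
# so matplotlib is not needed here; drop it to actually draw the figure.
Z = dendrogram(Y, labels=names, color_threshold=cutoff, no_plot=True)
print(Z["ivl"])     # leaf labels in left-to-right order
print(Z["leaves"])  # the matching row indices into dist
```

The "leaves" entry of the returned dictionary gives the original row index for each leaf, which is another way to get back from the figure to the input data.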
