r - Tabulate ordinal and binary data according to cluster in R

Question

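I ran a k-medoid clustering analysis using the clusterR package from CRAN. My data is in a data.frame called df4 with 13111 obs. of 11 binary and ordinal variables. After clustering, I applied the cluster results to the original data.frame, so each user ID now has its corresponding cluster number.

How do I tabulate the binary and ordinal choices according to cluster?

For example, the Gender variable has the values Male/Female, and the Age variable has the ranges "18-20", "21-24", "25-34", "35-44", "45-54", "55-64", and "65+". I want the totals per cluster of males and females for the Gender variable, and of each category for the Age variable.

Here is the head of my data.frame, which includes the cluster label column: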

#12 variables because I added the clustering object to the data.frame
#I only included two variables from the R output
> str(df4)
'data.frame':   13111 obs. of  12 variables:
 $ Age                  : Factor w/ 7 levels "18-20","21-24",..: 6 6 6 6 7 6 5 7 6 3 ...
 $ Gender               : Factor w/ 2 levels "Female","Male": 1 1 2 2 2 1 2 1 2 2 ...

#I only included three variables from the R output
> head(df4)
    Age Gender
1 55-64 Female
2 55-64 Female
3 55-64   Male
4 55-64   Male
5   65+   Male
6 55-64 Female

Here is a reproducible example similar to my dataset:

age <- c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+")
gender <- c("Female", "Female", "Male", "Male", "Male", "Male", "Female")
# stringsAsFactors = TRUE keeps both columns as factors, matching str(df4) above
# (since R 4.0.0, data.frame() no longer converts strings to factors by default)
smalldf <- data.frame(age, gender, stringsAsFactors = TRUE)
# Load the cluster package
library(cluster)
# Create the dissimilarity matrix using the Gower coefficient,
# which handles mixed variable types
smalldaisy4 <- daisy(smalldf, metric = "gower",
                     type = list(symm = c(2), ordratio = c(1)))
# Set the random seed
set.seed(1)
# PAM algorithm with 3 clusters
smallk4answers <- pam(smalldaisy4, 3, diss = TRUE)
# Add the cluster IDs to the original data frame
smalldf$cluster <- smallk4answers$cluster

Desired output (hypothetical):

  cluster female male 18-20 21-24 25-34 35-44 45-54 55-64 65+
1 1       1      1    1     2     1     0     3     1     0
2 2       2      1    1     1     0     1     2     0     0
3 3       0      1    1     1     1     1     0     2     3

Please let me know if I can provide more information.

score 2 · Accepted Answer

It looks like you want two tables, cluster by gender and cluster by age, shown side by side in a single matrix.

 with( smalldf, cbind(table(cluster, gender), table(cluster, age)  ) )
#----------------
  Female Male 18-20 21-24 25-34 35-44 45-54 55-64 65+
1      2    0     1     1     0     0     0     0   0
2      0    4     0     0     1     1     1     1   0
3      1    0     0     0     0     0     0     0   1
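
With 11 variables, writing one table() call per column gets tedious. The following is a minimal sketch of the same cbind-of-tables idea, generalized over a vector of column names. The helper name tabulate_by_cluster is hypothetical (it is not part of any package), and the cluster labels are hard-coded from the output above so that the snippet runs without the cluster package.

```r
# Rebuild the question's toy data. The cluster labels are copied from the
# output above rather than recomputed with pam() (an assumption made so
# the example is self-contained).
smalldf <- data.frame(
  age     = c("18-20", "21-24", "25-34", "35-44", "45-54", "55-64", "65+"),
  gender  = c("Female", "Female", "Male", "Male", "Male", "Male", "Female"),
  cluster = c(1, 1, 2, 2, 2, 2, 3),
  stringsAsFactors = TRUE
)

# Hypothetical helper: build one cluster-by-level count table per variable
# and bind the tables side by side into a single matrix.
tabulate_by_cluster <- function(df, vars, cluster_col = "cluster") {
  tabs <- lapply(vars, function(v) table(df[[cluster_col]], df[[v]]))
  do.call(cbind, tabs)
}

counts <- tabulate_by_cluster(smalldf, c("gender", "age"))
print(counts)
```

For the real data, something like tabulate_by_cluster(df4, setdiff(names(df4), "cluster")) should cover every variable at once, assuming the cluster column is named cluster. If two variables share a level name (for example several Yes/No columns), the resulting column names will collide, so prefixing each level with its variable name may be worth adding.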
