r - In R, how do I select rows based on a statistic of a column attribute?

Question

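My table has several thousand rows, classified into 400 classes, and 12 columns.

The ideal result is a table of 400 rows (one row per class) that keeps all of the original columns, with each row chosen by the maximum value of column "z" within its class.

Here is a sample of my data. Using R, I need only rows 2, 4, 7, and 8 (marked with *) to be extracted from this example.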

     x           y         z    cluster 
1  712521.75  3637426.49  19.46   12 
2  712520.69  3637426.47  19.66   12  *
3  712518.88  3637426.63  17.37   225
4  712518.4   3637426.48  19.42   225 *
5  712517.11  3637426.51  18.81   225
6  712515.7   3637426.58  17.8    17 
7  712514.68  3637426.55  18.16   17  *
8  712513.58  3637426.55  18.23   50  *
9  712512.1   3637426.62  17.24   50
10 712513.93  3637426.88  18.08   50

I have tried many different combinations, including these:

  tapply(data$z, data$cluster, max)       # returns only the max value and cluster columns
  which.max(data$z)         # returns only the index of the max value in the entire table

I have also read about the plyr package, but could not find a solution there.

score 2 · Accepted Answer

A very simple way is to use aggregate and merge.

> merge(aggregate(z ~ cluster, mydf, max), mydf)
  cluster     z        x       y
1      12 19.66 712520.7 3637426
2      17 18.16 712514.7 3637427
3     225 19.42 712518.4 3637426
4      50 18.23 712513.6 3637427
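
For anyone who wants to reproduce this, here is a minimal, self-contained sketch of the same aggregate and merge approach, split into two steps. The data frame is rebuilt from the sample in the question; the intermediate names max_z and result are just illustrative choices, not part of the original answer.

```r
# Sample data copied from the question; the name mydf follows the answer.
mydf <- data.frame(
  x = c(712521.75, 712520.69, 712518.88, 712518.4, 712517.11,
        712515.7, 712514.68, 712513.58, 712512.1, 712513.93),
  y = c(3637426.49, 3637426.47, 3637426.63, 3637426.48, 3637426.51,
        3637426.58, 3637426.55, 3637426.55, 3637426.62, 3637426.88),
  z = c(19.46, 19.66, 17.37, 19.42, 18.81, 17.8, 18.16, 18.23, 17.24, 18.08),
  cluster = c(12, 12, 225, 225, 225, 17, 17, 50, 50, 50)
)

# Step 1: one row per cluster, holding that cluster's maximum z.
max_z <- aggregate(z ~ cluster, data = mydf, FUN = max)

# Step 2: merge() joins on the columns the two data frames share
# (cluster and z), which brings x and y back for the matching rows.
result <- merge(max_z, mydf)
print(result)
```

Because the join is on both cluster and z, a cluster whose maximum z occurs in more than one row will contribute more than one row to the result.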

You can also use the output of your tapply code to get what you need. Just make it a data.frame instead of a named vector.

> merge(mydf, data.frame(z = with(mydf, tapply(z, cluster, max))))
      z        x       y cluster
1 18.16 712514.7 3637427      17
2 18.23 712513.6 3637427      50
3 19.42 712518.4 3637426     225
4 19.66 712520.7 3637426      12
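
Note that the tapply variant joins on z alone, so it could also pick up a row from a different cluster that happens to share another cluster's maximum z value. As one possible alternative (not from the answer above), the sketch below uses base R's ave() to compare each row's z with the maximum of its own cluster. The names keep and result are illustrative.

```r
# Sample data copied from the question.
mydf <- read.table(header = TRUE, text = "
x          y           z      cluster
712521.75  3637426.49  19.46  12
712520.69  3637426.47  19.66  12
712518.88  3637426.63  17.37  225
712518.4   3637426.48  19.42  225
712517.11  3637426.51  18.81  225
712515.7   3637426.58  17.8   17
712514.68  3637426.55  18.16  17
712513.58  3637426.55  18.23  50
712512.1   3637426.62  17.24  50
712513.93  3637426.88  18.08  50
")

# ave() returns a vector as long as z in which every element is
# replaced by the maximum z of that row's cluster.
keep <- mydf$z == ave(mydf$z, mydf$cluster, FUN = max)

# Logical subsetting keeps all original columns and the original row order.
result <- mydf[keep, ]
print(result)  # rows 2, 4, 7 and 8 of the sample data
```

As with the merge approach, ties for the maximum within a cluster would all be kept.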

For other options, see the answers to this question.
