r - 別の列の最大値に基づいて値を選択します

Question

これはかなり基本的な質問だと思うので、なぜこれに対する解決策が見つからないのかわかりません。では、助けを求める必要があります。各月の最大温度値を使用して、月ごとに大気質データセットを再配置したいと考えています。さらに、月ごとの最高気温に対応する日を見つけたいと考えています。これを行うための最も怠惰な (コード単位の) 方法は何ですか?

私は成功せずに次のことを試みました：

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"))

dcast(mm, month + day ~ variable, max)
aggregate(formula = temp ~ month + day, data = airquality, FUN = max)

私はこのようなものを求めています:

month day temp
5     7    89
...

score 8 · Accepted Answer

There was quite a discussion a while back about whether being lazy is good or not. Anwyay, this is short and natural to write and read (and is fast for large data so you don't need to change or optimize it later) :

require(data.table)
DT=as.data.table(airquality)

DT[,.SD[which.max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     8    76     203  9.7   97  28
[5,]     9    73     183  2.8   93   3

.SD is the subset of the data for each group, and you just want the row from it with the largest Temp, iiuc. If you need the row number then that can be added.

Or to get all the rows where the max is tied :

DT[,.SD[Temp==max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     7    97     272  5.7   92   9
[5,]     8    76     203  9.7   97  28
[6,]     9    73     183  2.8   93   3
[7,]     9    91     189  4.6   93   4

score 4 · Accepted Answer

plyr を使用した別のアプローチ

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"), value.name = 'temp')

library(plyr)

ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)

与える

  month day temp
1     5  29   81
2     6  11   93
3     7   8   92
4     7   9   92
5     8  28   97
6     9   3   93
7     9   4   93

または、さらに単純な

require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
ddply(airquality, .(month), subset, 
  subset = temp == max(temp), select = c(month, day, temp) )

score 2 · Accepted Answer

でどうplyrですか？

max.func <- function(df) {
   max.temp <- max(df$temp)

   return(data.frame(day = df$Day[df$Temp==max.temp],
                     temp = max.temp))
}

ddply(airquality, .(Month), max.func)

ご覧のとおり、月の最高気温は複数の日で発生します。別の動作が必要な場合は、関数を簡単に調整できます。

score 2 · Accepted Answer

Or if you want to use the data.table package (for instance, if speed is an issue and the data set is large or if you prefer the syntax):

library(data.table)
DT <- data.table(airquality)
DT[, list(maxTemp=max(Temp), dayMaxTemp=.SD[max(Temp)==Temp, Day]), by="Month"]

If you want to know what the .SD stands for, have a look here: SO

r - 別の列の最大値に基づいて値を選択します

4 に答える 4

Related

Reference