8

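I think this is a fairly basic question, so I don't know why I can't find a solution to it, and I need to ask for help. I want to rearrange the airquality dataset by month, using the maximum temp value for each month. In addition, I want to find the day that corresponds to each monthly maximum temperature. What is the laziest (code-wise) way to do this?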

I have tried the following without success:

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"))

dcast(mm, month + day ~ variable, max)
aggregate(formula = temp ~ month + day, data = airquality, FUN = max)

I am after something like this:

month day temp
5     7    89
...
4

4 Answers

8

There was quite a discussion a while back about whether being lazy is good or not. Anyway, this is short and natural to write and read (and it is fast for large data, so you don't need to change or optimize it later):

require(data.table)
DT=as.data.table(airquality)

DT[,.SD[which.max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     8    76     203  9.7   97  28
[5,]     9    73     183  2.8   93   3

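.SD is the subset of the data for each group and, if I understand correctly, you just want the row from it with the largest Temp. Note that which.max returns only the first such row when several rows share the maximum. If you need the row number, that can be added.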
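As a rough illustration of the "row number" remark, here is a minimal base R sketch (not the data.table syntax used in this answer; within data.table itself the special symbol .I plays a similar role). The names `aq`, `max_rows` and `result` are only illustrative.

```r
# Base R sketch: row number of the first maximum Temp within each month
aq <- datasets::airquality

# For each month, take that month's row indices and keep the index
# whose Temp is highest (which.max picks the first one on ties)
max_rows <- as.vector(tapply(seq_len(nrow(aq)), aq$Month,
                             function(i) i[which.max(aq$Temp[i])]))

# Show the row number next to the matching month, day and temperature
result <- cbind(row = max_rows, aq[max_rows, c("Month", "Day", "Temp")])
result
```

The days this picks should agree with the Day column in the output above.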

Or, to get all the rows where the max is tied:

DT[,.SD[Temp==max(Temp)],by=Month]

     Month Ozone Solar.R Wind Temp Day
[1,]     5    45     252 14.9   81  29
[2,]     6    NA     259 10.9   93  11
[3,]     7    97     267  6.3   92   8
[4,]     7    97     272  5.7   92   9
[5,]     8    76     203  9.7   97  28
[6,]     9    73     183  2.8   93   3
[7,]     9    91     189  4.6   93   4
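
For comparison, the same "keep all tied rows" idea can be sketched in base R without any extra packages, assuming the unmodified `airquality` column names. The names `aq`, `month_max` and `tied` are only illustrative.

```r
# Base R sketch: keep every row whose Temp equals its month's maximum
aq <- datasets::airquality

# ave() returns, for each row, the maximum Temp of that row's month
month_max <- ave(aq$Temp, aq$Month, FUN = max)

# Rows where the temperature reaches the monthly maximum, ties included
tied <- aq[aq$Temp == month_max, c("Month", "Day", "Temp")]
tied
```

This should list the same seven month/day pairs as the table above.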
answered 2012-05-22T15:56:10.593
4

Another approach, using plyr:

require(reshape2)
names(airquality) <- tolower(names(airquality))
mm <- melt(airquality, id.vars = c("month", "day"), meas = c("temp"), value.name = 'temp')

library(plyr)

ddply(mm, .(month), subset, subset = temp == max(temp), select = -variable)

which gives

  month day temp
1     5  29   81
2     6  11   93
3     7   8   92
4     7   9   92
5     8  28   97
6     9   3   93
7     9   4   93

Or, even simpler:

require(reshape2)
require(plyr)
names(airquality) <- tolower(names(airquality))
ddply(airquality, .(month), subset, 
  subset = temp == max(temp), select = c(month, day, temp) )
answered 2012-05-23T02:03:04.467
2

How about using plyr?

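library(plyr)

# For one month's rows, return every day on which the maximum temperature occurred
max.func <- function(df) {
   # The column is `Temp` (capital T) in the unmodified airquality data
   max.temp <- max(df$Temp)

   return(data.frame(day = df$Day[df$Temp == max.temp],
                     temp = max.temp))
}

# Apply the function to each month and combine the results
ddply(airquality, .(Month), max.func)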

As you can see, the maximum temperature for a month can occur on more than one day. If you want different behavior, the function is easy to adjust.
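
As one possible adjustment, the sketch below reports only the latest day on which the monthly maximum occurred. The name `last.max.func` is hypothetical, and the per-month application is done with base R (`split` and `lapply`) as a stand-in for the `ddply` call, so that the example runs without plyr.

```r
aq <- datasets::airquality

# Hypothetical variant of max.func: one row per month,
# holding the latest day that reached the monthly maximum
last.max.func <- function(df) {
  max.temp <- max(df$Temp)
  data.frame(day = max(df$Day[df$Temp == max.temp]),
             temp = max.temp)
}

# Base R stand-in for ddply(aq, .(Month), last.max.func);
# the row names of the result carry the month
last_days <- do.call(rbind, lapply(split(aq, aq$Month), last.max.func))
last_days
```

Swapping `max(...)` for `min(...)` on the `day` line would report the earliest such day instead.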

answered 2012-05-22T15:44:13.507
2

Or if you want to use the data.table package (for instance, if speed is an issue and the data set is large or if you prefer the syntax):

library(data.table)
DT <- data.table(airquality)
DT[, list(maxTemp=max(Temp), dayMaxTemp=.SD[max(Temp)==Temp, Day]), by="Month"]

If you want to know what the .SD stands for, have a look here: SO

answered 2012-05-22T15:55:56.103