I have a data.table, shown below, and I'm trying to calculate the weighted mean for subsets of the data. I've tried two approaches with the MWE below:

    library(data.table)

    set.seed(12345)
    dt = data.table(a = c(10,20,25,10,10), b = rnorm(5), c = rnorm(5), d = rnorm(5), e = rnorm(5))
    dt$key = sample(toupper(letters[1:3]), 5, replace = TRUE)
    setkey(dt, key)

First, subsetting the .SD and using an lapply call, which doesn't work (and wasn't really expected to):

    dt[,lapply(.SD,function(x) weighted.mean(x,.SD[1])),by=key]

Second, trying to define a function to apply to the .SD, as I would if I were using ddply. This fails too:

    wmn=function(x){
      tmp = NULL
      for(i in 2:ncol(x)){
        tmp1 = weighted.mean(x[,i],x[,1])
        tmp = c(tmp,tmp1)
      }
      return(tmp)
    }

    dt[,wmn,by=key]

Any thoughts on how best to do this?

Thanks

EDIT

Fixed an error in the columns selected in the wmn function.

SECOND EDIT

Reverted the weighted mean formula and added set.seed.

1 Answer

If you want to take the weighted mean of "b"..."e" using "a" as the weight, I think this does the trick:

    dt[,lapply(.SD,weighted.mean,w=a),by=key,.SDcols=letters[1:5]]
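
Note that `letters[1:5]` also includes "a", so the result carries a column for "a" weighted by itself. If that column isn't wanted, a minimal sketch of a variant is below. It assumes the `dt` from the question; the names `val_cols`, `res`, `k` and `by_hand` are only illustrative. Columns referenced in `j` stay visible even when they are left out of `.SDcols`, so `w = a` should still resolve.

```r
library(data.table)

# Rebuild the example data from the question
set.seed(12345)
dt <- data.table(a = c(10, 20, 25, 10, 10), b = rnorm(5), c = rnorm(5),
                 d = rnorm(5), e = rnorm(5))
dt[, key := sample(toupper(letters[1:3]), 5, replace = TRUE)]

# Value columns only; "a" is used purely as the weight (illustrative name)
val_cols <- c("b", "c", "d", "e")
res <- dt[, lapply(.SD, weighted.mean, w = a), by = key, .SDcols = val_cols]

# Cross-check the first group by hand: sum(w * x) / sum(w)
k <- res$key[1]
by_hand <- dt[key == k, sum(a * b) / sum(a)]
print(res)
print(all.equal(res$b[1], by_hand))
```

The by-hand check is just a sanity test that the grouped call uses "a" from the same group as the weights.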
answered 2013-05-20T04:21:43.547