
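I often have data frames from a computation that I want to clean up, rename, and rearrange by column before output. All of the versions below work; the simple data.frame one comes closest to what I want.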

Is there a way to combine the in-data-frame computation of within and mutate with the column-order preservation of data.frame(), without the additional redundant [,....] at the end?

library(plyr) 

# Given this chaotically named data.frame
d = expand.grid(VISIT=as.factor(1:2),Biochem=letters[1:2],time=1:5,
                subj=as.factor(1:3))
d$Value1 =round(rnorm(nrow(d)),2)
d$val2 = round(rnorm(nrow(d)),2)

# I would like to cleanup, compute and rearrange columns

# Simple and almost perfect
dDataframe = with(d, data.frame(
  biochem = Biochem,
  subj = subj,
  visit = VISIT,
  value1 = Value1*3 
))
# This simple solution is almost perfect, 
# but requires one more line
dDataframe$value2 = dDataframe$value1*d$val2

# For the following methods I have to reorder 
# and select in a second step

# use mutate from plyr to allow computation on computed values,
# which transform cannot do.
dMutate =   mutate(d,
  biochem = Biochem,
  subj = subj,
  visit = VISIT,
  value1 = Value1*3, #assume this is a time consuming function
  value2 = value1*val2
  # Could set fields = NULL here to remove,
  # but this does not help getting column order
)[,c("biochem","subj","visit","value1","value2")]

# use within. Same problem, order not preserved
dWithin = within(d, {
  biochem = Biochem
  subj = subj
  visit = VISIT
  value1 = Value1*3
  value2 = value1*val2       
})[,c("biochem","subj","visit","value1","value2")]


all.equal(dDataframe,dWithin)
all.equal(dDataframe,dMutate)

2 Answers


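You can use summarize (or summarise) from the plyr package. From the documentation:

Summarise works in an analogous way to transform, except instead of adding columns to an existing data frame, it creates a new data frame. [...]

For your example: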

library(plyr)
summarize(d,
  biochem = Biochem,
  subj    = subj,
  visit   = VISIT,
  value1  = Value1 * 3,
  value2  = value1 * val2       
)
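
To make the documented difference concrete, here is a minimal sketch on a toy data frame. The data frame `toy` and its columns are invented for illustration and are not part of the question. It suggests that mutate appends to the existing columns, while summarise returns only the named columns, in the order they were written.

```r
library(plyr)

# Hypothetical toy data, not taken from the question
toy <- data.frame(B = c(2, 4), A = c(1, 3))

# mutate() keeps the existing columns and appends the new ones after them
m <- mutate(toy, a = A, double_a = a * 2)
names(m)

# summarise() builds a new data frame from the named columns only,
# in the order written; later columns may refer to earlier ones
s <- summarise(toy, a = A, double_a = a * 2)
names(s)
```

Because summarise drops every column that is not named, no trailing `[, c(...)]` selection should be needed.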
answered 2013-08-04T16:12:41.340

If you are happy to move to data.table, you can perform (most of) these actions by reference, avoiding the copying associated with [<-.data.frame and $<-.data.frame.

setnames renames the columns of a data.table, setcolorder reorders them, and := assigns by reference.

library(data.table)
DT <- data.table(d)
# rename to lowercase only
setnames(DT, old = names(DT), new = tolower(names(DT)))
# reassign using `:=`
# note the use of `value1<-value1` to allow later use. 
# This will not be necessary once FR1492 has been implemented
# setting to NULL removes these columns
DT[, `:=`(value1 = value1 <- value1 * 3,
          value2 = value1 * val2,
          val2   = NULL,
          time   = NULL)]
setcolorder(DT, c("biochem","subj","visit","value1","value2"))
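
As a rough illustration of what "by reference" means here, the sketch below uses a toy table and a second name bound to the same object. The names `toy` and `alias` are hypothetical, chosen only for this example. The changes made through `toy` should also be visible through `alias`, since no copy is taken.

```r
library(data.table)

# Hypothetical toy table, not taken from the question
toy <- data.table(B = c(2, 4), A = c(1, 3))
alias <- toy   # a second name for the same object, not a copy

setnames(toy, old = names(toy), new = tolower(names(toy)))  # rename in place
toy[, double_a := a * 2]                                    # add a column in place
setcolorder(toy, c("a", "double_a", "b"))                   # reorder in place

# alias reflects every change made through toy
names(alias)
```

If an independent copy is wanted instead, `copy(toy)` can be used before modifying.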

If you are not too concerned about memory efficiency and only want to use data.table for its syntax, then

DT <- data.table(d)
DT[, list(biochem = Biochem,
          subj    = subj,
          visit   = VISIT,
          value1  = value1 <- Value1 * 3,
          value2  = value1 * val2)]

will work.
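
For contrast with the by-reference approach, a small sketch of the list() form on a toy table (the names `toy` and `res` are hypothetical and used only for illustration). Here the result is a new data.table containing only the listed columns, and the original table is expected to stay as it was.

```r
library(data.table)

# Hypothetical toy table, not taken from the question
toy <- data.table(B = c(2, 4), A = c(1, 3))

# list() in j returns a new data.table with the named columns, in the order written
res <- toy[, list(a = A, double_a = A * 2)]

names(res)
names(toy)   # the original table keeps its own columns
```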

answered 2013-08-05T00:18:43.937