r - Add difference and percent change of every column in a data frame?

Question

I would like to be able to add the difference and percent change to every column in a dataframe.

I'm able to get as far as melting the data and performing the calculations, but I can't figure out how to cast or reshape it back together. I also have a sneaking suspicion that this is easily accomplished with plyr, but the n-1 rows returned by diff() give me problems.
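To make the length mismatch concrete, here is a minimal base-R sketch using the first four GNP.deflator values from longley. The percent change is computed by hand as x[t] / x[t-1] - 1; that this matches Delt's default "arithmetic" output is an assumption, checked only against the first row of the output further down.

```r
# First four GNP.deflator values from the longley dataset
x <- c(83.0, 88.5, 88.2, 89.5)
length(diff(x))  # 3: diff() returns one element fewer than its input
# Padding with a leading NA restores the original length
valdiff <- c(NA, diff(x))
# Arithmetic percent change, (x[t] - x[t-1]) / x[t-1], padded the same way
# (assumed to match Delt()'s default type)
valdelt <- c(NA, x[-1] / x[-length(x)] - 1)
data.frame(x, valdiff, valdelt)
```

The second valdelt value, 88.5 / 83 - 1, is the 0.066265060 that Delt reports for 1948.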

Using an included dataset:

library(plyr)
library(quantmod)
library(reshape)   # melt() and cast()
head(longley)

     GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
1947         83.0 234.289      235.6        159.0    107.608 1947   60.323
1948         88.5 259.426      232.5        145.6    108.632 1948   61.122
1949         88.2 258.054      368.2        161.6    109.773 1949   60.171
1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
1952         98.1 346.999      193.2        359.4    113.270 1952   63.639

longley.m <- melt(longley, id="Year")
longley.m <- ddply(longley.m, .(variable), transform, valdiff=diff(c(NA, value)), valdelt=Delt(value))

head(longley.m)

  Year     variable value valdiff Delt.1.arithmetic
1 1947 GNP.deflator  83.0      NA                NA
2 1948 GNP.deflator  88.5     5.5       0.066265060
3 1949 GNP.deflator  88.2    -0.3      -0.003389831
4 1950 GNP.deflator  89.5     1.3       0.014739229
5 1951 GNP.deflator  96.2     6.7       0.074860335
6 1952 GNP.deflator  98.1     1.9       0.019750520

(I don't know why Delt makes its own column name, but I've given up on that.)

Now, I can cast(longley.m, Year ~ variable) to get back to the original dataset, but I want to be able to have the difference and percent change for each variable in a different column without performing the calculation manually on each variable and then rbinding it back together. I'm pretty confident I've tried every variation of cast to no avail...

Update: Joran solved the Delt column naming issue: coerce it with as.vector!

score 2 · Answer

The reason you get the odd column name when using Delt is that it returns a matrix, not a vector. Coercing the result with as.vector clears up that mystery.
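A minimal base-R sketch of that effect. To avoid depending on quantmod, a hand-built one-column matrix named Delt.1.arithmetic stands in for Delt's result (an assumption based on the column name in the question's output). transform() passes its arguments to data.frame(), and a one-column matrix that carries its own column name keeps that name instead of taking the argument's tag.

```r
df <- data.frame(value = c(83.0, 88.5, 88.2))
# Stand-in for Delt(value): a one-column matrix with its own column name
m <- matrix(c(NA, 88.5 / 83.0 - 1, 88.2 / 88.5 - 1), ncol = 1,
            dimnames = list(NULL, "Delt.1.arithmetic"))
bad  <- transform(df, valdelt = m)             # matrix's column name wins
good <- transform(df, valdelt = as.vector(m))  # plain vector takes the tag
names(bad)   # "value" "Delt.1.arithmetic"
names(good)  # "value" "valdelt"
```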

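However, I think you're overcomplicating this. Is there any reason you can't simply sort the data frame by year, apply diff and Delt to each column, rename the columns appropriately, and then cbind everything together?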

Some starter code:

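library(plyr)      # arrange()
library(quantmod)  # Delt()
longley.o <- arrange(longley, Year)                  # sort rows by Year
apply(longley.o, 2, function(x) { c(NA, diff(x)) })  # first differences, NA-padded
apply(longley.o, 2, Delt)                            # arithmetic percent change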

A more complete version (without typing out any columns by hand):

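library(plyr)      # arrange()
library(quantmod)  # Delt()

longley.o <- arrange(longley, Year)
# First differences (padded with a leading NA) and percent changes for every column
valdiff <- apply(longley.o, 2, function(x) { c(NA, diff(x)) })
valdelt <- apply(longley.o, 2, Delt)

# Prefix the column names so the new columns don't clash with the originals
colnames(valdiff) <- paste("valdiff", colnames(valdiff), sep = ".")
colnames(valdelt) <- paste("valdelt", colnames(valdelt), sep = ".")

# Drop the (meaningless) Year columns of the new matrices before binding
out <- cbind(longley.o,
             valdiff[, -match("Year", colnames(longley.o))],
             valdelt[, -match("Year", colnames(longley.o))])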
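For comparison, a base-R sketch of the same sort, compute, rename, cbind approach in case plyr and quantmod aren't available. It assumes every column except Year is numeric, uses order() in place of arrange(), and computes the arithmetic percent change directly; pct_change is a hypothetical helper, and its equivalence to Delt's default is an assumption. Because Year is excluded up front, the match() step is no longer needed.

```r
longley.o <- datasets::longley
longley.o <- longley.o[order(longley.o$Year), ]  # stand-in for arrange()
vars <- setdiff(names(longley.o), "Year")        # every column except the id
# pct_change() is a hypothetical helper: arithmetic change x[t]/x[t-1] - 1,
# assumed equivalent to Delt()'s default, padded with a leading NA
pct_change <- function(x) c(NA, x[-1] / x[-length(x)] - 1)
valdiff <- sapply(longley.o[vars], function(x) c(NA, diff(x)))
valdelt <- sapply(longley.o[vars], pct_change)
colnames(valdiff) <- paste("valdiff", vars, sep = ".")
colnames(valdelt) <- paste("valdelt", vars, sep = ".")
out <- cbind(longley.o, valdiff, valdelt)
head(out[, c("Year", "GNP", "valdiff.GNP", "valdelt.GNP")])
```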

score 2 · Answer

I'd probably approach this the way @joran did.

But if you want to continue down the path you started on, you can finish the journey with reshape() from base R.

# Your code
library(plyr)
library(quantmod)
library(reshape)
head(longley)
longley.m <- melt(longley, id="Year")

# My addition
longley.m <- ddply(longley.m, .(variable), transform, 
                   valdiff = diff(c(NA, value)), 
                   valdelt = as.vector(Delt(value)))
reshape(longley.m, idvar="Year", timevar="variable", direction="wide")
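A base-R sketch of the same melt, per-variable calculation, reshape path, assuming none of plyr, quantmod or reshape is loaded: the long table is built by hand in place of melt(), ave() replaces ddply(), and the percent change is computed directly rather than with Delt. Only two columns are used to keep it short.

```r
wide <- datasets::longley[, c("Year", "GNP", "Employed")]
vars <- c("GNP", "Employed")
# Long format built by hand (a stand-in for melt(wide, id = "Year"))
long <- data.frame(Year     = rep(wide$Year, length(vars)),
                   variable = rep(vars, each = nrow(wide)),
                   value    = unlist(wide[vars], use.names = FALSE))
long <- long[order(long$variable, long$Year), ]
# ave() applies FUN within each variable and returns a vector of full length
long$valdiff <- ave(long$value, long$variable,
                    FUN = function(v) c(NA, diff(v)))
long$valdelt <- ave(long$value, long$variable,
                    FUN = function(v) c(NA, v[-1] / v[-length(v)] - 1))
# Back to one row per Year: value.GNP, valdiff.GNP, valdelt.GNP, ...
back <- reshape(long, idvar = "Year", timevar = "variable",
                direction = "wide")
head(back)
```

Sorting by Year within each variable before calling ave() matters here, since diff() compares each value with the row before it.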

score 0 · Answer

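I thought the strategy of melting and then processing within each indicator category was unnecessarily complicated. If you want a data frame with a leading row of NAs so that the row numbers line up, here are two options, each a one-liner: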

as.data.frame( lapply(longley, function(x) c(NA, diff(x))))

Or, if you know that every entry is numeric (as your use of numeric functions suggests), so that apply is safe to use, this approach is even simpler:

apply(longley,2, FUN=function(x) c(NA, diff(x)))
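Why the "all numeric" condition matters, as a small base-R sketch on a made-up three-row frame (the label column is hypothetical, added only to show the failure mode): apply() first converts the data frame with as.matrix(), and a single character column turns the whole matrix into character, so diff() would fail. The lapply() route works column by column and can be restricted to numeric columns.

```r
df <- data.frame(Year  = 1947:1949,
                 GNP   = c(234.289, 259.426, 258.054),
                 label = c("a", "b", "c"))  # hypothetical non-numeric column
# apply() works on as.matrix(df); one character column makes it all character
typeof(as.matrix(df))  # "character"
# lapply() goes column by column, so keep only the numeric columns
num <- df[vapply(df, is.numeric, logical(1))]
diffs <- as.data.frame(lapply(num, function(x) c(NA, diff(x))))
diffs
```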

And if you want all of this together with the Delt results:

library(quantmod)  # Delt()
cbind(apply(longley, 2, FUN = function(x) c(NA, diff(x))),
      apply(longley, 2, Delt))
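One thing to watch with that last cbind: both apply() results take their column names from longley, so every name appears twice. A base-R sketch that renames them; the .diff and .pct suffixes are arbitrary choices, and the percent change is computed by hand in place of Delt (assumed equivalent to its default).

```r
x <- datasets::longley
pct <- function(v) c(NA, v[-1] / v[-length(v)] - 1)  # stand-in for Delt()
both <- cbind(apply(x, 2, function(v) c(NA, diff(v))),
              apply(x, 2, pct))
anyDuplicated(colnames(both)) > 0  # TRUE: each name appears twice
# Give the two halves distinct, descriptive names
colnames(both) <- c(paste0(colnames(x), ".diff"), paste0(colnames(x), ".pct"))
head(both[, c("GNP.diff", "GNP.pct")])
```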
