r - Restoring tapply results to the original data frame in R

Question

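I have a data frame containing firms' yearly exports to different countries in different years. My problem is that I need to create a variable that says, for each year, how many firms there are in each country. I can do this perfectly well with a "tapply" command, like this: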

incumbents <- tapply(id, destination-year, function(x) length(unique(x)))

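and it works fine. My problem is that incumbents has length length(destination-year), whereas I need it to have length length(id) (there are many firms serving each destination each year) so that I can use it in a subsequent regression (matched to the year and destination, of course). I can do this with a "for" loop, but that is very slow because the database is huge.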

Any suggestions?

score 1 · Accepted Answer

Just "merge" the tapply summary back into the original data frame with merge.

Since you didn't provide example data, I made some up. Modify accordingly.

n           = 1000
id          = sample(1:10, n, replace=TRUE)
year        = sample(2000:2011, n, replace=TRUE)
destination = sample(LETTERS[1:6], n, replace=TRUE)

`destination-year` = paste(destination, year, sep='-')

# data.frame() renames this column to destination.year (check.names=TRUE)
dat = data.frame(id, year, destination, `destination-year`)

Next, tabulate the summary. Note how I reformat it into a data frame and name its columns to match the original data.

incumbents = tapply(id, `destination-year`, function(x) length(unique(x)))
incumbents = data.frame(`destination-year`=names(incumbents), incumbents)

Finally, merge it back into the original data.

merge(dat, incumbents)

By the way, instead of combining destination and year into a third variable as you did, tapply can handle both variables directly if you pass them as a list.

library(reshape2)  # provides melt()
incumbents = melt(tapply(id, list(destination=destination, year=year), function(x) length(unique(x))))
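
The two-key summary can be merged back onto the original rows in much the same way. The following is only a sketch in base R on made-up data: as.data.frame(as.table(...)) stands in for melt here, and the names counts and merged are illustrative assumptions rather than part of the answer.

```r
set.seed(1)
n           <- 1000
id          <- sample(1:10, n, replace = TRUE)
year        <- sample(2000:2011, n, replace = TRUE)
destination <- sample(LETTERS[1:6], n, replace = TRUE)
dat <- data.frame(id, year, destination)

# Distinct firms per destination/year cell (a destination x year matrix)
counts <- tapply(dat$id, list(destination = dat$destination, year = dat$year),
                 function(x) length(unique(x)))

# Long format: one row per destination/year cell
counts <- as.data.frame(as.table(counts), responseName = "incumbents",
                        stringsAsFactors = FALSE)
counts$year <- as.integer(counts$year)  # the year labels come back as text

# Join on both keys; every row of dat picks up its cell's count
merged <- merge(dat, counts, by = c("destination", "year"))
nrow(merged) == nrow(dat)
```

Note that merge reorders the rows by the key columns, so the result should replace dat rather than be bound to it column-wise.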

score 1 · Accepted Answer

Since you didn't provide a reproducible example, I can't test this, but you should be able to use ave:

incumbents <- ave(id, destination-year, FUN=function(x) length(unique(x)))
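
To illustrate on made-up data (the data and the column name incumbents are assumptions, not the asker's): ave accepts several grouping variables, so destination and year can be passed separately instead of as a combined key, and the result already has one value per row. A minimal sketch:

```r
set.seed(1)
n <- 1000
dat <- data.frame(
  id          = sample(1:10, n, replace = TRUE),
  year        = sample(2000:2011, n, replace = TRUE),
  destination = sample(LETTERS[1:6], n, replace = TRUE)
)

# ave() returns one value per row, repeated within each destination/year group
dat$incumbents <- ave(dat$id, dat$destination, dat$year,
                      FUN = function(x) length(unique(x)))

length(dat$incumbents) == nrow(dat)
```

Because the row order is preserved, the new column can go straight into a regression without any merge step.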

score 0 · Accepted Answer

Using @JohnColby's excellent example data, I was thinking of something more along these lines:

#I prefer not to deal with the pesky '-' in a variable name
destinationYear = paste(destination, year, sep='-')

dat = data.frame(id, year, destination, destinationYear)

library(plyr)  # provides ddply
dat <- ddply(dat,.(destinationYear),transform,newCol = length(unique(id)))

#Or if more speed is required, use data.table
require(data.table)
datTable <- data.table(dat)

datTable <- datTable[,transform(.SD,newCol = length(unique(id))),by = destinationYear]
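
If data.table is available, a commonly used alternative to transform(.SD, ...) is to add the column by reference with :=. This is a sketch on made-up data, not a benchmarked recommendation; it groups on destination and year directly rather than on a pasted key, and uses uniqueN(id), data.table's shorthand for length(unique(id)).

```r
library(data.table)

set.seed(1)
n <- 1000
datTable <- data.table(
  id          = sample(1:10, n, replace = TRUE),
  year        = sample(2000:2011, n, replace = TRUE),
  destination = sample(LETTERS[1:6], n, replace = TRUE)
)

# := adds newCol in place; by = groups on both columns at once
datTable[, newCol := uniqueN(id), by = .(destination, year)]
```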
