r - Frequency tables by group with weighted data in R

Question

I would like to compute two kinds of frequency tables by group using weighted data.

You can generate reproducible data with the following code:

set.seed(1)  # fix the random seed so the sampled data are actually reproducible
Data <- data.frame(
     country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
     migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
     gender = sample(c("men", "women"), 100, replace = TRUE),
     wgt = sample(100),
     year = sample(2006:2007)  # length 2, recycled across the 100 rows
     )

First, I am trying to compute a frequency table of migrant status (native vs. foreign-born) by country and year. I wrote the following code using the packages questionr and plyr.

library(questionr)  # wtd.table(), cprop()
library(plyr)       # rename()

db2006 <- subset(Data, year == 2006)
db2007 <- subset(Data, year == 2007)

result2006 <- as.data.frame(cprop(wtd.table(db2006$migrant, db2006$country, weights = db2006$wgt), total = FALSE))
result2007 <- as.data.frame(cprop(wtd.table(db2007$migrant, db2007$country, weights = db2007$wgt), total = FALSE))

result2006 <- rename(result2006, c(Freq = "y2006"))
result2007 <- rename(result2007, c(Freq = "y2007"))

result <- merge(result2006, result2007, by = c("Var1", "Var2"))

My real database covers 10 years, so applying this code to every year takes a long time. Does anyone know a faster way to do this?

I would also like to compute the share of women and men within each migrant status, by country and year. I am looking for something like this:

Var1            Var2     Var3     y2006   y2007
Foreign born    France   men        52     55
Foreign born    France   women      48     45
Native          France   men        51     52
Native          France   women      49     48
Foreign born    UK       men        60     65
Foreign born    UK       women      40     35
Native          UK       men        48     50
Native          UK       women      52     50

Does anyone know how to get these results?

score 1 · Accepted Answer

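You can do this as follows: turn the code you have already written into a function, use lapply to iterate that function over every year in the data, then use Reduce and merge to collapse the resulting list into a single data frame. Like this: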

# let's make your code into a function called 'tallyho'
tallyho <- function(yr, data) {

  require(dplyr)
  require(questionr)

  DF <- filter(data, year == yr)

  result <- with(DF, as.data.frame(cprop(wtd.table(migrant, country, weights = wgt), total = FALSE)))

  # rename the last column by year
  names(result)[length(names(result))] <- sprintf("y%s", yr)

  return(result)

}

# load dplyr first so that %>% is available, then iterate the function over
# all years in your original data set and use Reduce and merge to collapse
# the resulting list into a single data frame
library(dplyr)
NewData <- lapply(unique(Data$year), function(x) tallyho(x, Data)) %>%
  Reduce(function(...) merge(..., all = TRUE), .)
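
For the second part of the question (the share of men and women within each migrant status, by country and year), here is one possible sketch. It is not the questionr approach used above: it swaps in base R's xtabs, prop.table and reshape, and it assumes the column names from the question's example data. The set.seed call and the y2006/y2007 column naming are my own choices for illustration.

```r
# Example data as in the question; the seed is an assumption added here
set.seed(1)
Data <- data.frame(
  country = sample(c("France", "USA", "UK"), 100, replace = TRUE),
  migrant = sample(c("Native", "Foreign-born"), 100, replace = TRUE),
  gender  = sample(c("men", "women"), 100, replace = TRUE),
  wgt     = sample(100),
  year    = sample(2006:2007)  # recycled across the 100 rows
)

# Weighted counts: xtabs sums wgt within each combination of the factors
counts <- xtabs(wgt ~ migrant + country + gender + year, data = Data)

# Percentages of men and women within each migrant x country x year cell
# (margins 1, 2 and 4 are held fixed, so the shares sum to 100 over gender)
shares <- 100 * prop.table(counts, margin = c(1, 2, 4))

# Long format: one row per migrant x country x gender x year
long <- as.data.frame(shares, responseName = "pct")

# Wide format: one column per year
wide <- reshape(long,
                idvar = c("migrant", "country", "gender"),
                timevar = "year",
                direction = "wide",
                sep = "_")

# Rename pct_2006, pct_2007, ... to y2006, y2007, ...
names(wide) <- sub("^pct_", "y", names(wide))

wide
```

Because the year is just another dimension of the table, this handles any number of years without a loop. A migrant x country x year cell with zero total weight would produce NaN shares, so check for that on real data.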
