r - Recreating a Stata command in R, by Area Sex Age: keep if (Infected==1) | ((_n<=1*ncases) & (Infected==0))

Question

I want to recreate this Stata command in R:

 by Area Sex Age: keep if (Infected==1) | ((_n<=1*ncases) & (Infected==0))

This is for a matched case-control study.

My data frame contains 193 cases and a variable number of controls per group (Area, Sex, and Age). I am trying to match one random control to each case, based on the Area/Sex/Age grouping.

ncases is an integer column in the data frame giving the number of cases in each group (Area Sex Age).
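If ncases is not already in the data, it can be derived from Infected itself. A minimal base-R sketch, assuming Infected is coded 0/1 as described: `ave()` sums Infected within each Area/Sex/Age group and recycles that total back onto every row of the group. The small data frame `dat` is a hypothetical stand-in for dat4.

```r
# Hypothetical stand-in for dat4: two Area/Sex/Age groups
dat <- data.frame(
  Area     = c(1, 1, 1, 5, 5, 5, 5),
  Sex      = c(1, 1, 1, 1, 1, 1, 1),
  Age      = c(23, 23, 23, 36, 36, 36, 36),
  Infected = c(0, 0, 1, 0, 0, 1, 1)
)

# Number of cases (Infected == 1) in each Area/Sex/Age group,
# repeated on every row of that group
dat$ncases <- ave(dat$Infected, dat$Area, dat$Sex, dat$Age, FUN = sum)
print(dat)
```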

The command line above works correctly in Stata.

However, the R code I wrote only works for the first group:

dat5 <- subset(dat4,by=list(Area,Sex,Age),(Infected=1 | 
                                        ((seq(dim(dat4)[1]))<=1*ncases & Infected==0)))
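Two details in this attempt differ from the Stata line. First, `Infected=1` inside the parentheses is parsed as an assignment, not the comparison `Infected==1`. Second, `seq(dim(dat4)[1])` numbers the rows of the whole data frame, whereas Stata's `_n` under `by` restarts at 1 in every group; with a global row number, only the first few rows of the data frame (which belong to the first group) can satisfy `<= ncases`. The sketch below contrasts the two counters on a hypothetical three-group data frame; `ave(..., FUN = seq_along)` is one base-R way to get a per-group counter, not the only one.

```r
# Hypothetical data: three groups of three rows each, ncases = 1 everywhere
dat <- data.frame(
  Area     = rep(c(1, 7, 5), each = 3),
  Infected = c(0, 0, 1,  0, 0, 1,  0, 0, 1),
  ncases   = 1
)

# Row number over the whole data frame (what seq(dim(dat4)[1]) gives)
global_n <- seq(dim(dat)[1])

# Row number restarting at 1 within each group (what Stata's _n gives under by)
group_n <- ave(seq_len(nrow(dat)), dat$Area, FUN = seq_along)

# Controls kept by each version of "_n <= ncases & Infected == 0"
keep_global <- global_n <= dat$ncases & dat$Infected == 0
keep_group  <- group_n  <= dat$ncases & dat$Infected == 0

print(which(keep_global))  # only the first group contributes a control
print(which(keep_group))   # one control per group
```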

This is my data frame dat4 (Infected=1 is a case, Infected=0 is a control):

        Area  Sex Age  CensusNo   Animals Infected ncases 
18825   1     1   23   1023224    0       0        1 
18826   1     1   23   1024109    1       0        1 
18827   1     1   23   1024163    0       1        1 
41428   7     2   50   1047107    1       0        1 
41429   7     2   50   1047029    1       0        1 
41430   7     2   50   1046901    1       1        1 
41439   5     1   36   1047037    1       0        2 
41440   5     1   36   1047127    1       0        2 
41441   5     1   36   1047125    1       0        2 
41442   5     1   36   1047005    1       0        2 
41443   5     1   36   1046994    0       1        2 
41444   5     1   36   1046972    0       1        2
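For comparison, a literal base-R translation of the Stata line, run on the example data above. Stata's `_n` under `by Area Sex Age:` is each row's position within its group in the current sort order (Stata's `by` requires the data to be sorted by those variables), and `ave(..., FUN = seq_along)` reproduces that counter. This is a sketch under that reading: like the Stata command, it keeps the first ncases rows of each group that happen to be controls, so the selection is deterministic, not random, unless the rows are shuffled within groups first.

```r
# The example data from the question (row names as shown)
dat4 <- data.frame(
  Area     = c(1, 1, 1, 7, 7, 7, 5, 5, 5, 5, 5, 5),
  Sex      = c(1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1, 1),
  Age      = c(23, 23, 23, 50, 50, 50, 36, 36, 36, 36, 36, 36),
  CensusNo = c(1023224, 1024109, 1024163, 1047107, 1047029, 1046901,
               1047037, 1047127, 1047125, 1047005, 1046994, 1046972),
  Animals  = c(0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0),
  Infected = c(0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1),
  ncases   = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  row.names = c(18825:18827, 41428:41430, 41439:41444)
)

# Stata's _n under "by Area Sex Age": position of each row within its group,
# in the current row order
n_in_group <- ave(seq_len(nrow(dat4)), dat4$Area, dat4$Sex, dat4$Age,
                  FUN = seq_along)

# keep if (Infected==1) | ((_n<=1*ncases) & (Infected==0))
keep <- dat4$Infected == 1 |
  (n_in_group <= 1 * dat4$ncases & dat4$Infected == 0)
dat5 <- dat4[keep, ]
print(dat5)
```

On this data every group happens to list its controls before its cases, so each group keeps all its cases plus its first ncases controls (8 rows in total). If cases were sorted first, they would use up the `_n <= ncases` slots and fewer controls would be kept, in Stata and in this translation alike.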

score 1 · Accepted Answer

A data.table solution.

library(data.table)
ddd <- data.table(dat4)
# Infected and ncases are grouping columns, so inside j each is a single value
matched <- ddd[, {
  # coerce integer Infected to logical
  # (not strictly necessary, but more robust)
  ii <- as.logical(Infected)
  if (ii) {
    # Infected == 1: keep all cases in the group
    .SD
  } else {
    # Infected == 0: keep a random sample of ncases controls
    # (min() guards against groups with fewer controls than cases)
    .SD[sample.int(.N, size = min(.N, ncases))]
  }
}, by = list(Area, Sex, Age, ncases, Infected)]
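Whichever approach is used, it is worth checking the result against the design: for 1:1 matching, every Area/Sex/Age group should contain all of its cases and exactly ncases controls. A minimal base-R check; `check_matching()` and its `ratio` argument are hypothetical names, the function assumes the Area, Sex, Age, Infected, and ncases columns shown in the question, and it calls `as.data.frame()` first so a data.table result can be passed in directly. The data frame `res` is a hypothetical matched result.

```r
# Hypothetical helper: for each Area/Sex/Age group in `res`, TRUE if the group
# holds ncases cases and ratio * ncases controls
check_matching <- function(res, ratio = 1) {
  res <- as.data.frame(res)  # lets a data.table result be split like a data.frame
  groups <- split(res, res[c("Area", "Sex", "Age")], drop = TRUE)
  vapply(groups, function(g) {
    sum(g$Infected == 1) == g$ncases[1] &&
      sum(g$Infected == 0) == ratio * g$ncases[1]
  }, logical(1))
}

# Hypothetical matched result: 1 case + 1 control, and 2 cases + 2 controls
res <- data.frame(
  Area     = c(1, 1, 5, 5, 5, 5),
  Sex      = 1,
  Age      = c(23, 23, 36, 36, 36, 36),
  Infected = c(0, 1, 0, 0, 1, 1),
  ncases   = c(1, 1, 2, 2, 2, 2)
)

print(check_matching(res))         # every group passes
print(check_matching(res[-1, ]))   # dropping a control makes its group fail
```

A group with fewer available controls than ncases will also be flagged, which is usually what you want to know about in a matched design.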

score 1 · Accepted Answer

There is no by parameter to the subset function. Instead, build an index vector that is TRUE for the cases and for a sample of the controls within each category.

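chosen <- by(dat4, INDICES = dat4[c("Area", "Sex", "Age")],
             FUN = function(d) {
               is_case <- d[["Infected"]] == 1
               is_ctrl <- d[["Infected"]] == 0
               # mark ncases controls TRUE, then shuffle those flags
               # among the control rows only
               n_pick <- min(sum(is_ctrl), d[["ncases"]][1])
               pick <- rep(FALSE, NROW(d))
               pick[is_ctrl] <- sample(seq_len(sum(is_ctrl)) <= n_pick)
               d[is_case | pick, ]
             })
chosen <- do.call(rbind, chosen)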

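That last part feels a little clumsy to me: I create a vector of logicals and permute it with the sample function. The by function returns a list with one entry per category, so in this case the entries have to be "stacked" with rbind. I suspect there is a more expressive way to do the sampling and those neural pathways just need more caffeine. (Also, this is untested until a sample dataset is provided for that purpose.)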
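One arguably more direct way to do the sampling is to draw row indices of the controls instead of permuting a logical vector. A base-R sketch using split/lapply in place of by; `pick_controls()` and its `ratio` argument (the `1` in Stata's `1*ncases`) are hypothetical names, and the data frame `dat` is a small hypothetical stand-in for dat4. Indexing with `ctrl[sample.int(length(ctrl), k)]` rather than `sample(ctrl, k)` sidesteps the R gotcha where `sample()` on a single number n draws from `1:n`.

```r
# Hypothetical helper: keep all cases in group `d` plus a random sample of
# ratio * ncases controls (or all controls if there are fewer than that)
pick_controls <- function(d, ratio = 1) {
  cases <- which(d$Infected == 1)
  ctrl  <- which(d$Infected == 0)
  k     <- min(length(ctrl), ratio * d$ncases[1])
  keep  <- c(cases, ctrl[sample.int(length(ctrl), k)])
  d[sort(keep), , drop = FALSE]
}

# Hypothetical data: group (1,1,23) has 1 case / 3 controls,
# group (5,1,36) has 2 cases / 2 controls
dat <- data.frame(
  Area     = rep(c(1, 5), each = 4),
  Sex      = 1,
  Age      = rep(c(23, 36), each = 4),
  Infected = c(1, 0, 0, 0,  1, 1, 0, 0),
  ncases   = rep(c(1, 2), each = 4)
)

set.seed(1)  # reproducible draw
groups  <- split(dat, dat[c("Area", "Sex", "Age")], drop = TRUE)
matched <- do.call(rbind, lapply(groups, pick_controls))
print(matched)
```

For 1:2 matching, extra arguments pass through `lapply`, as in `lapply(groups, pick_controls, ratio = 2)`.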
