r - Calculating the difference after treatment in R

Question

I have a question about panel data in R.

My data basically looks like this:

Year  Name       Variable    Treatment
2000  CompanyA   10          0
2001  CompanyA   10          0
2002  CompanyA   10          1
2003  CompanyA   10          0
2004  CompanyA   12          0
2005  CompanyA   12          0
1999  CompanyB    5          1
2000  CompanyB    5          1
2001  CompanyB    5          0
2002  CompanyB    5          0
2003  CompanyB    6          0
2004  CompanyB    5          0
2005  CompanyB    6          0
2006  CompanyB    6          0

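Is there a way in R to calculate the difference in the dependent variable before and after treatment (for a given time lag)?

Unfortunately, I only have unbalanced panel data. The purpose of the calculation is to create a dummy variable from it that indicates whether the dependent variable grew after 2 years. I then want to run a clogit regression.

Edit

I need to know whether the dependent variable changed after treatment. So I need code that computes a dummy for every positive change in the variable.

The output should look like this: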

Year  Name       Variable    Treatment   Dummy
2000  CompanyA   10          0           0
2001  CompanyA   10          0           0
2002  CompanyA   10          1           0
2003  CompanyA   10          0           0
2004  CompanyA   12          0           1
2005  CompanyA   12          0           1
1999  CompanyB    5          1           0
2000  CompanyB    5          1           0
2001  CompanyB    5          0           0
2002  CompanyB    5          0           0
2003  CompanyB    6          0           1
2004  CompanyB    5          0           0
2005  CompanyB    6          0           0
2006  CompanyB    6          0           0

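I can then run a conditional logit regression on it and link treatment (along with other variables) to a positive effect on the dependent variable after a specific time lag.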

score 2 · Accepted Answer

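I updated my answer following the clarification in the comments. Beyond the simple comparison (on/off treatment, part A), it now includes a time-course approach as requested (part B).
Note that the code will need to be adapted to your exact question in many respects. What should be done with those where treatment becomes negative and then positive again? How long is the period at risk in which a treatment effect can be expected after the start (or after the stop) of treatment? These questions may be conceptual rather than R problems, but I have tried to provide some starting points for implementing them.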

#### sample data (added and changed some data to demonstrate sorting of the years ####
# and positive Treatment at first time point):

text <- "Year  Name       Variable    Treatment
2000  CompanyA   10          0
2001  CompanyA   10          0
2002  CompanyA   10          1
2003  CompanyA   10          0
2004  CompanyA   12          0
2010  CompanyA   15          1
2005  CompanyA   12          0
1999  CompanyB    5          0
2000  CompanyB    5          1
2001  CompanyB    5          0
2002  CompanyB    5          0
2003  CompanyB    6          0
2004  CompanyB    5          0
2005  CompanyB    6          0
2006  CompanyB    6          0
2001  CompanyC    5          1
2006  CompanyC    9          1"

df <- read.table(text=text, header=TRUE)
str(df)
head(df)

#### A) Simple way: just compare on/off treatment subject ####

mean(df[df$Treatment==1, "Variable"]) - mean(df[df$Treatment==0, "Variable"]) 


#### B) Compare within each company, take into consideration also the time course ####

# split to list according to company names, to analyse them separately
Name.u <- as.character(unique(df$Name))  # unique Company names
L <- sapply(Name.u, function(n) df[df$Name==n, ], simplify=FALSE)             
str(L)
L  # a list of dataframes, one dataframe for each company

## deal with special cases that may influence the concept of the analysis
# sort by year (assuming there are no ties)
L <- sapply(Name.u, function(n) L[[n]][order(L[[n]]$Year), ], simplify=FALSE) 
# possibly ignore those who were already treated at study entry
L.del <- sapply(Name.u, function(n) L[[n]][1, "Treatment"]==1, simplify=TRUE) 
L[L.del] <- NULL
Name.u <- Name.u[!L.del]
str(L); L # note that CompanyC was deleted because of Treatment==1 at start

## display treatment duration etc.
LL <- function(L.n) {
  L.n$diff <- c(0, diff(L.n$Treatment))
  # stopifnot(sum(L.n$diff!=0) == 1)   # more than one status change - need clarification how this should be handled, see also lines below
  # ALL status changes to "treated" (possibly more than one!)
  Rx.start <- which(L.n$diff==1) 
  # duration since FIRST documented treatment (NA if treatment never starts)
  first.start <- if (length(Rx.start) > 0) min(L.n$Year[Rx.start]) else NA
  L.n$RxDurSinceFirst <- L.n$Year - first.start
  # spell out the full column name instead of relying on partial matching of $RxDur
  L.n$RxDurReal <- L.n$RxDurSinceFirst
  # need to define what to do with those who are Treatment negative at THIS time ...
  L.n$RxDurReal[L.n$Treatment==0] <- NA   
  # ... and those who became Treatment negative before or now
  Rx.stop <- which(L.n$diff==-1)
  if (length(Rx.stop) > 0) L.n$RxDurReal[seq_len(nrow(L.n)) >= min(Rx.stop)] <- NA
  return(L.n)
}
str(LL)

# L2 is a new list of the same structure as L, but with more information 
# (more columns in each dataframe element)
L2 <- sapply(Name.u, function(n) LL(L[[n]]), simplify=FALSE)
str(L2)
L2

# for a company n one can then do (and of course further vectorize):
n <- Name.u[1]
str(L2[[n]])
L2[[n]]

# for a company n one can then compare RxDurSinceFirst, RxDurReal or 
# whatever you want (and of course further vectorize):
(Var.before <- L2[[n]]$Variable[ L2[[n]]$RxDurSinceFirst <  0 ] )
(Var.after  <- L2[[n]]$Variable[ L2[[n]]$RxDurSinceFirst >= 0 ] )
t.test(Var.before, Var.after)  # works of course only if enough observations

# or on/off Treatment within one group, and use the means of each group 
# for further paired t.test / U-test etc.
(Var.OnRx  <- L2[[n]]$Variable[ L2[[n]]$Treatment ==  1 ] )
(Var.OffRx <- L2[[n]]$Variable[ L2[[n]]$Treatment ==  0 ] )

### End ###
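
The question also asks for a dummy that shows whether the dependent variable grew after a fixed lag. The following is only a minimal sketch of one possible reading, not part of the original answer: it compares each row with the same company's row exactly `lag` years later and returns NA where that year is missing from the unbalanced panel. The helper name `growth_dummy` and the column `Grew` are made up for illustration, and the result is not meant to reproduce the Dummy column in the question, whose exact rule is not fully specified.

```r
# Hypothetical helper (assumption, not from the original answer):
# 1 if Variable is larger `lag` years later, 0 if not, NA if that year is missing.
growth_dummy <- function(d, lag = 2) {
  d <- d[order(d$Name, d$Year), ]
  # find the row of the same company at Year + lag (NA when there is none)
  idx <- match(paste(d$Name, d$Year + lag), paste(d$Name, d$Year))
  d$Grew <- as.integer(d$Variable[idx] > d$Variable)
  d
}

# the data from the question
panel <- data.frame(
  Year      = c(2000:2005, 1999:2006),
  Name      = rep(c("CompanyA", "CompanyB"), times = c(6, 8)),
  Variable  = c(10, 10, 10, 10, 12, 12, 5, 5, 5, 5, 6, 5, 6, 6),
  Treatment = c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0)
)

res <- growth_dummy(panel, lag = 2)
res[res$Treatment == 1, ]   # growth two years after each treated year
```

Matching on Year rather than on row position is what keeps gaps in an unbalanced panel from silently shifting the lag.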

score 1 · Accepted Answer

Alternatively,

diff(by(df$Variable, df$Treatment, FUN=mean))
#[1] -1.242424
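
If the same on/off comparison is wanted per company rather than pooled, one possible variation is sketched below with `tapply`. It assumes the question's data and that every company has both treated and untreated rows; the object names `m` and `on_off` are illustrative only.

```r
# the data from the question (Year omitted, it is not needed here)
df <- data.frame(
  Name      = rep(c("CompanyA", "CompanyB"), times = c(6, 8)),
  Variable  = c(10, 10, 10, 10, 12, 12, 5, 5, 5, 5, 6, 5, 6, 6),
  Treatment = c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0)
)

# mean of Variable for every Name x Treatment cell (rows: company, columns: "0"/"1")
m <- tapply(df$Variable, list(df$Name, df$Treatment), mean)

# treated minus untreated mean, one value per company
on_off <- m[, "1"] - m[, "0"]
on_off
```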

answered 2014-07-25T19:52:22.120

score 0 · Accepted Answer

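Here is an answer that I think gets you very close. My code flags changes in the variable relative to before treatment. Note that this is not the most polished code; it is more or less a draft.

First, here is a dput of the table. Just run it to load the table.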

dfx <- structure(list(Year = c(2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 
1999L, 2000L, 2001L, 2002L, 2003L, 2004L, 2005L, 2006L), Name = c("CompanyA", 
"CompanyA", "CompanyA", "CompanyA", "CompanyA", "CompanyA", "CompanyB", 
"CompanyB", "CompanyB", "CompanyB", "CompanyB", "CompanyB", "CompanyB", 
"CompanyB"), Variable = c(10L, 10L, 10L, 10L, 12L, 12L, 5L, 5L, 
5L, 5L, 6L, 5L, 6L, 6L), Treatment = c(0L, 0L, 1L, 0L, 0L, 0L, 
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L), Dummy = c(0L, 0L, 0L, 0L, 1L, 
1L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L)), .Names = c("Year", "Name", 
"Variable", "Treatment", "Dummy"), class = "data.frame", row.names = c(NA, 
-14L))

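Next, I created a helper variable (has_treatment) that indicates whether a given year (row) has received treatment. This is done in the first part of the function.

This is followed by a simple conditional statement that tests whether the case has received treatment and whether the variable differs from its value in the treated year(s).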

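foo <- function(dfx){
      # index of the last treated row before the first stop (Treatment 1 -> 0)
      last.rx <- Position( isTRUE, diff(dfx$Treatment) == -1)

      # rows after the stop count as "has received treatment", earlier rows do not
      dfx[(last.rx + 1) : nrow(dfx), "has_treatment" ] <- 1 
      dfx[1:last.rx, "has_treatment" ] <- 0 

      # flag post-treatment rows whose Variable differs from the treated value(s)
      dfx[dfx$has_treatment == 1 & 
              ((dfx[dfx$Treatment == 1, "Variable"] == 
                  dfx[, "Variable"])==FALSE) ,"dummy"] <- 1
  return(dfx)
}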

Then run this with ddply. If you are not familiar with ddply and the plyr package, I strongly recommend learning them.

library(plyr)

ddply(dfx, .variables = "Name", foo)
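
For readers without plyr, the same split-apply-combine pattern can be sketched in base R. The data and the per-company function `add_change` below are small stand-ins made up for illustration; any function that takes one company's data frame and returns a data frame could be used in its place.

```r
# small stand-in data, for illustration only
panel <- data.frame(
  Year     = c(2000, 2001, 2002, 2000, 2001),
  Name     = c("CompanyA", "CompanyA", "CompanyA", "CompanyB", "CompanyB"),
  Variable = c(10, 10, 12, 5, 6)
)

# hypothetical per-company function: change relative to the first observed year
add_change <- function(d) {
  d <- d[order(d$Year), ]
  d$Change <- d$Variable - d$Variable[1]
  d
}

# split by company, apply the function to each piece, then stack the results
pieces <- lapply(split(panel, panel$Name), add_change)
out <- do.call(rbind, pieces)
rownames(out) <- NULL
out
```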

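Again, this is not exactly what you want, but in principle it should put you on the right track. I will try to have another go at it, but I have to run.

Also, as some may comment, this is not the most elegant approach, and there are probably faster and more efficient ways.

Anyway, I hope this helps a little.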
