
I'd like to use R to generate two categorical variables (such as eye color and hair color, for instance) where I can specify the degree to which these two variables are associated. It doesn't really matter to me which levels of eye color would be associated with which levels of hair color, but just being able to specify an overall association, such as by specifying the odds ratio, is a requirement. Also, I know there are ways to do this for two normally distributed continuous variables using, for example, the mvtnorm package, so I could take that route and then choose cut points to make the variables categorical after the fact, but I don't want to do it that way if I can avoid it. Any help would be greatly appreciated!

Edit: apologies for not being clearer from the start, but what I'm really asking I suppose is whether or not there's a function anybody knows of in some R package that will do this in one or two lines.


1 Answer


If you can specify the odds ratio (you will also need to specify the baseline odds), you can simply convert the odds to probabilities and use runif().
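As a rough illustration of that conversion (the numbers are made up for the example, and the variable names are not from any package): with baseline odds of 1.5 and an odds ratio of 2, the odds in the second group are 3, which gives probabilities of 0.6 and 0.75. Each observation then gets one runif() draw, which is compared with the probability for its group.

```r
# Illustrative values only; none of these names come from a package.
baseline.odds <- 1.5   # odds of y == 0 when x == 0
odds.ratio    <- 2     # odds(y == 0 | x == 1) / odds(y == 0 | x == 0)

# Odds in each group, then odds -> probability via p = odds / (odds + 1)
group.odds  <- c(baseline.odds, baseline.odds * odds.ratio)  # 1.5 and 3
group.probs <- group.odds / (group.odds + 1)                 # 0.6 and 0.75

# One uniform draw per observation, compared with that observation's probability
set.seed(1)
x <- rep(c(0, 1), each = 5)
y <- ifelse(runif(length(x)) <= group.probs[x + 1], 0, 1)
```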

Edit (I misread the question): take a look at the bindata package.


If you'd rather not use a package, here is a function I wrote that generates such data. It's fairly clunky: it is meant to be self-explanatory, not elegant or fast.

odds.to.probs <- function(odds){
  probs <- odds / (odds+1)
  return(probs)
}

get.correlated.binary.data <- function(N, odds.x.eq.0, odds.y.eq.0.x.eq.0, 
                                       odds.ratio){
  # odds.ratio = odds(y == 0 | x == 1) / odds(y == 0 | x == 0)
  # odds and odds ratios must be positive
  stopifnot(odds.x.eq.0 > 0, odds.y.eq.0.x.eq.0 > 0, odds.ratio > 0)

  odds.y.eq.0.x.eq.1 <- odds.y.eq.0.x.eq.0*odds.ratio
  prob.x.eq.0        <- odds.to.probs(odds.x.eq.0)
  prob.y.eq.0.x.eq.0 <- odds.to.probs(odds.y.eq.0.x.eq.0)
  prob.y.eq.0.x.eq.1 <- odds.to.probs(odds.y.eq.0.x.eq.1)

  x <- ifelse(runif(N)<=prob.x.eq.0, 0, 1)
  # pick the probability that applies to each observation's x, then draw
  # one uniform per observation (N draws, so nothing is recycled)
  prob.y.eq.0 <- ifelse(x==0, prob.y.eq.0.x.eq.0, prob.y.eq.0.x.eq.1)
  y <- ifelse(runif(N)<=prob.y.eq.0, 0, 1)

  dat <- data.frame(x=x, y=y)
  return(dat)
}

> set.seed(9)
> dat <- get.correlated.binary.data(30, 3, 1.5, .03)
> table(dat)   # 2x2 table of counts; exact values depend on the seed
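To check that simulated data carry roughly the association you asked for, you can compute the sample odds ratio from the 2x2 table. This is a minimal sketch, assuming rows are x and columns are y with level 0 first (the layout table(dat) uses), and the same direction as the function above: odds(y = 0 | x = 1) / odds(y = 0 | x = 0). The name sample.odds.ratio is made up for the illustration, and the counts are invented.

```r
# Hypothetical helper: sample odds ratio from a 2x2 table of counts
# (rows = x, columns = y, level 0 first in both).
sample.odds.ratio <- function(tab) {
  stopifnot(all(dim(tab) == c(2, 2)))
  odds.y0.given.x0 <- tab[1, 1] / tab[1, 2]
  odds.y0.given.x1 <- tab[2, 1] / tab[2, 2]
  odds.y0.given.x1 / odds.y0.given.x0
}

# Invented counts: odds of y == 0 are 60/40 when x == 0 and 75/25 when x == 1
tab <- matrix(c(60, 75, 40, 25), nrow = 2,
              dimnames = list(x = c("0", "1"), y = c("0", "1")))
sample.odds.ratio(tab)
```

With a small N a cell can easily be zero, which makes the ratio 0, Inf or NaN, so it is better to run the check on a larger sample.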
answered 2013-12-02T14:40:18.750