performance - R: statistical/computational efficiency

Question

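My chunk of code fills a 4 x 100000 matrix with either T or F. I call the matrix X. Each entry is Xij ~ Bernoulli(P) with P ~ Normal(0.5, 0.15), where P is bounded so that max(P) = 1 and min(P) = 0.

The statistical side is very inefficient. If there is a known distribution that matches the process above, please point me to it as well.

The computation is very slow because I have to fill the whole matrix one entry at a time, each entry being a fresh random draw. Is there a way to cut the running time substantially?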

Here is the statistical efficiency problem:

x = rnorm(100000,mean = 0.5,sd = 0.15)
x[x > 1] = 1
x[x < 0] = 0

probability = function(x){
  x.sam = sample(x,1)
  p = c(x.sam,1-x.sam)
  return(p)
}

aggro2 = function(x){
  aggro2 = sample(c(T,F),1, prob = probability(x))
  return(aggro2)
}

Here is the computational efficiency problem:

ptm = proc.time()
aggro =c()
n=100000
for (i in 1:(4*n)){
  cat(round(i/(4*n)*100,2),"\n")
  aggro = c(aggro, aggro2(x))  
}
aggro.mat = matrix(aggro,4,n)

elapsed = proc.time()[3] - ptm[3]
cat(elapsed)

score 7 · Accepted Answer

How about this?

system.time({
    x <- rnorm(400000,mean = 0.5,sd = 0.15)  ## pick normal variables
    x2 <- pmin(1,pmax(0,x))                  ## bound at 0 and 1
    mids <- which(x2>0 & x2<1)
    x2[mids] <- rbinom(length(mids),prob=x2[mids],size=1)  
    res <- matrix(x2,ncol=4)
})

This doesn't seem to be exactly the same as what you're doing, but it seems (?) to match your description.

elapsed time: 0.443 seconds
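If you want to stay closer to the code in the question, where each entry draws its own probability from the clipped pool x and then draws T or F with that probability, the loop can probably be replaced by two vectorized calls. This is only a sketch of that reading of the question; set.seed is added purely so the example is reproducible, and the layout is 4 x n as in the question.

n <- 100000
set.seed(1)   # assumption: only here to make the sketch reproducible

# Pool of probabilities as in the question: Normal(0.5, 0.15) clipped to [0, 1]
x <- pmin(1, pmax(0, rnorm(n, mean = 0.5, sd = 0.15)))

# One independent draw from the pool per entry (sample(x, 1) repeated 4*n times)
p <- sample(x, 4 * n, replace = TRUE)

# One Bernoulli(p) draw per entry, stored as TRUE/FALSE in a 4 x n matrix
aggro.mat <- matrix(rbinom(4 * n, size = 1, prob = p) == 1, nrow = 4, ncol = n)

dim(aggro.mat)
mean(aggro.mat)   # should be close to mean(x)

On the statistical side: because every entry gets its own independent draw of P, each entry is TRUE with probability mean(x) given the pool, independently of the others. The clipped normal is symmetric around 0.5, so mean(x) is about 0.5, and rbinom(4 * n, 1, mean(x)) should give a matrix with the same distribution.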

Several of the things that you're doing will be unnecessarily slow:

using a for loop instead of vectorizing
creating a vector by appending instead of allocating the whole vector and then replacing elements
printing in the course of the for loop
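A small comparison of the first two points, if you want to see the effect on your own machine. The function names grow, prealloc and vectorized are made up for this illustration, and the fixed probability 0.5 is a stand-in for the per-entry probability in the question; n is kept small so the slow version finishes quickly.

n <- 20000

# Slow: the vector is copied and regrown on every iteration
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, runif(1) < 0.5)
  out
}

# Better: allocate the whole vector once, then replace elements
prealloc <- function(n) {
  out <- logical(n)
  for (i in seq_len(n)) out[i] <- runif(1) < 0.5
  out
}

# Best: a single vectorized call, no loop at all
vectorized <- function(n) runif(n) < 0.5

system.time(grow(n))
system.time(prealloc(n))
system.time(vectorized(n))

As for the third point, calling cat on every iteration adds console output to each of the 400000 steps. If you still want a progress indicator inside a loop, printing only every few thousand iterations is usually enough.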
