r - RStan MCMC in RStudio takes too long to run (limited use of available CPU and RAM)

Question

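I am a beginner in the world of Rstan, but I really need it for my thesis. I am using this script with a dataset similar to the one used by someone at NYU, who reports an estimation time of about 18 hours for a similar DS. However, when I try to run my model, it does not get past 10% in 18 hours. So I am asking for a little help to understand what I am doing wrong and how to improve efficiency.

I am running a 500 iter, 100 warmup, 2 chains model with a bernoulli_logit function over 5 parameters, and I am trying to estimate 2 of them with a No-U-Turn MC procedure. (At each step, it draws each parameter from a random normal, estimates y, and compares it with the actual data to see whether the new parameters fit the data better.)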

 y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - gamma * square( theta[jj[n]] - phi[kk[n]] ) );
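Written out as a formula, the sampling statement above says that each follow indicator is Bernoulli with a success probability given by the inverse logit of the predictor. This is only a restatement of the Stan line using the parameter names from the model, not an additional assumption:

```latex
\Pr(y_n = 1) = \operatorname{logit}^{-1}\!\Big( \alpha_{kk[n]} + \beta_{jj[n]} - \gamma \,\big(\theta_{jj[n]} - \phi_{kk[n]}\big)^2 \Big),
\qquad \operatorname{logit}^{-1}(x) = \frac{1}{1 + e^{-x}}
```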

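(n is about 10 million.) My data is a 10,000 x 1004 matrix of 0s and 1s. In short, it is a matrix of people who follow politicians on Twitter, and I want to estimate their political views from whom they follow.

I run the model in RStudio with R x64 3.1.1 on Win8 Professional, 64 bit, i7 quad core with 16 GB RAM. Checking performance, rsession uses no more than 14% of the CPU and 6 GB of RAM, although another 7 GB are free. When I tried subsampling to a 10,000 x 250 matrix, I noticed that it uses less than 1.5 GB instead. However, I tried the procedure on a 50x50 dataset and it worked fine, so there is no mistake in the procedure itself.

Rsession opens 8 threads. I see activity on every core, but none is fully occupied. I wonder why my PC is not working at its full capacity, and whether there is a bottleneck, a cap, or a setting that prevents it. R is 64 bit (I just checked), so Rstan should be too (although I had problems installing it, so some parameters might be messed up).

This is what happens when I compile: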

Iteration: 1 / 1 [100%]  (Sampling)
#  Elapsed Time: 0 seconds (Warm-up)
#                11.451 seconds (Sampling)
#                11.451 seconds (Total)

SAMPLING FOR MODEL 'stan.code' NOW (CHAIN 2).

Iteration: 1 / 1 [100%]  (Sampling)
#  Elapsed Time: 0 seconds (Warm-up)
#                12.354 seconds (Sampling)
#                12.354 seconds (Total)

When I run it, it works for hours but never gets past 10% of the first chain (mainly because I interrupt it once my PC is about to melt).

Iteration:   1 / 500 [  0%]  (Warmup)

And I have this setup:

stan.model <- stan(model_code=stan.code, data = stan.data, init=inits, iter=1, warmup=0, chains=2)

## running model
stan.fit <- stan(fit=stan.model, data = stan.data, iter=500, warmup=100, chains=2, thin=thin, init=inits)

Please help me find what is slowing down the procedure (and, if nothing is wrong, what can I tweak to get reasonable results in a shorter time?).

Thanks in advance,

ML

Here is the model (from Pablo Barbera, NYU):

n.iter <- 500
n.warmup <- 100
thin <- 2 ## this will give up to 200 effective samples for each chain and par

Adjmatrix <- read.csv("D:/TheMatrix/Adjmatrix_1004by10000_20150424.txt", header=FALSE)  
##10.000x1004 matrix of {0, 1} with the relationship "user i follows politician j"
StartPhi <- read.csv("D:/TheMatrix/StartPhi_20150424.txt", header=FALSE)  
##1004 vector of values [-1, 1] that should be a good prior for the Phi I want to estimate

start.phi<-ba<-c(do.call("cbind",StartPhi))
y<-Adjmatrix

J <- dim(y)[1]
K <- dim(y)[2]
N <- J * K
jj <- rep(1:J, times=K)
kk <- rep(1:K, each=J)

stan.data <- list(J=J, K=K, N=N, jj=jj, kk=kk, y=c(as.matrix(y)))

## rest of starting values
colK <- colSums(y)
rowJ <- rowSums(y)
normalize <- function(x){ (x-mean(x))/sd(x) }

inits <- rep(list(list(alpha=normalize(log(colK+0.0001)), 
                   beta=normalize(log(rowJ+0.0001)),
                   theta=rnorm(J), phi=start.phi,mu_beta=0, sigma_beta=1, 
                   gamma=abs(rnorm(1)), mu_phi=0, sigma_phi=1, sigma_alpha=1)),2)
##alpha and beta are the popularity of the politician j and the propensity to follow people of user i;
##phi and theta are the position on the political spectrum of pol j and user i; phi has a prior given by expert surveys
##gamma is just a weight on the importance of political closeness

library(rstan)

stan.code <- '
data {
int<lower=1> J; // number of twitter users
int<lower=1> K; // number of elite twitter accounts
int<lower=1> N; // N = J x K
int<lower=1,upper=J> jj[N]; // twitter user for observation n
int<lower=1,upper=K> kk[N]; // elite account for observation n
int<lower=0,upper=1> y[N]; // dummy if user i follows elite j
}
parameters {
vector[K] alpha;
vector[K] phi;
vector[J] theta;
vector[J] beta;
real mu_beta;
real<lower=0.1> sigma_beta;
real mu_phi;
real<lower=0.1> sigma_phi;
real<lower=0.1> sigma_alpha;
real gamma;
}
model {
alpha ~ normal(0, sigma_alpha);
beta ~ normal(mu_beta, sigma_beta);
phi ~ normal(mu_phi, sigma_phi);
theta ~ normal(0, 1); 
for (n in 1:N)
y[n] ~ bernoulli_logit( alpha[kk[n]] + beta[jj[n]] - 
gamma * square( theta[jj[n]] - phi[kk[n]] ) );
}
'

## compiling model
stan.model <- stan(model_code=stan.code, 
data = stan.data, init=inits, iter=1, warmup=0, chains=2)

## running model
stan.fit <- stan(fit=stan.model, data = stan.data, 
iter=n.iter, warmup=n.warmup, chains=2, 
thin=thin, init=inits)

samples <- extract(stan.fit, pars=c("alpha", "phi", "gamma", "mu_beta",
                                "sigma_beta", "sigma_alpha"))
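
To make the indexing in the script easier to check, here is a minimal sketch on a toy matrix. The 3 x 2 matrix is made up for illustration and is not taken from the real data. It relies on R flattening matrices column by column, so `jj` (user) and `kk` (politician) should address the same cell as each entry of the flattened `y`:

```r
# Toy follow matrix (hypothetical): rows are users (J = 3), columns are politicians (K = 2)
y <- matrix(c(1, 0, 1,
              0, 0, 1), nrow = 3, ncol = 2)

J <- nrow(y)
K <- ncol(y)
N <- J * K

jj <- rep(1:J, times = K)  # user index of each flattened observation
kk <- rep(1:K, each = J)   # politician index of each flattened observation
y_flat <- c(y)             # column-major flattening, same order as c(as.matrix(y))

# Each flattened entry must equal the matrix cell addressed by (jj, kk)
stopifnot(all(y_flat == y[cbind(jj, kk)]))

# One row per observation n, as the Stan model sees the data
obs <- data.frame(n = seq_len(N), jj = jj, kk = kk, y = y_flat)
print(obs)
```

With the full data the same construction gives N = J x K, roughly 10 million observations, which is the length of the `for (n in 1:N)` loop in the model block.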
