R: caret does not use the master node of a PSOCKcluster when using a parallel backend

Question

I am trying to train an xgboost model with caret over a grid of hyperparameters, using a parallel backend.

Here is the code that sets up the parallel backend for caret's xgboost hyperparameter grid search, using the Give Me Some Credit data:

library(plyr)
library(dplyr)
library(pROC)
library(caret)
library(xgboost)
library(readr)
library(parallel)
library(doParallel)

if(exists("xgboost_cluster")) stopCluster(xgboost_cluster)
hosts = paste0("192.168.18.", 52:53)
xgboost_cluster = makePSOCKcluster(hosts, master="192.168.18.51")

# load the packages across the cluster
clusterEvalQ(xgboost_cluster, {
  deps = c("plyr", "Rcpp", "dplyr", "caret", "xgboost", "pROC", "foreach", "doParallel")
  for(d in deps) library(d, character.only = TRUE)
  rm(d, deps)
})

registerDoParallel(xgboost_cluster)  
# load in the training data
df_train = read_csv("04-GiveMeSomeCredit/Data/cs-training.csv") %>%
  na.omit() %>%                                                                # listwise deletion 
  select(-`[EMPTY]`) %>%
  mutate(SeriousDlqin2yrs = factor(SeriousDlqin2yrs,                           # factor variable for classification
                                   labels = c("Failure", "Success")))
# set up the cross-validated hyper-parameter search
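xgb_grid_1 = expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1,
  # recent caret versions of "xgbTree" also require these three columns;
  #   1 is xgboost's default for each
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)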

# pack the training control parameters
xgb_trcontrol_1 = trainControl(
  method = "cv",
  number = 5,
  verboseIter = TRUE,
  returnData = FALSE,
  returnResamp = "all",                                                        # save losses across all models
  classProbs = TRUE,                                                           # set to TRUE for AUC to be computed
  summaryFunction = twoClassSummary,
  allowParallel = TRUE
)

# train the model for each parameter combination in the grid, 
#   using CV to evaluate
xgb_train_1 = train(
  x = as.matrix(df_train %>%
                  select(-SeriousDlqin2yrs)),
  y = as.factor(df_train$SeriousDlqin2yrs),
  trControl = xgb_trcontrol_1,
  tuneGrid = xgb_grid_1,
  method = "xgbTree"
)
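In recent caret releases, `method = "xgbTree"` tunes seven parameters (`nrounds`, `max_depth`, `eta`, `gamma`, `colsample_bytree`, `min_child_weight`, `subsample`), and `train()` stops with an error if `tuneGrid` lacks any of them. The sketch below, assuming caret is installed, builds a grid holding the three extra columns at xgboost's default of 1 and uses `caret::modelLookup()` to check it against whatever caret version is installed. The name `xgb_grid` is illustrative only.

```r
library(caret)

# Same search space as the question, plus the three columns that current
# caret versions of "xgbTree" expect; each is held at xgboost's default (1)
xgb_grid <- expand.grid(
  nrounds = 1000,
  eta = c(0.01, 0.001, 0.0001),
  max_depth = c(2, 4, 6, 8, 10),
  gamma = 1,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 1
)

# modelLookup() lists the tuning parameters the installed caret uses for xgbTree
required <- as.character(modelLookup("xgbTree")$parameter)

# Any tuning parameter that the grid does not supply
missing_cols <- setdiff(required, names(xgb_grid))
print(missing_cols)

nrow(xgb_grid)  # 15 combinations (3 values of eta x 5 of max_depth)
```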

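I have confirmed that all of the cores on the hosts are being used for training, but no processes are running on the master node. Is this the expected behavior? Is there a way to change it so that the master node's cores are also used for processing?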
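Some background on why the master sits idle: in a PSOCK cluster, the master R session only dispatches tasks and collects results. caret runs its resampling loop through foreach when `allowParallel = TRUE`, and with a doParallel backend every `%dopar%` task is sent to a worker process. The minimal local sketch below illustrates this, assuming a two-worker cluster on one machine as a stand-in for the remote hosts. It records which process ran each task and compares those process IDs with the master's.

```r
library(parallel)
library(foreach)
library(doParallel)

# Local stand-in for the remote cluster: two worker processes on this machine
cl <- makePSOCKcluster(2)
registerDoParallel(cl)

master_pid <- Sys.getpid()

# Each task returns the process ID of the R session that executed it
task_pids <- foreach(i = 1:8, .combine = c) %dopar% Sys.getpid()

# Process IDs of the workers, as reported by the workers themselves
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))

stopCluster(cl)
registerDoSEQ()  # reset foreach to sequential execution

master_pid %in% task_pids        # FALSE: the master executed none of the tasks
all(task_pids %in% worker_pids)  # TRUE: every task ran on a worker
```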

score 1 · Accepted Answer

To use the master node for processing, just add 'localhost' to hosts, like this:

hosts = c("localhost", paste0("192.168.18.", 52:53))

This adds one core of the master node to the cluster, where it will be used for processing. To add more cores, just pass more instances of 'localhost':
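`makePSOCKcluster()` starts one worker process per element of its `names` argument, so each `'localhost'` entry becomes one local worker. A quick way to confirm that the master now contributes is to ask every worker for its machine's node name and count the results. The sketch below runs entirely on one machine, assuming two `'localhost'` entries. On the real cluster, running the same `clusterEvalQ()` line against `xgboost_cluster` should list the master's node name alongside the two remote hosts.

```r
library(parallel)

# Two "localhost" entries start two worker processes on this machine
cl <- makePSOCKcluster(rep("localhost", 2))

# Ask every worker for the name of the machine it is running on
node_names <- unlist(clusterEvalQ(cl, Sys.info()[["nodename"]]))
stopCluster(cl)

# Number of workers per machine; here a single entry with a count of 2
workers_per_host <- table(node_names)
print(workers_per_host)
```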

hosts = c(rep('localhost', detectCores()), paste0("192.168.18.", 52:53))
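`detectCores()` counts every core on the master, including the one the master R session itself runs on, and that session still dispatches tasks and collects results while `train()` runs. The variant below leaves one core free for it. This is a design choice, not something the answer requires. `build_hosts()` is a hypothetical helper introduced only for illustration. Because `detectCores()` can return `NA` on some platforms, the sketch falls back to a single local worker in that case.

```r
library(parallel)

# Hypothetical helper: local worker slots first, then one slot per remote host
build_hosts <- function(remote_hosts, n_local) {
  c(rep("localhost", n_local), remote_hosts)
}

# Leave one core for the master session; fall back to 1 if detectCores() is NA
n_local <- max(1L, detectCores() - 1L, na.rm = TRUE)

hosts <- build_hosts(paste0("192.168.18.", 52:53), n_local)
table(hosts)
```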
