r - Predictor importance for a PLS model trained with tidymodels

Question

I am using tidymodels to fit a PLS model, but I am struggling to find the PLS variable importance scores or coefficients.

This is what I have tried so far. The example data come from the AppliedPredictiveModeling package.

Model fitting

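library(tidymodels)                  # recipes, rsample, tune, workflows, parsnip
library(plsmod)                      # PLS model specification with the mixOmics engine
library(AppliedPredictiveModeling)   # ChemicalManufacturingProcess data

set.seed(123)  # make the split and the bootstrap resamples reproducible

data(ChemicalManufacturingProcess)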
split <- ChemicalManufacturingProcess %>% initial_split(prop = 0.7)
train <- training(split)
test <- testing(split)

tidy_rec <- recipe(Yield ~ ., data = train) %>% 
  step_impute_knn(all_predictors()) %>%   # current name of the deprecated step_knnimpute()
  step_BoxCox(all_predictors()) %>% 
  step_normalize(all_predictors()) %>% 
  step_nzv(all_predictors()) %>% 
  step_corr(all_predictors())

boots <- bootstraps(train, times = 25)  # the argument is `times`, not `time`

tidy_model <- plsmod::pls(num_comp = tune()) %>% 
  set_mode("regression") %>% 
  set_engine("mixOmics")

tidy_grid <- expand.grid(num_comp = seq(from = 1, to = 48, by = 5))

tidy_tune <- tidy_model %>% tune_grid(
  preprocessor = tidy_rec,
  grid = tidy_grid,
  resamples = boots,
  metrics = metric_set(mae, rmse, rsq)
)

tidy_best <- tidy_tune %>% select_best(metric = "rsq")
Final_model <- tidy_model %>% finalize_model(tidy_best)

tidy_wf <- workflow() %>% 
  add_model(Final_model) %>% 
  add_recipe(tidy_rec) 

Fit_PLS <- tidy_wf %>% fit(data = train)

# check the most important predictors
# extract_fit_parsnip() is the current replacement for the deprecated pull_workflow_fit()
tidy_info <- Fit_PLS %>% extract_fit_parsnip()
loadings <- tidy_info$fit$loadings$X

PLS variable importance

library(forcats)  # fct_reorder()

# Reshape the X loadings to long format (assumes at least 3 components were kept)
tidy_load <- loadings %>% 
  as.data.frame() %>% 
  rownames_to_column() %>% 
  select(rowname, comp1, comp2, comp3) %>% 
  pivot_longer(-rowname) %>% 
  rename(predictors = rowname)

# Importance = sum of absolute loadings over the first three components
tidy_load %>% 
  group_by(predictors) %>% 
  summarise(Importance = sum(abs(value))) %>% 
  # keep the 15 largest; slice_head() would keep the first 15 rows alphabetically
  slice_max(Importance, n = 15, with_ties = FALSE) %>% 
  mutate(predictors = fct_reorder(predictors, Importance)) %>% 
  ggplot(aes(Importance, predictors, fill = predictors)) + 
  geom_col(show.legend = FALSE)
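To show the ranking step on its own, here is a minimal base R sketch. The loadings matrix and predictor names are invented for illustration and are not taken from the fitted model. It sums the absolute loadings per predictor and keeps the largest values, which is not the same as taking the first rows of the summary.

```r
# Hypothetical loadings matrix: 4 predictors x 2 components (made-up values)
toy_loadings <- matrix(
  c( 0.10, -0.05,
    -0.70,  0.20,
     0.30,  0.50,
    -0.02,  0.01),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("a_pred", "b_pred", "c_pred", "d_pred"),
                  c("comp1", "comp2"))
)

# Importance = sum of absolute loadings across components
importance <- rowSums(abs(toy_loadings))

# Rank by importance and keep the two largest
top2 <- head(sort(importance, decreasing = TRUE), 2)

# For comparison: the first two rows in their original (alphabetical) order
first2 <- head(importance, 2)

print(names(top2))
print(names(first2))
```

The two selections differ here because the most important predictors are not first alphabetically.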

Thanks! The vi() function from the vip package does not work with this model.
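One possible route, sketched here under assumptions rather than as a confirmed solution: the mixOmics package itself provides a vip() function that computes Variable Importance in Projection scores for its PLS objects. The example below uses simulated data (the names toy_fit, X and y are hypothetical) and calls mixOmics::pls() directly. In the workflow above, the corresponding object would presumably be tidy_info$fit, assuming the engine fit is a regular mixOmics PLS object. The mixOmics:: prefix is used because the vip package also exports a function called vip().

```r
library(mixOmics)

set.seed(123)

# Simulated data: only x1 and x2 drive the response (illustration only)
n <- 100
p <- 6
X <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("x", 1:p)))
y <- 3 * X[, "x1"] - 2 * X[, "x2"] + rnorm(n, sd = 0.5)

# Fit a 2-component PLS regression with mixOmics directly
toy_fit <- mixOmics::pls(X, y, ncomp = 2, mode = "regression")

# VIP scores: one row per predictor, one column per component
vip_scores <- mixOmics::vip(toy_fit)

# Rank predictors by the VIP score at the last component
vip_ranked <- data.frame(
  predictor = rownames(vip_scores),
  vip       = vip_scores[, ncol(vip_scores)]
)
vip_ranked <- vip_ranked[order(vip_ranked$vip, decreasing = TRUE), ]
print(vip_ranked)
```

A common rule of thumb treats predictors with a VIP above 1 as influential, but that cutoff is a convention rather than a formal test.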
