python - Hyperopt: When loading a saved model into sklearn, how can I find out which variables were selected for the best model?

Question

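I trained an sklearn gradient boosting classifier and optimized it with Hyperopt. Hyperopt selects only 20 of the 769 variables. However, when I load the saved sklearn weights for a blind test, I cannot tell which variables were selected. The code is as follows: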

from xgboost import XGBClassifier

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,f1_score,recall_score

# multi:mlogloss // binary:logistic

def accuracy(params):
    clf = XGBClassifier(**params,learning_rate=0.7,objective='binary:logistic', 
                    booster='gbtree', n_jobs=64,eval_metric="error",eval_set=eval_set, verbose=True)
    clf.fit(X_train,y_train) #eval_set=eval_set, 
    return clf.score(X_test, y_test)

eval_set = [(X_test, y_test)]

parameters = {
    'n_estimators': hp.choice('n_estimators', range(20,40)),
    'max_depth': hp.choice('max_depth', range(4,100)),
    'gamma': hp.choice('gamma', range(0,10)),
    "min_child_weight":hp.choice("min_child_weight",range(0,1)),
    "num_features":hp.choice("num_features",range(10,X_train.shape[1])),
    "max_delta_step":hp.choice("max_delta_step",range(0,10))}


best = 0
def f(params):
    global best
    acc = accuracy(params)
    if acc > best:
        best = acc
    print ('Improving:', best, params)
    return {'loss': -acc, 'status': STATUS_OK}

trials = Trials()

best = fmin(f, parameters, algo=tpe.suggest, max_evals=80, trials=trials)
print ('best:',best)

clf = XGBClassifier(gamma=best['gamma'],max_delta_step=best['max_delta_step'],max_depth=best['max_depth'],
                learning_rate=0.1, n_estimators=best['n_estimators'], objective='binary:logistic', min_child_weight=best['min_child_weight'],
                num_features=best['num_features'],
                booster='gbtree', n_jobs=64,eval_metric="error",eval_set=eval_set, verbose=True)
clf.fit(X_train,y_train)
clf.score(X_test, y_test)

import joblib
filename = '/home/rubens.../modelos/Argumenta_Multi.sav'


joblib.dump(clf, filename)


loaded_model = joblib.load(filename)
result = loaded_model.predict(X_new)
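
A side note on reading `best`: for a search space built with `hp.choice`, `fmin` returns the index of the chosen option rather than the option itself, so `best['max_depth']` above is a position within `range(4,100)`, not a depth. The sketch below shows the decoding in plain Python; the index values in `best` are made up for illustration and are not results from the run above. `hyperopt.space_eval(parameters, best)` performs the same mapping on the real search space.

```python
# Hypothetical fmin output for a space defined with hp.choice:
# each value is an index into the list of options, not the option itself.
best = {"n_estimators": 5, "max_depth": 10}

# The option lists, mirroring two entries of the search space in the question.
choices = {
    "n_estimators": list(range(20, 40)),
    "max_depth": list(range(4, 100)),
}

# Map each index back to the value it stands for.
decoded = {name: choices[name][index] for name, index in best.items()}
print(decoded)
```

Passing the decoded values, instead of the raw indices, to the final classifier keeps the refitted model consistent with the trial that scored best.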

How can I find out which 20 variables hyperopt selected? I am reluctant to apply chi-squared selection (picking the best K = 20) with the saved hyperopt weights, because hyperopt may not have used chi-squared for variable selection.

Running result=loaded_model... gives the following error:

ValueError: X has 769 features, but DecisionTreeClassifier is expecting 20 features as input.
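
The message suggests that the saved estimator was fitted on a 20-column matrix, while `X_new` still has all 769 columns, so the same column selection has to be repeated before `predict`. One way to make that selection recoverable is to save the selector together with the classifier. The following is a minimal sketch, assuming the selection is done with an sklearn selector (`SelectKBest` with `chi2` here, which may not be what was actually used); the synthetic data, the `var_i` names, and the temporary file path are all hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the 769-column data set (an assumption).
X, y = make_classification(n_samples=200, n_features=769,
                           n_informative=8, random_state=0)
feature_names = np.array([f"var_{i}" for i in range(X.shape[1])])

pipe = Pipeline([
    ("scale", MinMaxScaler()),             # chi2 requires non-negative inputs
    ("select", SelectKBest(chi2, k=20)),   # keeps 20 of the 769 columns
    ("clf", GradientBoostingClassifier(n_estimators=20, random_state=0)),
])
pipe.fit(X, y)

# Save and reload the whole pipeline, not only the classifier.
path = os.path.join(tempfile.mkdtemp(), "pipeline.joblib")
joblib.dump(pipe, path)
loaded = joblib.load(path)

# The fitted selector remembers which columns it kept.
mask = loaded.named_steps["select"].get_support()
selected = feature_names[mask]
print(selected)

# The loaded pipeline accepts the full-width matrix and selects internally.
predictions = loaded.predict(X)
```

If only the bare classifier was saved, its `n_features_in_` attribute reports how many columns it expects but not which ones, so the column list would have to come from the original training script.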

I also don't know whether Hyperopt follows sklearn's feature importances before it saves the best model:

model.feature_importances_
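
For context, `feature_importances_` only ranks the columns the model was fitted on; it does not by itself record a selection made before fitting. A minimal sketch of reading it, assuming a fitted sklearn `GradientBoostingClassifier` and hypothetical column names `var_i` on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Synthetic data standing in for the real training set (an assumption).
X, y = make_classification(n_samples=300, n_features=30,
                           n_informative=5, random_state=0)
names = np.array([f"var_{i}" for i in range(X.shape[1])])

model = GradientBoostingClassifier(n_estimators=30, random_state=0).fit(X, y)

# One importance value per input column, in the column order used for fit.
importances = model.feature_importances_

# Sort columns from most to least important and keep the top 10.
order = np.argsort(importances)[::-1]
top = [(names[i], float(importances[i])) for i in order[:10]]

# Columns that were never used in any split have importance 0.
used = names[importances > 0]
print(top)
```

Because the importances are indexed by column position, they can only be mapped back to the original 769 variables if the column order used at training time is known.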

