python - CNN (Keras) の出力を LIME で説明する

Question

LIMEを使用して、Keras で畳み込みニューラルネットワークの出力を説明しようとしています。

私のニューラルネットワークは、すべてのクラスが独立しているマルチクラステキスト分類器です。したがって、テキストにはクラス 1 と 2、または 1 のみなどを含めることができます。テキストにクラスがない場合の 5 番目の「クラス」(なし)。

ただし、Keras と Lime を使用してバイナリ分類のケースを説明することはできましたが、独立したクラスを使用したマルチクラスのケースを取得することはできません。最初のヘルプはここで見つかりました:

ただし、私のコードは機能しません。Lime から次のような内部エラーが発生します。

from lime.lime_text import LimeTextExplainer, TextDomainMapper
explainer = LimeTextExplainer(class_names=encoder.classes_)


chosen_text = 2

def flatten_predict(i):
    global model   
    # catch single string inputs and convert them to list
    if i.__class__ != list:
        i = [i]
        print("## Caught and transformed single string.")
    # list for predictions
    predStorage = []
    # loop through input list and predict
    for textInput in i:
        textInput = preprocess(textInput)
        textInput = make_predictable(textInput)
        pred = model.predict(textInput)
        pred = np.append(pred, 1-pred, axis=1)
        # control output of function

        predStorage.extend(pred)
    return np.asarray(predStorage)


def get_predict_proba_fn_of_class(label):
    """assuming wrapped_predict outputs an (n, d) array of prediction probabilities, where d is the number of labels"""
    def rewrapped_predict(strings): 
        preds = flatten_predict(strings)[:, np.where(flatten_predict(strings)==label)].reshape(-1, 1)
        ret = np.asarray(np.hstack([(1 - preds), preds]))
        return ret

    return rewrapped_predict

str = 'Ein sehr freundlicher Arzt.'
preds = flatten_predict(str)
labels_to_explain = preds# 
print(labels_to_explain)

explanation_for_label = {}
for label in labels_to_explain:
    wrapped = get_predict_proba_fn_of_class(label)
    explanation_for_label[label] = explainer.explain_instance(str, wrapped)
    explanation_for_label[label].show_in_notebook()

エラーメッセージ：

ValueError                                Traceback (most recent call last)
<ipython-input-26-8df61aaa23f4> in <module>()
     53 for label in labels_to_explain:
     54     wrapped = get_predict_proba_fn_of_class(label)
---> 55     explanation_for_label[label] = explainer.explain_instance(str, wrapped)
     56     explanation_for_label[label].show_in_notebook()
     57 

/usr/local/lib/python3.6/dist-packages/lime/lime_text.py in explain_instance(self, text_instance, classifier_fn, labels, top_labels, num_features, num_samples, distance_metric, model_regressor)
    405                 data, yss, distances, label, num_features,
    406                 model_regressor=model_regressor,
--> 407                 feature_selection=self.feature_selection)
    408         return ret_exp
    409 

/usr/local/lib/python3.6/dist-packages/lime/lime_base.py in explain_instance_with_data(self, neighborhood_data, neighborhood_labels, distances, label, num_features, feature_selection, model_regressor)
    155                                                weights,
    156                                                num_features,
--> 157                                                feature_selection)
    158 
    159         if model_regressor is None:

/usr/local/lib/python3.6/dist-packages/lime/lime_base.py in feature_selection(self, data, labels, weights, num_features, method)
    104                 n_method = 'highest_weights'
    105             return self.feature_selection(data, labels, weights,
--> 106                                           num_features, n_method)
    107 
    108     def explain_instance_with_data(self,

/usr/local/lib/python3.6/dist-packages/lime/lime_base.py in feature_selection(self, data, labels, weights, num_features, method)
     78             clf = Ridge(alpha=0, fit_intercept=True,
     79                         random_state=self.random_state)
---> 80             clf.fit(data, labels, sample_weight=weights)
     81             feature_weights = sorted(zip(range(data.shape[0]),
     82                                          clf.coef_ * data[0]),

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/ridge.py in fit(self, X, y, sample_weight)
    678         self : returns an instance of self.
    679         """
--> 680         return super(Ridge, self).fit(X, y, sample_weight=sample_weight)
    681 
    682 

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/ridge.py in fit(self, X, y, sample_weight)
    489 
    490         X, y = check_X_y(X, y, ['csr', 'csc', 'coo'], dtype=_dtype,
--> 491                          multi_output=True, y_numeric=True)
    492 
    493         if ((sample_weight is not None) and

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    764         y = y.astype(np.float64)
    765 
--> 766     check_consistent_length(X, y)
    767 
    768     return X, y

/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_consistent_length(*arrays)
    233     if len(uniques) > 1:
    234         raise ValueError("Found input variables with inconsistent numbers of"
--> 235                          " samples: %r" % [int(l) for l in lengths])
    236 
    237 

ValueError: Found input variables with inconsistent numbers of samples: [5000, 100000]

私が間違っていることを誰かが知っていますか？入力フォーマットに関係していると確信しています。

python - CNN (Keras) の出力を LIME で説明する

2 に答える 2

Related

Reference