python - sklearn (scikit-learn) logistic regression package -- set trained coefficients for classification

Question

So I read the scikit-learn package webpage:

http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.linear_model.LogisticRegression.html

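I can use logistic regression to fit the data, and once I have an instance of LogisticRegression, I can use it to classify new data points. So far so good.

Is there a way to set the coefficients of a LogisticRegression() instance, though? If I already have trained coefficients, I would like to use the same API to classify new data points.

Or could someone recommend another Python machine learning package with a better API?

Thanks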

score 7 · Accepted Answer

The coefficients are attributes of the estimator object that you created when you instantiated the LogisticRegression class, so you can access them in the normal Python way:

>>> import numpy as NP
>>> from sklearn.linear_model import LogisticRegression as LR
>>> from sklearn import datasets as DS
>>> digits = DS.load_digits()
>>> D = digits.data
>>> T = digits.target

>>> # instantiate an estimator instance (classifier) of the Logistic Reg class
>>> clf = LR()
>>> # train the classifier
>>> clf.fit( D[:-1], T[:-1] )
    LogisticRegression(C=1.0, dual=False, fit_intercept=True, 
      intercept_scaling=1, penalty='l2', tol=0.0001)

>>> # attributes are accessed in the normal python way
>>> dx = clf.__dict__
>>> dx.keys()
    ['loss', 'C', 'dual', 'fit_intercept', 'class_weight_label', 'label_', 
     'penalty', 'multi_class', 'raw_coef_', 'tol', 'class_weight', 
     'intercept_scaling']

That is how to get the coefficients, but if you only want to use them for prediction, a more direct way is to use the estimator's predict method:

>>> # instantiate the L/R classifier, passing in norm used for penalty term 
>>> # and regularization strength
>>> clf = LR(C=.2, penalty='l1')
>>> clf
    LogisticRegression(C=0.2, dual=False, fit_intercept=True, 
      intercept_scaling=1, penalty='l1', tol=0.0001)

>>> # train the classifier before asking it for predictions
>>> # (rows 151 onward, so the instances sampled below stay unseen)
>>> clf = clf.fit(D[151:], T[151:])

>>> # select some "test" instances from the original data
>>> # [of course the model should not have been trained on these instances]
>>> test = NP.random.randint(0, 151, 5)
>>> d = D[test,:]     # randomly selected data points w/o class labels
>>> t = T[test]       # the class labels that correspond to the points in d

>>> # generate model predictions for these 5 data points
>>> v = clf.predict(d)
>>> v
    array([0, 0, 2, 0, 2], dtype=int32)
>>> # how well did the model do?
>>> percent_correct = 100*NP.sum(t==v)/t.shape[0]
>>> percent_correct
    100
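
To make the link between the stored coefficients and predict concrete, here is a minimal sketch. It assumes a recent scikit-learn release, where the fitted parameters are exposed as coef_ and intercept_ (the attribute names in the transcript above come from an older version). The scaling by 16 and the max_iter value are my own choices to help the solver converge, not something from the original answer.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16; scaling helps convergence

# Train on everything except the last 100 rows, which are held out.
clf = LogisticRegression(max_iter=1000).fit(X[:-100], y[:-100])

# Fitted parameters: one row of weights and one intercept per class.
print(clf.coef_.shape, clf.intercept_.shape)

# Linear score of every class for the held-out rows; the highest score wins.
scores = X[-100:] @ clf.coef_.T + clf.intercept_
manual = clf.classes_[np.argmax(scores, axis=1)]

# The hand-computed labels should agree with what predict returns.
same = np.array_equal(manual, clf.predict(X[-100:]))
print(same)
```

So for prediction, the coefficients are all the model really carries: predict is the argmax over these linear scores.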

score 5 · Accepted Answer

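Indeed, the estimator.coef_ and estimator.intercept_ attributes are read-only Python properties rather than ordinary Python attributes. Their values come from the estimator.raw_coef_ array, whose memory layout directly maps the memory layout expected by the underlying liblinear C++ implementation of logistic regression, so that no memory copy of the parameters is needed when calling estimator.predict or estimator.predict_proba.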
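
As an illustration of what a read-only property backed by a raw array means in practice, here is a toy sketch. ToyEstimator is a hypothetical class written for this example, not scikit-learn's code, and the layout it uses (intercept stored in the last column of raw_coef_) is an assumption made for the demonstration.

```python
import numpy as np


class ToyEstimator:
    """Hypothetical stand-in, NOT scikit-learn's actual implementation."""

    def __init__(self, raw_coef):
        # Assumed layout: one row per class, intercept in the last column.
        self.raw_coef_ = np.asarray(raw_coef, dtype=float)

    @property
    def coef_(self):
        # Derived on every access; there is no setter, so it is read-only.
        return self.raw_coef_[:, :-1]

    @property
    def intercept_(self):
        return self.raw_coef_[:, -1]


est = ToyEstimator([[0.5, -1.0, 0.1], [2.0, 0.3, -0.4]])

# Assigning to a property that has no setter raises AttributeError.
try:
    est.coef_ = np.zeros((2, 2))
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# Writing into the raw array is what actually changes the coefficients.
est.raw_coef_[0, 0] = 9.0
print(assignment_failed, est.coef_[0, 0])
```

The point is only that the values have a single home, the raw array, and the properties are views onto it.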

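I agree that having read-only properties is a limitation and that we should find a way to get rid of them, but any refactoring of this implementation must take care not to introduce unnecessary memory copies. Have a quick look at the source code.

I have opened an issue on the tracker so that this limitation is not forgotten.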

In the meantime, you can read the @property-annotated estimator.coef_ method to understand how estimator.coef_ and estimator.raw_coef_ are related, and change the values in estimator.raw_coef_ directly.
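
The raw_coef_ attribute belongs to the scikit-learn version that was current when this was written. For more recent releases, the following is a rough sketch of the same idea, under the assumption that coef_, intercept_ and classes_ are ordinary writable attributes there. Assigning them on a fresh instance is not something I would call an official API, so treat it as a workaround to verify against the version you actually run.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)
X = X / 16.0  # pixel values range from 0 to 16; scaling helps convergence

# Pretend these arrays are the trained parameters obtained earlier.
trained = LogisticRegression(max_iter=1000).fit(X, y)
saved_coef = trained.coef_.copy()
saved_intercept = trained.intercept_.copy()
saved_classes = trained.classes_.copy()

# Fresh, never-fitted estimator that receives the saved parameters.
# Assumption: recent releases accept plain assignment to these attributes.
restored = LogisticRegression()
restored.coef_ = saved_coef
restored.intercept_ = saved_intercept
restored.classes_ = saved_classes

# The restored estimator classifies through the usual predict API.
matches = np.array_equal(restored.predict(X), trained.predict(X))
print(matches)
```

If the goal is simply to reuse a trained model later, serializing the whole fitted estimator is usually less fragile than copying individual arrays.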
