python - Sklearn digits dataset

Question

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()

print(digits.data)

classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.data[:-1], digits.target[:-1]

x = x.reshape(1,-1)
y = y.reshape(-1,1)
print((x))

classifier.fit(x, y)
###
print('Prediction:', classifier.predict(digits.data[-3]))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

I also reshaped x and y. I still get an error like this:

Found input variables with inconsistent numbers of samples: [1, 1796]

y is a 1-D array with 1796 elements, whereas x has many more elements. Why does it report 1 for x?

score 1 · Accepted Answer

Actually, scrap what I suggested below:

This link describes the general dataset API. The attribute data is a 2-D array with one row per image, and each image is already flattened.

import sklearn.datasets
digits = sklearn.datasets.load_digits()
digits.data.shape
#: (1797, 64)

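That is all you need to provide; no reshaping is required. Similarly, the attribute target is a 1-D array holding the label for each image.

digits.target.shape
#: (1797,)

No reshaping is needed there either. Just split the data into training and test sets and run.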
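
As a minimal sketch of "no reshaping required", the snippet below fits directly on digits.data and digits.target, holding out only the last sample as the question does. The gamma=0.001 value is an assumption borrowed from the scikit-learn digits tutorial, not the asker's setting, and the variable names are illustrative.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# data is already (n_samples, 64) and target is (n_samples,)
X, y = digits.data, digits.target
print(X.shape, y.shape)

# Hold out the last sample, as in the question
X_train, y_train = X[:-1], y[:-1]
X_last = X[-1:]  # slicing keeps the 2-D shape (1, 64)

# Assumption: gamma=0.001 is taken from the scikit-learn tutorial
classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)

prediction = classifier.predict(X_last)
print('Prediction:', prediction, 'Actual:', y[-1])
```

Slicing with X[-1:] rather than indexing with X[-1] keeps the sample axis, so predict receives the 2-D input it expects.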

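Try printing x.shape and y.shape. You should find that x has a first dimension of 1 and y a first dimension of 1796; with the code in the question they are (1, 114944) and (1796, 1) respectively. When you call fit on a scikit-learn classifier, x and y must contain the same number of samples, that is, the same first dimension.

The clue is that the arguments to reshape are the opposite way around for x and y: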
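
The mismatch can be reproduced in a few lines. This is a sketch that repeats the question's reshapes under hypothetical names (x_bad, y_bad) and captures the resulting error instead of letting it stop the script; SVC is left at its defaults because the failure happens during input validation, before any training.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()
x, y = digits.data[:-1], digits.target[:-1]
print(x.shape, y.shape)  # shapes before any reshaping

# The reshapes from the question: x collapses to a single row,
# while y becomes a column with one row per label
x_bad = x.reshape(1, -1)
y_bad = y.reshape(-1, 1)
print(x_bad.shape, y_bad.shape)

try:
    svm.SVC().fit(x_bad, y_bad)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The first dimension is what scikit-learn counts as the number of samples, so one row in x_bad against 1796 rows in y_bad produces the error.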

x = x.reshape(1, -1)
y = y.reshape(-1, 1)

Maybe try:

x = x.reshape(-1, 1)

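This has nothing to do with your question, but you are predicting digits.data[-3], whereas the only element left out of the training set is digits.data[-1]. I am not sure whether that was intentional.

In any case, it is a good idea to check your classifier against more results using the scikit metrics package. This page has an example that uses it on the digits dataset.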
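
As a rough sketch of what such a check could look like, the snippet below trains on the first half of the data and evaluates on the second half using sklearn.metrics. The 50/50 unshuffled split and gamma=0.001 are assumptions made for illustration; neither comes from the question.

```python
from sklearn import datasets, metrics, svm
from sklearn.model_selection import train_test_split

digits = datasets.load_digits()

# Assumption: an unshuffled 50/50 split, chosen only for illustration
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.5, shuffle=False
)

# Assumption: gamma=0.001 instead of the question's gamma=0.4, C=100
classifier = svm.SVC(gamma=0.001)
classifier.fit(X_train, y_train)
predicted = classifier.predict(X_test)

# Overall accuracy plus per-class precision and recall
accuracy = metrics.accuracy_score(y_test, predicted)
print('Accuracy:', accuracy)
print(metrics.classification_report(y_test, predicted))
print(metrics.confusion_matrix(y_test, predicted))
```

Evaluating on hundreds of held-out images gives a far more reliable picture than a single prediction.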

score 0 · Accepted Answer

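Reshaping converts each 8x8 matrix into a 1-D vector that can be used as a feature vector. Whatever you pass to predict must be in the same format, so you need to reshape the entire X array, not only the part used for training.

The following code shows how to do that.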

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn import svm

digits = datasets.load_digits()


classifier = svm.SVC(gamma=0.4, C=100)
x, y = digits.images, digits.target

# Only reshape x: each image is an 8x8 matrix and needs to be flattened
n_samples = len(digits.images)
x = x.reshape((n_samples, -1))
print("before reshape:" + str(digits.images[0]))
print("After reshape" + str(x[0]))


classifier.fit(x[:-2], y[:-2])
###
# predict expects a 2-D array of shape (n_samples, n_features)
print('Prediction:', classifier.predict(x[-2].reshape(1, -1)))
###
plt.imshow(digits.images[-2], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

###
print('Prediction:', classifier.predict(x[-1].reshape(1, -1)))
###
plt.imshow(digits.images[-1], cmap=plt.cm.gray_r, interpolation='nearest')
plt.show()

This prints the following:

before reshape:[[  0.   0.   5.  13.   9.   1.   0.   0.]
 [  0.   0.  13.  15.  10.  15.   5.   0.]
 [  0.   3.  15.   2.   0.  11.   8.   0.]
 [  0.   4.  12.   0.   0.   8.   8.   0.]
 [  0.   5.   8.   0.   0.   9.   8.   0.]
 [  0.   4.  11.   0.   1.  12.   7.   0.]
 [  0.   2.  14.   5.  10.  12.   0.   0.]
 [  0.   0.   6.  13.  10.   0.   0.   0.]]
After reshape[  0.   0.   5.  13.   9.   1.   0.   0.   0.   0.  13.  15.  10.  15.   5.
   0.   0.   3.  15.   2.   0.  11.   8.   0.   0.   4.  12.   0.   0.   8.
   8.   0.   0.   5.   8.   0.   0.   9.   8.   0.   0.   4.  11.   0.   1.
  12.   7.   0.   0.   2.  14.   5.  10.  12.   0.   0.   0.   0.   6.  13.
  10.   0.   0.   0.]

It also gives correct predictions for the last two images, which were not used for training. You may, however, decide to use a larger split between the test and training sets.
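
The flattening step in this answer can be checked in isolation. The sketch below reshapes digits.images, compares the result with digits.data, and shows how a single flattened image can be turned back into a one-row 2-D array for predict. The names same_as_data and single are illustrative only.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Flatten every 8x8 image into a 64-element row
n_samples = len(digits.images)
x = digits.images.reshape((n_samples, -1))
print(digits.images.shape, x.shape)

# The flattened images carry the same values as digits.data
same_as_data = np.array_equal(x, digits.data)
print(same_as_data)

# A single sample must be 2-D, with one row, before it goes to predict
single = x[-1].reshape(1, -1)
print(single.shape)
```

Since the flattened images match digits.data, either array can be passed to fit.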
