python - Create a multi-way split of a dataset using a single train_test_split command

Question

My dataset has 42,000 rows.
I need to split it into training, cross-validation and test sets in proportions of 60%, 20% and 20%. This follows the advice given in Professor Andrew Ng's ML class lectures.
I noticed that scikit-learn has a method for this, train_test_split. However, I cannot get it to produce the 0.6, 0.2, 0.2 split in a single one-liner command.

What I do is:

# split data into training, cv and test sets
# (sklearn.cross_validation was removed in scikit-learn 0.20;
#  train_test_split now lives in sklearn.model_selection)
from sklearn.model_selection import train_test_split
# first cut: 60% training, 40% held out
train, intermediate_set = train_test_split(input_set, train_size=0.6, test_size=0.4)
# second cut: halve the held-out 40% into cv and test (20% each)
cv, test = train_test_split(intermediate_set, train_size=0.5, test_size=0.5)


# preparing the training dataset
print('training shape(Tuple of array dimensions) = ', train.shape)
print('training dimension(Number of array dimensions) = ', train.ndim)
print('cv shape(Tuple of array dimensions) = ', cv.shape)
print('cv dimension(Number of array dimensions) = ', cv.ndim)
print('test shape(Tuple of array dimensions) = ', test.shape)
print('test dimension(Number of array dimensions) = ', test.ndim)

which gives me the following result:

training shape(Tuple of array dimensions) =  (25200, 785)
training dimension(Number of array dimensions) =  2
cv shape(Tuple of array dimensions) =  (8400, 785)
cv dimension(Number of array dimensions) =  2
test shape(Tuple of array dimensions) =  (8400, 785)
test dimension(Number of array dimensions) =  2
features shape =  (25200, 784)
labels shape =  (25200,)

How can I get this to work with a single command?

score 1 · Accepted Answer

Read the source code of train_test_split and its companion class ShuffleSplit, and adapt it to your use case. It is not a big function, and adapting it should not be very complicated.
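As a rough illustration of what such an adaptation could look like, here is a minimal sketch that mimics the shuffle-then-slice idea with plain NumPy instead of modifying scikit-learn itself. The helper name train_cv_test_split, its fractions and seed parameters, and the stand-in array are all assumptions made for this example; none of them is part of scikit-learn.

```python
import numpy as np


def train_cv_test_split(data, fractions=(0.6, 0.2, 0.2), seed=None):
    """Shuffle the rows of `data` and cut them into three parts.

    Hypothetical helper, not a scikit-learn function.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    data = np.asarray(data)
    n = data.shape[0]

    # Random permutation of the row indices, as a shuffle-based splitter does
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)

    # Cut points between train / cv / test; the test set takes the remainder
    train_end = int(round(fractions[0] * n))
    cv_end = train_end + int(round(fractions[1] * n))
    train_idx, cv_idx, test_idx = np.split(order, [train_end, cv_end])

    return data[train_idx], data[cv_idx], data[test_idx]


# Stand-in for the asker's input_set: 42000 rows, 3 columns instead of 785
input_set = np.arange(42000 * 3).reshape(42000, 3)
train, cv, test = train_cv_test_split(input_set, seed=0)
print(train.shape, cv.shape, test.shape)  # (25200, 3) (8400, 3) (8400, 3)
```

With this in place the split becomes a one-liner at the call site. Note that this sketch does no stratification, so if class balance matters, chaining two train_test_split calls with the stratify argument may be the safer route.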
