python - Online ICA learning for large datasets in scikit

Question

I have a large dataset and I am trying to get Gabor filters from the images. When the dataset gets too large, I get a memory error. So far I have this code:

import numpy
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.decomposition import FastICA

def extract_dictionary(image, patches_size=(16,16), projection_dimensios=25, previous_dictionary=None):
    """
    Gets a higher dimension ica projection image.

    """
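    # Extract every patch and flatten each one into a row vector.
    patches = extract_patches_2d(image, patches_size)
    # LIMIT (defined elsewhere) caps how many patches are kept.
    patches = numpy.reshape(patches, (patches.shape[0],-1))[:LIMIT]
    # Standardize each pixel position across the patches.
    patches -= patches.mean(axis=0)
    patches /= numpy.std(patches, axis=0)
    #dico = MiniBatchDictionaryLearning(n_atoms=projection_dimensios, alpha=1, n_iter=500)
    #fit = dico.fit(patches)
    # FastICA is fitted on the whole patch matrix at once.
    ica = FastICA(n_components=projection_dimensios)
    ica.fit(patches)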

    return ica

When LIMIT is large, I get a memory error. Is there an online (incremental) alternative to ICA in scikit or in another Python package?

score 4 · Accepted Answer

No, there is not. Do you really need ICA filters? Have you tried MiniBatchDictionaryLearning or MiniBatchKMeans instead? Both of those are online.
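As a rough illustration of the online route, here is a minimal sketch that feeds patches to MiniBatchDictionaryLearning one image at a time through partial_fit, so the full patch matrix never has to sit in memory. The helper names (iter_patch_batches, learn_dictionary_online), the random stand-in images, the 8x8 patch size and the per-patch normalization are assumptions made for the example, not part of the question's code. It also assumes a scikit-learn version where the parameter is called n_components rather than n_atoms.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d


def iter_patch_batches(images, patch_size=(8, 8), patches_per_image=200, seed=0):
    """Yield one normalized batch of flattened patches per image (hypothetical helper)."""
    for i, image in enumerate(images):
        # Sample a bounded number of patches instead of extracting all of them.
        patches = extract_patches_2d(
            image, patch_size, max_patches=patches_per_image, random_state=seed + i
        )
        patches = patches.reshape(patches.shape[0], -1).astype(np.float64)
        # Normalize each patch on its own, so no dataset-wide statistics are needed.
        patches -= patches.mean(axis=1, keepdims=True)
        patches /= patches.std(axis=1, keepdims=True) + 1e-8
        yield patches


def learn_dictionary_online(images, n_components=25, patch_size=(8, 8)):
    """Update the dictionary batch by batch (hypothetical helper)."""
    dico = MiniBatchDictionaryLearning(
        n_components=n_components, alpha=1.0, random_state=0
    )
    for batch in iter_patch_batches(images, patch_size):
        dico.partial_fit(batch)  # only the current batch is held in memory
    return dico


# Random stand-in images; real code would load images lazily from disk.
rng = np.random.RandomState(0)
images = (rng.rand(32, 32) for _ in range(5))
dico = learn_dictionary_online(images)
print(dico.components_.shape)  # (n_components, patch_height * patch_width)
```

MiniBatchKMeans also exposes partial_fit, so the same loop should work with it, reading the learned filters from cluster_centers_ instead of components_.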

Also, although not strictly online, RandomizedPCA can handle medium to largish data if the number of components to extract is small.
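For reference, a small sketch of the randomized PCA option. Newer scikit-learn releases no longer ship a separate RandomizedPCA class; the randomized solver is selected through PCA(svd_solver="randomized"), which is what this example assumes. The random matrix stands in for flattened 16x16 patches and is purely illustrative. Note that the whole patch matrix is still passed to fit in one go, so this is not incremental.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in data: 2000 flattened 16x16 patches (assumption for the example).
rng = np.random.RandomState(0)
patches = rng.rand(2000, 256)
patches -= patches.mean(axis=0)

# Randomized SVD is cheap when only a few components are requested.
pca = PCA(n_components=25, svd_solver="randomized", random_state=0)
pca.fit(patches)

# Each component can be reshaped back into a patch-sized filter.
filters = pca.components_.reshape(25, 16, 16)
print(filters.shape)
```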
