I have written two algorithms and I want to check which one is more 'efficient' and uses less memory. The first one creates a NumPy array and modifies it in place. The second one creates an empty Python list and appends values to it. Which is better? First program:

    f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()
    zeros = np.zeros((60343,4917))

    for l in lines:
        row = l.split(",")
        for element in row:
            zeros[lines.index(l), row.index(element)] = element

    X = zeros[1,:]
    Y = zeros[:,0]
    one_hot = np.ones((counter, 2))

The second one:

    f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
    lines = f.readlines()
    f.close()
    X = []
    Y = []

    for l in lines:
        row = l.split(",")
        X.append([float(elem) for elem in row[1:]])
        Y.append(float(row[0]))

    X = np.array(X)
    Y = np.array(Y)
    one_hot = np.ones((counter, 2))

My theory is that the first one is slower but uses less memory and is more 'stable' when working with large files. The second one is faster but uses a lot of memory and is not as stable when working with large files (543 MB, 70,000 lines).

Thanks!

3 Answers

Python's standard library includes a handy profiler. It is very easy to use: wrap your code in a function and call cProfile.run like this:

import cProfile
cProfile.run('my_function()')
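For a slightly fuller picture, here is a minimal sketch that profiles a function and prints the most expensive calls. The function `build_rows` is a hypothetical stand-in for the loading code in the question, not the asker's actual code; only `cProfile`, `pstats` and `io` from the standard library are used.

```python
import cProfile
import io
import pstats


def build_rows(n_rows=1000, n_cols=50):
    # Hypothetical stand-in for the file-loading loop in the question.
    rows = []
    for i in range(n_rows):
        rows.append([float(i + j) for j in range(n_cols)])
    return rows


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = build_rows()
profiler.disable()

# Collect the report in a string, sorted by cumulative time, top 5 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
report = stream.getvalue()
print(report)
```

Running each of the two programs through the same harness should make it easier to compare where the time goes. Note that cProfile measures time, not memory.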

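One piece of advice for both cases: you don't actually need to read all the lines into a list. Instead, you can simply iterate over the file, which yields the lines one at a time without storing them all in memory: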

with open('some_file.txt') as f:
    for line in f:
        pass  # Do something with line
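As a rough illustration, iterating over the file can be combined with a preallocated NumPy array, so that neither the list of lines nor a nested list of floats is kept around. This is only a sketch: the tiny temporary file and the 3 x 3 shape are made up for the example, and it assumes the number of rows and columns is known in advance, as it appears to be in the question.

```python
import os
import tempfile

import numpy as np

# Write a tiny comma-separated file so the example is self-contained
# (hypothetical data: first column is the label, the rest are features).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1,0.5,0.25\n0,1.5,2.5\n1,3.0,4.0\n")
    path = tmp.name

n_rows, n_cols = 3, 3  # assumed to be known beforehand
data = np.zeros((n_rows, n_cols))

with open(path) as f:
    # enumerate gives the row index directly; no lines.index() lookup needed
    for i, line in enumerate(f):
        data[i, :] = [float(v) for v in line.split(",")]

os.remove(path)

Y = data[:, 0]   # labels: first column
X = data[:, 1:]  # features: remaining columns
print(X.shape, Y.shape)
```

Only one line of text and one row of floats are held as temporary objects at any time; the array itself is allocated once.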

As for memory usage, NumPy arrays are much better than lists.
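A quick way to see the difference is to compare the approximate sizes directly. The sketch below assumes CPython and a made-up element count; `sys.getsizeof` on a list counts only the list's own pointer storage, so the float objects are added separately.

```python
import sys

import numpy as np

n = 100_000  # arbitrary size chosen for the illustration
values = [float(i) for i in range(n)]
array = np.array(values)  # dtype float64, 8 bytes per element

# List: the list object itself plus every separate float object it refers to.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# Array: one contiguous buffer of raw 8-byte floats.
array_bytes = array.nbytes

print("list :", list_bytes, "bytes")
print("array:", array_bytes, "bytes")
```

The exact list figure depends on the interpreter, but the array needs just 8 bytes per value, while the list pays for a pointer and a full float object per value.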

answered 2013-07-31T13:13:42.207