I have written two versions of some code and I want to check which one is more 'efficient' and uses less memory. The first one creates a NumPy array up front and fills it in as it reads the file. The second one builds plain Python lists, appends values to them, and converts them to arrays at the end. Which is better? First program:
import numpy as np

f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
lines = f.readlines()
f.close()

zeros = np.zeros((60343, 4917))
for i, l in enumerate(lines):
    row = l.split(",")
    for j, element in enumerate(row):
        # enumerate() instead of list.index(): .index() rescans the list on
        # every lookup (O(n)) and returns the position of the *first* match,
        # which is wrong whenever a value appears more than once in a line
        zeros[i, j] = float(element)
X = zeros[:, 1:]  # feature columns (was zeros[1,:], which selects a single row)
Y = zeros[:, 0]   # label column
one_hot = np.ones((counter, 2))
The second one:
f = open('/Users/marcortiz/Documents/vLex/pylearn2/mlearning/classify/files/models/model_training.txt')
lines = f.readlines()
f.close()
X = []
Y = []
for l in lines:
    row = l.split(",")
    X.append([float(elem) for elem in row[1:]])
    Y.append(float(row[0]))
X = np.array(X)
Y = np.array(Y)
one_hot = np.ones((counter, 2))
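For reference, the same parse can also be sketched with NumPy's own CSV reader, `np.loadtxt` (shown here on a tiny made-up sample file in the same layout as mine: label in column 0, features after it):

```python
import numpy as np

# Tiny sample file in the same layout as the real one: label first, then features.
# The filename here is made up for the example.
with open("model_training_sample.txt", "w") as f:
    f.write("1,0.5,2.0\n0,1.5,3.0\n")

# Let NumPy parse the comma-separated file directly into a 2-D float array.
data = np.loadtxt("model_training_sample.txt", delimiter=",")
X = data[:, 1:]  # feature columns
Y = data[:, 0]   # label column
```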
My theory is that the first one is slower but uses less memory and is more 'stable' when working with large files, while the second one is faster but uses a lot of memory and is less stable with large files (543 MB, 70,000 lines).
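To test that theory rather than guess, both approaches can be timed and their peak memory measured with the standard-library `time` and `tracemalloc` modules. A rough sketch on synthetic lines (sizes shrunk from the real 60343 x 4917 file so it runs quickly; all names here are made up):

```python
import time
import tracemalloc
import numpy as np

# Synthetic comma-separated lines standing in for the real file.
lines = [",".join(str(i + j) for j in range(50)) for i in range(1000)]

def preallocated():
    # First approach: allocate the full array, then fill it element by element.
    out = np.zeros((len(lines), 50))
    for i, l in enumerate(lines):
        for j, element in enumerate(l.split(",")):
            out[i, j] = float(element)
    return out

def list_then_convert():
    # Second approach: build Python lists, convert to an array at the end.
    rows = [[float(e) for e in l.split(",")] for l in lines]
    return np.array(rows)

for fn in (preallocated, list_then_convert):
    tracemalloc.start()
    t0 = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - t0
    _, peak = tracemalloc.get_traced_memory()  # peak bytes during the call
    tracemalloc.stop()
    print(f"{fn.__name__}: {elapsed:.3f}s, peak {peak / 1e6:.1f} MB")
```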
Thanks!