I followed the steps at https://www.kaggle.com/c/otto-group-product-classification-challenge/forums/t/13973/a-few-tips-to-install-theano-on-windows-64-bits to install theano on my Windows machine. When I run theano.misc.check_blas.test() to test the BLAS speed, it takes about 10 seconds:
In [2]: theano.misc.check_blas.test()
Some Theano flags:
blas.ldflags= -LC:\\openblas -lopenblas
compiledir= C:\Users\WAWEIMIN\AppData\Local\Theano\compiledir_Windows-7-6.1.7601-SP1-Intel64_Family_6_Model_61_Stepping_4_GenuineIntel-3.4.3-64
floatX= float64
device= cpu
Some OS information:
sys.platform= win32
sys.version= 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Mar 6 2015, 12:06:10) [MSC v.1600 64 bit (AMD64)]
sys.prefix= C:\Users\WAWEIMIN\SciSoft\Anaconda
Some environment variables:
MKL_NUM_THREADS= None
OMP_NUM_THREADS= None
GOTO_NUM_THREADS= None
Numpy config: (used when the Theano flag "blas.ldflags" is empty)
lapack_opt_info:
define_macros = [('SCIPY_MKL_H', None)]
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
blas_opt_info:
define_macros = [('SCIPY_MKL_H', None)]
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
openblas_lapack_info:
NOT AVAILABLE
blas_mkl_info:
define_macros = [('SCIPY_MKL_H', None)]
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
lapack_mkl_info:
define_macros = [('SCIPY_MKL_H', None)]
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd', 'mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
mkl_info:
define_macros = [('SCIPY_MKL_H', None)]
library_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/lib/intel64']
libraries = ['mkl_lapack95_lp64', 'mkl_blas95_lp64', 'mkl_intel_lp64', 'mkl_intel_thread', 'mkl_core', 'libiomp5md', 'libifportmd']
include_dirs = ['C:/Program Files (x86)/Intel/Composer XE/mkl/include']
Numpy dot module: numpy.core._dotblas
Numpy location: C:\Users\WAWEIMIN\SciSoft\Anaconda\lib\site-packages\numpy\__init__.py
Numpy version: 1.9.2
Out[2]: (9.959995985031128, 'CPU (with direct Theano binding to blas)')
However, if I remove these lines from my .theanorc.txt file,
[blas]
ldflags=-LC:\\openblas -lopenblas
the result is as follows (only the last line of output is shown):
(2.91823678434, CPU (without direct Theano binding to blas but with numpy/scipy binding to blas)
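For an A/B comparison that does not require editing .theanorc.txt each time, the same switch can be sketched with the THEANO_FLAGS environment variable. This assumes Theano's documented behaviour that THEANO_FLAGS takes precedence over the .theanorc file and is read when theano is imported; the helper name theano_flags is hypothetical and only builds the string.

```python
import os

def theano_flags(ldflags):
    """Build a THEANO_FLAGS value that overrides only blas.ldflags (hypothetical helper)."""
    return "blas.ldflags=" + ldflags

# Case A: direct Theano binding to OpenBLAS. The raw string keeps the two
# backslashes exactly as they appear in the .theanorc.txt entry from the post.
with_openblas = theano_flags(r"-LC:\\openblas -lopenblas")

# Case B: empty ldflags, i.e. the same effect as removing the [blas] lines.
without_direct = theano_flags("")

# Assumption: this must be set before `import theano` runs in the process.
os.environ["THEANO_FLAGS"] = without_direct
print(os.environ["THEANO_FLAGS"])
```

Running check_blas once per setting, each in a fresh Python process, keeps the two cases from sharing state.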
Why is Theano's direct binding to BLAS so much slower than running without the direct binding? Is the BLAS I am using the wrong one?
(Following the steps in the link above, I downloaded OpenBLAS as OpenBLAS-v0.2.14-Win64-int32.zip, available from http://sourceforge.net/projects/openblas/files/v0.2.14/OpenBLAS-v0.2.14-Win64-int32.zip/download, and saved it locally to C:\\openblas.)
I also tested with the script below:
import numpy as np
import time
import theano
print('blas.ldflags=', theano.config.blas.ldflags)
A = np.random.rand(1000, 10000).astype(theano.config.floatX)
B = np.random.rand(10000, 1000).astype(theano.config.floatX)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
X, Y = theano.tensor.matrices('XY')
mf = theano.function([X, Y], X.dot(Y))
t_start = time.time()
tAB = mf(A, B)
t_end = time.time()
print("NP time: %f[s], theano time: %f[s] (times should be close when run on CPU!)" % (
    np_end - np_start, t_end - t_start))
print("Result difference: %f" % (np.abs(AB - tAB).max(), ))
Here too, the result with the binding to OpenBLAS is slower than NumPy:
blas.ldflags= -LC:\\openblas -lopenblas
NP time: 0.358800[s], theano time: 1.328000[s] (times should be close when run on CPU!)
Result difference: 0.000000
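The script above times a single call on each side, so one-off costs on the first call could be part of the gap. A minimal sketch of a best-of-N timer that would separate those out is shown here; best_of and workload are hypothetical names, and the stand-in workload only exists so the sketch runs without NumPy or Theano.

```python
import time

def best_of(fn, repeats=5, warmup=1):
    """Return the fastest wall-clock time in seconds over `repeats` calls of fn."""
    for _ in range(warmup):
        fn()  # untimed call, absorbs any one-off cost of the first invocation
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best

def workload():
    # Stand-in for A.dot(B) or mf(A, B); pure Python so the sketch is self-contained.
    return sum(i * i for i in range(10000))

elapsed = best_of(workload)
print("best of 5: %f[s]" % elapsed)
```

With the script above, the two measurements would become best_of(lambda: A.dot(B)) and best_of(lambda: mf(A, B)). If the gap persists across repeats, it is not a first-call effect.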