I am trying to code a simple speech recognition engine using CMU PocketSphinx, but it always crashes when it reaches the decode_raw() function. I am using PyPocketSphinx (installed via pip) with 32-bit Python 2.7 on 64-bit Windows 7.
Here is my code:
from pocketsphinx import Decoder
import sphinxbase
HMM="pocketsphinx-5prealpha-win32/model/en-us/en-us/"
LM="pocketsphinx-5prealpha-win32/model/en-us/en-us.lm.dmp"
DICT="pocketsphinx-5prealpha-win32/model/en-us/cmudict-en-us.dict"
config=Decoder.default_config()
config.set_string("-hmm",HMM)
config.set_string("-lm",LM)
config.set_string('-dict',DICT)
decoder=Decoder(config)
fh=open("output.wav",'rb')
fh.seek(44)
decoder.decode_raw(fh) #Crash
print decoder.get_hyp()
I have been trying to solve this problem for over a week and still have not found an answer.
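One thing that may be worth ruling out is the input file itself. The code above skips a fixed 44-byte header with fh.seek(44), and the decoder is configured with -samprate 16000, so the audio needs to be 16-bit mono PCM at 16 kHz. Below is a minimal sketch, assuming output.wav is a plain PCM WAV file, that reads the header with the standard-library wave module instead of trusting the 44-byte offset. The helper names describe_wav and matches_decoder_input are hypothetical and are not part of PocketSphinx.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate, n_frames) for a PCM WAV file."""
    wav = wave.open(path, "rb")
    try:
        return (wav.getnchannels(), wav.getsampwidth(),
                wav.getframerate(), wav.getnframes())
    finally:
        wav.close()


def matches_decoder_input(path, expected_rate=16000):
    """Return True if the file is mono, 16-bit and sampled at expected_rate.

    expected_rate defaults to 16000 to mirror the -samprate value in the log;
    this is an assumption about the configuration, not something read from it.
    """
    channels, width, rate, _ = describe_wav(path)
    return channels == 1 and width == 2 and rate == expected_rate
```

Calling matches_decoder_input("output.wav") before decoding would at least show whether the recording format is a possible cause. It does not by itself explain the crash.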
Edit: log generated before the crash:
INFO: cmd_ln.c(696): Parsing command line:
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-allphone
-allphone_ci no no
-alpha 0.97 9.700000e-001
-ascale 20.0 2.000000e+001
-aw 1 1
-backtrace no no
-beam 1e-48 1.000000e-048
-bestpath yes yes
-bestpathlw 9.5 9.500000e+000
-bghist no no
-ceplen 13 13
-cmn current current
-cmninit 8.0 8.0
-compallsen no no
-debug 0
-dict
-dictcase no no
-dither no no
-doublebw no no
-ds 1 1
-fdict
-feat 1s_c_d_dd 1s_c_d_dd
-featparams
-fillprob 1e-8 1.000000e-008
-frate 100 100
-fsg
-fsgusealtpron yes yes
-fsgusefiller yes yes
-fwdflat yes yes
-fwdflatbeam 1e-64 1.000000e-064
-fwdflatefwid 4 4
-fwdflatlw 8.5 8.500000e+000
-fwdflatsfwin 25 25
-fwdflatwbeam 7e-29 7.000000e-029
-fwdtree yes yes
-hmm
-input_endian little little
-jsgf
-kdmaxbbi -1 -1
-kdmaxdepth 0 0
-kdtree
-keyphrase
-kws
-kws_plp 1e-1 1.000000e-001
-kws_threshold 1 1.000000e+000
-latsize 5000 5000
-lda
-ldadim 0 0
-lextreedump 0 0
-lifter 0 0
-lm
-lmctl
-lmname default default
-logbase 1.0001 1.000100e+000
-logfn
-logspec no no
-lowerf 133.33334 1.333333e+002
-lpbeam 1e-40 1.000000e-040
-lponlybeam 7e-29 7.000000e-029
-lw 6.5 6.500000e+000
-maxhmmpf 10000 10000
-maxnewoov 20 20
-maxwpf -1 -1
-mdef
-mean
-mfclogdir
-min_endfr 0 0
-mixw
-mixwfloor 0.0000001 1.000000e-007
-mllr
-mmap yes yes
-ncep 13 13
-nfft 512 512
-nfilt 40 40
-nwpen 1.0 1.000000e+000
-pbeam 1e-48 1.000000e-048
-pip 1.0 1.000000e+000
-pl_beam 1e-10 1.000000e-010
-pl_pbeam 1e-5 1.000000e-005
-pl_window 0 0
-rawlogdir
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-sendump
-senlogdir
-senmgau
-silprob 0.005 5.000000e-003
-smoothspec no no
-svspec
-tmat
-tmatfloor 0.0001 1.000000e-004
-topn 4 4
-topn_beam 0 0
-toprule
-transform legacy legacy
-unit_area yes yes
-upperf 6855.4976 6.855498e+003
-usewdphones no no
-uw 1.0 1.000000e+000
-vad_postspeech 50 50
-vad_prespeech 10 10
-vad_threshold 2.0 2.000000e+000
-var
-varfloor 0.0001 1.000000e-004
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wbeam 7e-29 7.000000e-029
-wip 0.65 6.500000e-001
-wlen 0.025625 2.562500e-002
INFO: cmd_ln.c(696): Parsing command line:
\
-lowerf 130 \
-upperf 6800 \
-nfilt 25 \
-transform dct \
-lifter 22 \
-feat 1s_c_d_dd \
-svspec 0-12/13-25/26-38 \
-agc none \
-cmn current \
-varnorm no \
-model ptm \
-cmninit 40,3,-1
Current configuration:
[NAME] [DEFLT] [VALUE]
-agc none none
-agcthresh 2.0 2.000000e+000
-alpha 0.97 9.700000e-001
-ceplen 13 13
-cmn current current
-cmninit 8.0 40,3,-1
-dither no no
-doublebw no no
-feat 1s_c_d_dd 1s_c_d_dd
-frate 100 100
-input_endian little little
-lda
-ldadim 0 0
-lifter 0 22
-logspec no no
-lowerf 133.33334 1.300000e+002
-ncep 13 13
-nfft 512 512
-nfilt 40 25
-remove_dc no no
-remove_noise yes yes
-remove_silence yes yes
-round_filters yes yes
-samprate 16000 1.600000e+004
-seed -1 -1
-smoothspec no no
-svspec 0-12/13-25/26-38
-transform legacy dct
-unit_area yes yes
-upperf 6855.4976 6.800000e+003
-vad_postspeech 50 50
-vad_prespeech 10 10
-vad_threshold 2.0 2.000000e+000
-varnorm no no
-verbose no no
-warp_params
-warp_type inverse_linear inverse_linear
-wlen 0.025625 2.562500e-002
INFO: acmod.c(252): Parsed model-specific feature parameters from pocketsphinx-5prealpha-win32/model/en-us/en-us//feat.params
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: pocketsphinx-5prealpha-win32/model/en-us/en-us//mdef
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: pocketsphinx-5prealpha-win32/model/en-us/en-us//mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: pocketsphinx-5prealpha-win32/model/en-us/en-us//transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: acmod.c(126): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(294): 128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(467): Loading senones from dump file pocketsphinx-5prealpha-win32/model/en-us/en-us//sendump
INFO: ptm_mgau.c(491): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(554): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(586): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(826): Maximum top-N: 4
INFO: dict.c(320): Allocating 137526 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: pocketsphinx-5prealpha-win32/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Allocated 1007 KiB for strings, 1662 KiB for phones
INFO: dict.c(336): 133425 words read
INFO: dict.c(342): Reading filler dictionary: pocketsphinx-5prealpha-win32/model/en-us/en-us//noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242): 19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288): 1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314): 3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339): 57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359): 10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379): 34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407): 2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463): 19794 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 56 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 56 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 44782
INFO: ngram_search_fwdtree.c(339): after: 573 root, 44654 non-root channels, 47 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25
Edit 2: This is what is written to the log when I try to compile PyPocketSphinx.
setup.py build:
running build
running build_ext
running build_py
copying sphinxbase\swig\python\sphinxbase.py -> build\lib.win32-2.7\sphinxbase
copying pocketsphinx\swig\python\pocketsphinx.py -> build\lib.win32-2.7\pocketsphinx
setup.py install:
running install
running build_ext
running build
running build_py
copying sphinxbase\swig\python\sphinxbase.py -> build\lib.win32-2.7\sphinxbase
copying pocketsphinx\swig\python\pocketsphinx.py -> build\lib.win32-2.7\pocketsphinx
running install_lib
copying build\lib.win32-2.7\pocketsphinx\_pocketsphinx.pyd -> C:\Python27\Lib\site-packages\pocketsphinx
copying build\lib.win32-2.7\sphinxbase\_sphinxbase.pyd -> C:\Python27\Lib\site-packages\sphinxbase
running install_egg_info
running egg_info
writing PyPocketSphinx.egg-info\PKG-INFO
writing top-level names to PyPocketSphinx.egg-info\top_level.txt
writing dependency_links to PyPocketSphinx.egg-info\dependency_links.txt
reading manifest file 'PyPocketSphinx.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'build'
no previously-included directories found matching 'dist'
warning: no previously-included files found matching 'sphinxbase\swig\sphinxbase_wrap.c'
warning: no previously-included files found matching 'pocketsphinx\swig\pocketsphinx_wrap.c'
writing manifest file 'PyPocketSphinx.egg-info\SOURCES.txt'
removing 'C:\Python27\Lib\site-packages\PyPocketSphinx-12608.5-py2.7.egg-info' (and everything under it)
Copying PyPocketSphinx.egg-info to C:\Python27\Lib\site-packages\PyPocketSphinx-12608.5-py2.7.egg-info
running install_scripts
I think the problem is with these lines:
warning: no previously-included files found matching 'sphinxbase\swig\sphinxbase_wrap.c'
warning: no previously-included files found matching 'pocketsphinx\swig\pocketsphinx_wrap.c'