I'm trying to code a simple speech recognition engine using CMU PocketSphinx, but it always crashes when it reaches the decode_raw() function. I'm using PyPocketSphinx (installed with pip) with 32-bit Python 2.7 on 64-bit Windows 7.

Here is my code:

from pocketsphinx import Decoder
import sphinxbase

HMM="pocketsphinx-5prealpha-win32/model/en-us/en-us/"
LM="pocketsphinx-5prealpha-win32/model/en-us/en-us.lm.dmp"
DICT="pocketsphinx-5prealpha-win32/model/en-us/cmudict-en-us.dict"

config=Decoder.default_config()
config.set_string("-hmm",HMM)
config.set_string("-lm",LM)
config.set_string('-dict',DICT)
decoder=Decoder(config)

fh=open("output.wav",'rb')
fh.seek(44)

decoder.decode_raw(fh)  # crashes here
print decoder.get_hyp()
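
The `fh.seek(44)` call assumes a canonical 44-byte WAV header, which not every WAV file has. As a sanity check on the input, here is a minimal sketch that reads the audio with the standard-library `wave` module and verifies the format before anything is passed to the decoder. The helper name `read_pcm_for_decoder` is made up for illustration, and the expected format (mono, 16-bit, 16000 Hz to match the `-samprate` value in the configuration dump) is an assumption about what the model needs. It may or may not be related to the crash.

```python
import wave


def read_pcm_for_decoder(path, expected_rate=16000):
    """Return the raw PCM bytes of a WAV file after checking its format.

    Hypothetical helper: raises ValueError unless the file is mono,
    16-bit, and sampled at expected_rate (assumed decoder input format).
    """
    wav = wave.open(path, "rb")
    try:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        if channels != 1 or sample_width != 2 or rate != expected_rate:
            raise ValueError(
                "expected mono 16-bit %d Hz audio, got %d channel(s), "
                "%d-bit, %d Hz"
                % (expected_rate, channels, sample_width * 8, rate)
            )
        # readframes() starts after the header, whatever its real length is.
        return wav.readframes(wav.getnframes())
    finally:
        wav.close()
```

Calling `read_pcm_for_decoder("output.wav")` would either return the sample data or report how the file differs from the assumed format.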

I have been trying to solve this problem for over a week and still haven't found an answer.
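
One thing that may be worth trying is to feed the audio to the decoder in chunks instead of passing the file object to `decode_raw()`, so that only byte strings cross into the native library. The sketch below assumes the installed binding exposes `start_utt()`, `process_raw()`, `end_utt()` and `hyp()`, as the SWIG-based PocketSphinx Python examples do. That is an assumption to verify against the installed version, and `decode_in_chunks` is a hypothetical helper name. Whether this avoids the crash is not established.

```python
def decode_in_chunks(decoder, stream, chunk_size=1024):
    """Feed `stream` to `decoder` chunk by chunk; return the hypothesis text.

    Assumes `decoder` has start_utt(), process_raw(), end_utt() and hyp()
    (an assumption about the installed binding, not confirmed by the post).
    """
    decoder.start_utt()
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break  # end of file
        # Arguments: data, no_search=False, full_utt=False
        decoder.process_raw(buf, False, False)
    decoder.end_utt()
    hyp = decoder.hyp()
    # hyp() may return None when nothing was recognized.
    return hyp.hypstr if hyp is not None else None
```

With the code from the question, this would be called as `decode_in_chunks(decoder, fh)` after the `fh.seek(44)` line, in place of `decoder.decode_raw(fh)` and `decoder.get_hyp()`.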

Edit: the log produced before the crash:

INFO: cmd_ln.c(696): Parsing command line:


Current configuration:
[NAME]          [DEFLT]         [VALUE]
-agc            none            none
-agcthresh      2.0             2.000000e+000
-allphone
-allphone_ci    no              no
-alpha          0.97            9.700000e-001
-ascale         20.0            2.000000e+001
-aw             1               1
-backtrace      no              no
-beam           1e-48           1.000000e-048
-bestpath       yes             yes
-bestpathlw     9.5             9.500000e+000
-bghist         no              no
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             8.0
-compallsen     no              no
-debug                          0
-dict
-dictcase       no              no
-dither         no              no
-doublebw       no              no
-ds             1               1
-fdict
-feat           1s_c_d_dd       1s_c_d_dd
-featparams
-fillprob       1e-8            1.000000e-008
-frate          100             100
-fsg
-fsgusealtpron  yes             yes
-fsgusefiller   yes             yes
-fwdflat        yes             yes
-fwdflatbeam    1e-64           1.000000e-064
-fwdflatefwid   4               4
-fwdflatlw      8.5             8.500000e+000
-fwdflatsfwin   25              25
-fwdflatwbeam   7e-29           7.000000e-029
-fwdtree        yes             yes
-hmm
-input_endian   little          little
-jsgf
-kdmaxbbi       -1              -1
-kdmaxdepth     0               0
-kdtree
-keyphrase
-kws
-kws_plp        1e-1            1.000000e-001
-kws_threshold  1               1.000000e+000
-latsize        5000            5000
-lda
-ldadim         0               0
-lextreedump    0               0
-lifter         0               0
-lm
-lmctl
-lmname         default         default
-logbase        1.0001          1.000100e+000
-logfn
-logspec        no              no
-lowerf         133.33334       1.333333e+002
-lpbeam         1e-40           1.000000e-040
-lponlybeam     7e-29           7.000000e-029
-lw             6.5             6.500000e+000
-maxhmmpf       10000           10000
-maxnewoov      20              20
-maxwpf         -1              -1
-mdef
-mean
-mfclogdir
-min_endfr      0               0
-mixw
-mixwfloor      0.0000001       1.000000e-007
-mllr
-mmap           yes             yes
-ncep           13              13
-nfft           512             512
-nfilt          40              40
-nwpen          1.0             1.000000e+000
-pbeam          1e-48           1.000000e-048
-pip            1.0             1.000000e+000
-pl_beam        1e-10           1.000000e-010
-pl_pbeam       1e-5            1.000000e-005
-pl_window      0               0
-rawlogdir
-remove_dc      no              no
-remove_noise   yes             yes
-remove_silence yes             yes
-round_filters  yes             yes
-samprate       16000           1.600000e+004
-seed           -1              -1
-sendump
-senlogdir
-senmgau
-silprob        0.005           5.000000e-003
-smoothspec     no              no
-svspec
-tmat
-tmatfloor      0.0001          1.000000e-004
-topn           4               4
-topn_beam      0               0
-toprule
-transform      legacy          legacy
-unit_area      yes             yes
-upperf         6855.4976       6.855498e+003
-usewdphones    no              no
-uw             1.0             1.000000e+000
-vad_postspeech 50              50
-vad_prespeech  10              10
-vad_threshold  2.0             2.000000e+000
-var
-varfloor       0.0001          1.000000e-004
-varnorm        no              no
-verbose        no              no
-warp_params
-warp_type      inverse_linear  inverse_linear
-wbeam          7e-29           7.000000e-029
-wip            0.65            6.500000e-001
-wlen           0.025625        2.562500e-002

INFO: cmd_ln.c(696): Parsing command line:
\
        -lowerf 130 \
        -upperf 6800 \
        -nfilt 25 \
        -transform dct \
        -lifter 22 \
        -feat 1s_c_d_dd \
        -svspec 0-12/13-25/26-38 \
        -agc none \
        -cmn current \
        -varnorm no \
        -model ptm \
        -cmninit 40,3,-1

Current configuration:
[NAME]          [DEFLT]         [VALUE]
-agc            none            none
-agcthresh      2.0             2.000000e+000
-alpha          0.97            9.700000e-001
-ceplen         13              13
-cmn            current         current
-cmninit        8.0             40,3,-1
-dither         no              no
-doublebw       no              no
-feat           1s_c_d_dd       1s_c_d_dd
-frate          100             100
-input_endian   little          little
-lda
-ldadim         0               0
-lifter         0               22
-logspec        no              no
-lowerf         133.33334       1.300000e+002
-ncep           13              13
-nfft           512             512
-nfilt          40              25
-remove_dc      no              no
-remove_noise   yes             yes
-remove_silence yes             yes
-round_filters  yes             yes
-samprate       16000           1.600000e+004
-seed           -1              -1
-smoothspec     no              no
-svspec                         0-12/13-25/26-38
-transform      legacy          dct
-unit_area      yes             yes
-upperf         6855.4976       6.800000e+003
-vad_postspeech 50              50
-vad_prespeech  10              10
-vad_threshold  2.0             2.000000e+000
-varnorm        no              no
-verbose        no              no
-warp_params
-warp_type      inverse_linear  inverse_linear
-wlen           0.025625        2.562500e-002

INFO: acmod.c(252): Parsed model-specific feature parameters from pocketsphinx-5prealpha-win32/model/en-us/en-us//feat.params
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(517): Reading model definition: pocketsphinx-5prealpha-win32/model/en-us/en-us//mdef
INFO: mdef.c(530): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: pocketsphinx-5prealpha-win32/model/en-us/en-us//mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: pocketsphinx-5prealpha-win32/model/en-us/en-us//transition_matrices
INFO: acmod.c(124): Attempting to use SCHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: acmod.c(126): Attempting to use PTHMM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: pocketsphinx-5prealpha-win32/model/en-us/en-us//variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(467): Loading senones from dump file pocketsphinx-5prealpha-win32/model/en-us/en-us//sendump
INFO: ptm_mgau.c(491): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(554): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(586): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(826): Maximum top-N: 4
INFO: dict.c(320): Allocating 137526 * 20 bytes (2686 KiB) for word entries
INFO: dict.c(333): Reading main dictionary: pocketsphinx-5prealpha-win32/model/en-us/cmudict-en-us.dict
INFO: dict.c(213): Allocated 1007 KiB for strings, 1662 KiB for phones
INFO: dict.c(336): 133425 words read
INFO: dict.c(342): Reading filler dictionary: pocketsphinx-5prealpha-win32/model/en-us/en-us//noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242):    19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):  1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):  3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):     2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):    19794 = ascii word strings read
INFO: ngram_search_fwdtree.c(99): 788 unique initial diphones
INFO: ngram_search_fwdtree.c(148): 0 root, 0 non-root channels, 56 single-phone words
INFO: ngram_search_fwdtree.c(186): Creating search tree
INFO: ngram_search_fwdtree.c(192): before: 0 root, 0 non-root channels, 56 single-phone words
INFO: ngram_search_fwdtree.c(326): after: max nonroot chan increased to 44782
INFO: ngram_search_fwdtree.c(339): after: 573 root, 44654 non-root channels, 47 single-phone words
INFO: ngram_search_fwdflat.c(157): fwdflat: min_ef_width = 4, max_sf_win = 25

Edit 2: this is what gets written to the log when I try to compile PyPocketSphinx.

setup.py build:

running build
running build_ext
running build_py
copying sphinxbase\swig\python\sphinxbase.py -> build\lib.win32-2.7\sphinxbase
copying pocketsphinx\swig\python\pocketsphinx.py -> build\lib.win32-2.7\pocketsphinx

setup.py install:

running install
running build_ext
running build
running build_py
copying sphinxbase\swig\python\sphinxbase.py -> build\lib.win32-2.7\sphinxbase
copying pocketsphinx\swig\python\pocketsphinx.py -> build\lib.win32-2.7\pocketsphinx
running install_lib
copying build\lib.win32-2.7\pocketsphinx\_pocketsphinx.pyd -> C:\Python27\Lib\site-packages\pocketsphinx
copying build\lib.win32-2.7\sphinxbase\_sphinxbase.pyd -> C:\Python27\Lib\site-packages\sphinxbase
running install_egg_info
running egg_info
writing PyPocketSphinx.egg-info\PKG-INFO
writing top-level names to PyPocketSphinx.egg-info\top_level.txt
writing dependency_links to PyPocketSphinx.egg-info\dependency_links.txt
reading manifest file 'PyPocketSphinx.egg-info\SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'build'
no previously-included directories found matching 'dist'
warning: no previously-included files found matching 'sphinxbase\swig\sphinxbase_wrap.c'
warning: no previously-included files found matching 'pocketsphinx\swig\pocketsphinx_wrap.c'
writing manifest file 'PyPocketSphinx.egg-info\SOURCES.txt'
removing 'C:\Python27\Lib\site-packages\PyPocketSphinx-12608.5-py2.7.egg-info' (and everything under it)
Copying PyPocketSphinx.egg-info to C:\Python27\Lib\site-packages\PyPocketSphinx-12608.5-py2.7.egg-info
running install_scripts

I think the problem lies in these lines:

warning: no previously-included files found matching 'sphinxbase\swig\sphinxbase_wrap.c'
warning: no previously-included files found matching 'pocketsphinx\swig\pocketsphinx_wrap.c'