pocketsphinx - PocketSphinx による音素認識

Question

Windows 8 デスクトップのマイクからのリアルタイムの音素認識が必要です。そこで、 http: //cmusphinx.sourceforge.net/wiki/phonemerecognition に従い、VS2013 の Subversion ソースから pocketphinx_continuous をビルドしました。管理者としてコマンドラインで実行します。

D:\_SPHINX\cmusphinx-code-13103-trunk\pocketsphinx\bin\Release\Win32>pocketsphinx_continuous.exe -infile ../../../test/data/goforward.raw -hmm ../../../model/en-us/en-us -allphone ../../../model/en-us/en-us-phone.lm.bin -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0
INFO: pocketsphinx.c(145): Parsed model-specific feature parameters from ../../../model/en-us/en-us/feat.params
Current configuration:
[NAME]                  [DEFLT]         [VALUE]
-agc                    none            none
-agcthresh              2.0             2.000000e+000
-allphone                               ../../../model/en-us/en-us-phone.lm.bin
-allphone_ci            no              no
-alpha                  0.97            9.700000e-001
-ascale                 20.0            2.000000e+001
-aw                     1               1
-backtrace              no              yes
-beam                   1e-48           1.000000e-020
-bestpath               yes             yes
-bestpathlw             9.5             9.500000e+000
-ceplen                 13              13
-cmn                    current         current
-cmninit                8.0             40,3,-1
-compallsen             no              no
-debug                                  0
-dict
-dictcase               no              no
-dither                 no              no
-doublebw               no              no
-ds                     1               1
-fdict                                  ../../../model/en-us/en-us/noisedict
-feat                   1s_c_d_dd       1s_c_d_dd
-featparams                             ../../../model/en-us/en-us/feat.params
-fillprob               1e-8            1.000000e-008
-frate                  100             100
-fsg
-fsgusealtpron          yes             yes
-fsgusefiller           yes             yes
-fwdflat                yes             yes
-fwdflatbeam            1e-64           1.000000e-064
-fwdflatefwid           4               4
-fwdflatlw              8.5             8.500000e+000
-fwdflatsfwin           25              25
-fwdflatwbeam           7e-29           7.000000e-029
-fwdtree                yes             yes
-hmm                                    ../../../model/en-us/en-us
-input_endian           little          little
-jsgf
-keyphrase
-kws
-kws_delay              10              10
-kws_plp                1e-1            1.000000e-001
-kws_threshold          1               1.000000e+000
-latsize                5000            5000
-lda
-ldadim                 0               0
-lifter                 0               22
-lm
-lmctl
-lmname
-logbase                1.0001          1.000100e+000
-logfn
-logspec                no              no
-lowerf                 133.33334       1.300000e+002
-lpbeam                 1e-40           1.000000e-040
-lponlybeam             7e-29           7.000000e-029
-lw                     6.5             2.000000e+000
-maxhmmpf               30000           30000
-maxwpf                 -1              -1
-mdef                                   ../../../model/en-us/en-us/mdef
-mean                                   ../../../model/en-us/en-us/means
-mfclogdir
-min_endfr              0               0
-mixw
-mixwfloor              0.0000001       1.000000e-007
-mllr
-mmap                   yes             yes
-ncep                   13              13
-nfft                   512             512
-nfilt                  40              25
-nwpen                  1.0             1.000000e+000
-pbeam                  1e-48           1.000000e-020
-pip                    1.0             1.000000e+000
-pl_beam                1e-10           1.000000e-010
-pl_pbeam               1e-10           1.000000e-010
-pl_pip                 1.0             1.000000e+000
-pl_weight              3.0             3.000000e+000
-pl_window              5               5
-rawlogdir
-remove_dc              no              no
-remove_noise           yes             yes
-remove_silence         yes             yes
-round_filters          yes             yes
-samprate               16000           1.600000e+004
-seed                   -1              -1
-sendump                                ../../../model/en-us/en-us/sendump
-senlogdir
-senmgau
-silprob                0.005           5.000000e-003
-smoothspec             no              no
-svspec                                 0-12/13-25/26-38
-tmat                                   ../../../model/en-us/en-us/transition_matrices
-tmatfloor              0.0001          1.000000e-004
-topn                   4               4
-topn_beam              0               0
-toprule
-transform              legacy          dct
-unit_area              yes             yes
-upperf                 6855.4976       6.800000e+003
-uw                     1.0             1.000000e+000
-vad_postspeech         50              50
-vad_prespeech          20              20
-vad_startspeech        10              10
-vad_threshold          2.0             2.000000e+000
-var                                    ../../../model/en-us/en-us/variances
-varfloor               0.0001          1.000000e-004
-varnorm                no              no
-verbose                no              no
-warp_params
-warp_type              inverse_linear  inverse_linear
-wbeam                  7e-29           7.000000e-029
-wip                    0.65            6.500000e-001
-wlen                   0.025625        2.562500e-002

INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(164): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../../model/en-us/en-us/transition_matrices

最後の INFO 行で、Windows 8 は次のエラーをスローします。

PocketSphinx のデバッグ出力またはコマンドラインオプションに問題はありますか? それとも純粋な Windows の問題ですか? 次のフォルダーに気付きました: /bin/Release/Win32. 私の Windows 8 は Intel NUC で 64 ビットです。Sphinxbase.dll は Subversion からデバッグモードでコンパイルされましたが、PacketSphinx にはリリースモードしかありませんでした。

また、音素のタイミング情報が利用できることをどこかで読みました-それを取得する方法は?

追加: Nikolay のアドバイスに従い、これらのパラメーターを使用して、エラーを排除しましたが、音素は得られませんでした:

D:\_SPHINX\pocketsphinx\bin\Debug>pocketsphinx_continuous.exe -infile ../../test/data/goforward.raw -hmm ../../model/en-us/en-us -allphone ../../model/en-us/en-us.lm.dmp -backtrace yes -beam 1e-20 -pbeam 1e-20 -lw 2.0 -debug 3 -verbose yes
INFO: cmd_ln.c(697): Parsing command line:
pocketsphinx_continuous.exe \
        -infile ../../test/data/goforward.raw \
        -hmm ../../model/en-us/en-us \
        -allphone ../../model/en-us/en-us.lm.dmp \
        -backtrace yes \
        -beam 1e-20 \
        -pbeam 1e-20 \
        -lw 2.0 \
        -debug 3 \
        -verbose yes
. . . .
INFO: acmod.c(252): Parsed model-specific feature parameters from ../../model/en-us/en-us/feat.params
INFO: fe_interface.c(177): Current FE Parameters:
INFO: fe_interface.c(178):      Sampling Rate:             16000.000000
INFO: fe_interface.c(179):      Frame Size:                410
INFO: fe_interface.c(180):      Frame Shift:               160
INFO: fe_interface.c(181):      FFT Size:                  512
INFO: fe_interface.c(183):      Lower Frequency:           130
INFO: fe_interface.c(185):      Upper Frequency:           6800
INFO: fe_interface.c(186):      Number of filters:         25
INFO: fe_interface.c(187):      Number of Overflow Samps:  0
INFO: fe_interface.c(188):      Start Utt Status:          0
INFO: fe_interface.c(190): Will not remove DC offset at frame level
INFO: fe_interface.c(196): Will not add dither to audio
INFO: fe_interface.c(200): Will apply sine-curve liftering, period 22
INFO: fe_interface.c(203): Will normalize filters to unit area
INFO: fe_interface.c(205): Will round filter frequencies to DFT points
INFO: fe_interface.c(207): Will not use double bandwidth in mel filter
INFO: feat.c(715): Initializing feature stream to type: '1s_c_d_dd', ceplen=13, CMN='current', VARNORM='no', AGC='none'
INFO: cmn.c(143): mean[0]= 12.00, mean[1..12]= 0.0
INFO: acmod.c(171): Using subvector specification 0-12/13-25/26-38
INFO: mdef.c(518): Reading model definition: ../../model/en-us/en-us/mdef
INFO: mdef.c(531): Found byte-order mark BMDF, assuming this is a binary mdef file
INFO: bin_mdef.c(336): Reading binary model definition: ../../model/en-us/en-us/mdef
INFO: bin_mdef.c(516): 42 CI-phone, 137053 CD-phone, 3 emitstate/phone, 126 CI-sen, 5126 Sen, 29324 Sen-Seq
INFO: tmat.c(206): Reading HMM transition probability matrices: ../../model/en-us/en-us/transition_matrices
INFO: acmod.c(124): Attempting to use PTM computation module
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/means
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(198): Reading mixture gaussian parameter: ../../model/en-us/en-us/variances
INFO: ms_gauden.c(292): 42 codebook, 3 feature, size:
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(294):  128x13
INFO: ms_gauden.c(354): 222 variance values floored
INFO: ptm_mgau.c(476): Loading senones from dump file ../../model/en-us/en-us/sendump
INFO: ptm_mgau.c(500): BEGIN FILE FORMAT DESCRIPTION
INFO: ptm_mgau.c(563): Rows: 128, Columns: 5126
INFO: ptm_mgau.c(595): Using memory-mapped I/O for senones
INFO: ptm_mgau.c(835): Maximum top-N: 4
INFO: phone_loop_search.c(115): State beam -225 Phone exit beam -225 Insertion penalty 0
INFO: dict.c(320): Allocating 4101 * 20 bytes (80 KiB) for word entries
INFO: dict.c(342): Reading filler dictionary: ../../model/en-us/en-us/noisedict
INFO: dict.c(213): Allocated 0 KiB for strings, 0 KiB for phones
INFO: dict.c(345): 5 words read
INFO: dict2pid.c(396): Building PID tables for dictionary
INFO: dict2pid.c(406): Allocating 42^3 * 2 bytes (144 KiB) for word-initial triphones
INFO: dict2pid.c(132): Allocated 21336 bytes (20 KiB) for word-final triphones
INFO: dict2pid.c(196): Allocated 21336 bytes (20 KiB) for single-phone word triphones
INFO: ngram_model_arpa.c(77): No \data\ mark in LM file
INFO: ngram_model_dmp.c(142): Will use memory-mapped I/O for LM file
INFO: ngram_model_dmp.c(196): ngrams 1=19794, 2=1377200, 3=3178194
INFO: ngram_model_dmp.c(242):    19794 = LM.unigrams(+trailer) read
INFO: ngram_model_dmp.c(288):  1377200 = LM.bigrams(+trailer) read
INFO: ngram_model_dmp.c(314):  3178194 = LM.trigrams read
INFO: ngram_model_dmp.c(339):    57155 = LM.prob2 entries read
INFO: ngram_model_dmp.c(359):    10935 = LM.bo_wt2 entries read
INFO: ngram_model_dmp.c(379):    34843 = LM.prob3 entries read
INFO: ngram_model_dmp.c(407):     2690 = LM.tseg_base entries read
INFO: ngram_model_dmp.c(463):    19794 = ascii word strings read
INFO: allphone_search.c(239): Building PHMM net of 137095 phones
INFO: allphone_search.c(312): 29324 nodes, 1958591 links
INFO: allphone_search.c(611): Allphone(beam: -450, pbeam: -450)
INFO: continuous.c(299): pocketsphinx_continuous.exe COMPILED ON: Aug 23 2015, AT: 14:00:33

INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: allphone_search.c(852): 214 frames, 214 HMMs (1/fr), 642 senones (3/fr), 214 history entries (1/fr)
INFO: allphone_search.c(865): allphone 0.61 CPU 0.283 xRT
INFO: allphone_search.c(867): allphone 0.62 wall 0.290 xRT
INFO: allphone_search.c(911): Hyp: SIL
INFO: pocketsphinx.c(1133): SIL (-858993460)
word                 start end   pprob ascr       lscr       lback
SIL                  51    264   1.000 -1627      0          0
INFO: allphone_search.c(911): Hyp: SIL
SIL
INFO: cmn_prior.c(131): cmn_prior_update: from < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: cmn_prior.c(149): cmn_prior_update: to   < 44.50 -4.13  0.15  6.94  4.06 -5.38 -2.56 -3.13 -6.12 -1.20 -7.44 -2.25  0.48 >
INFO: allphone_search.c(852): 0 frames, 0 HMMs (0/fr), 0 senones (0/fr), 0 history entries (0/fr)
INFO: allphone_search.c(651): TOTAL fwdflat 0.61 CPU 0.285 xRT
INFO: allphone_search.c(654): TOTAL fwdflat 0.64 wall 0.298 xRT

音素出力を取得するためのコマンドラインパラメータの正しいセットは何ですか?

pocketsphinx - PocketSphinx による音素認識

0 に答える 0

Related

Reference