api - Keyword spotting in speech

Question

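Does anyone know of a keyword spotting system that is freely available and, ideally, provides an API?

CMU Sphinx 4 and the MS Speech API are speech recognition engines and can't be used for KWS.

SRI has a keyword spotting system, but there is no download link, not even for evaluation. (I couldn't find a link anywhere to contact them about the software.)

I found one here, but it is a limited demo version.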

score 4 · Accepted Answer

CMUSphinx implements keyword spotting in the pocketsphinx engine; see the FAQ entry for details.

To recognize a single keyphrase, you can run the decoder in "keyphrase search" mode.

From the command line, try:

  pocketsphinx_continuous -infile file.wav -keyphrase "oh mighty computer" -kws_threshold 1e-20

From code:

 ps_set_keyphrase(ps, "keyphrase_search", "oh mighty computer");
 ps_set_search(ps, "keyphrase_search");
 ps_start_utt(ps);
 /* process data */

Python and Android/Java examples can also be found in the sources. The Python code looks like this (the full example is in the sources):

# `config` (a decoder configuration with the keyphrase and threshold set) and
# `stream` (an open raw audio file) are assumed to be set up beforehand,
# as in the full example.
# Process audio chunk by chunk. On keyphrase detected perform action and restart search
decoder = Decoder(config)
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if not buf:
        break
    decoder.process_raw(buf, False, False)
    if decoder.hyp() is not None:
        print([(seg.word, seg.prob, seg.start_frame, seg.end_frame) for seg in decoder.seg()])
        print("Detected keyphrase, restarting search")
        decoder.end_utt()
        decoder.start_utt()

The threshold must be tuned for each keyphrase on test data to get the right balance between missed detections and false alarms. You can try values from 1e-5 to 1e-50.
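The tuning procedure can be sketched as a simple sweep over candidate thresholds. This is only an illustration under assumptions: `detect` is a hypothetical callback that you would write yourself to run the decoder on one test clip at a given threshold, and "fewest total errors" is just one possible way to define the balance between misses and false alarms.

```python
from typing import Callable, Iterable, List, Tuple

# A labelled test clip: (clip identifier, True if the keyphrase is really spoken in it).
Clip = Tuple[str, bool]


def sweep_thresholds(
    detect: Callable[[str, float], bool],
    clips: Iterable[Clip],
    thresholds: Iterable[float],
) -> List[Tuple[float, int, int]]:
    """Return (threshold, misses, false_alarms) for every candidate threshold.

    `detect(clip_id, threshold)` is a hypothetical callback (not part of
    pocketsphinx) that runs the decoder on one clip and reports whether
    the keyphrase fired.
    """
    clips = list(clips)
    rows = []
    for threshold in thresholds:
        misses = 0
        false_alarms = 0
        for clip_id, has_keyphrase in clips:
            fired = detect(clip_id, threshold)
            if has_keyphrase and not fired:
                misses += 1
            elif fired and not has_keyphrase:
                false_alarms += 1
        rows.append((threshold, misses, false_alarms))
    return rows


def pick_threshold(rows: List[Tuple[float, int, int]]) -> Tuple[float, int, int]:
    """Pick the row with the fewest total errors; ties keep the earliest row."""
    return min(rows, key=lambda row: row[1] + row[2])


# Candidate thresholds covering roughly the 1e-5 .. 1e-50 range.
CANDIDATES = [10.0 ** -exponent for exponent in range(5, 55, 5)]
```

If misses and false alarms are not equally costly in your application, replace the sum in `pick_threshold` with a weighted cost.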

For the best accuracy, use a keyphrase with 3 to 4 syllables. Phrases that are too short are easily confused with other speech.

You can also search for multiple keyphrases. Create a file keyphrase.list like this:

  oh mighty computer /1e-40/
  hello world /1e-30/
  other_phrase /other_phrase_threshold/

Then pass it to the decoder with the -kws configuration option:

  pocketsphinx_continuous -inmic yes -kws keyphrase.list
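If the phrases and their tuned thresholds live in your own code, the list file can be generated rather than edited by hand. The sketch below writes one phrase per line followed by its threshold between slashes, matching the format shown above. The function names and the sanity checks (no `/` inside a phrase, positive thresholds) are assumptions made for this example, not part of pocketsphinx.

```python
from pathlib import Path
from typing import Mapping


def format_keyphrase_list(thresholds: Mapping[str, float]) -> str:
    """Render {phrase: threshold} as lines of the form 'phrase /threshold/'."""
    lines = []
    for phrase, threshold in thresholds.items():
        phrase = phrase.strip()
        # Sanity checks chosen for this sketch: '/' delimits the threshold,
        # so it is rejected inside a phrase, and thresholds must be positive.
        if not phrase or "/" in phrase:
            raise ValueError(f"invalid keyphrase: {phrase!r}")
        if threshold <= 0:
            raise ValueError(f"threshold must be positive: {threshold!r}")
        lines.append(f"{phrase} /{threshold:g}/")
    return "\n".join(lines) + "\n"


def write_keyphrase_list(path: str, thresholds: Mapping[str, float]) -> None:
    """Write the rendered list to `path`, ready to be passed via -kws."""
    Path(path).write_text(format_keyphrase_list(thresholds), encoding="utf-8")
```

For example, `write_keyphrase_list("keyphrase.list", {"oh mighty computer": 1e-40, "hello world": 1e-30})` produces the first two lines of the file shown above.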

This feature is not yet implemented in the sphinx4 decoder.
