
I'm probably going to need to write a script in Ruby. The script will take a single block of text and compare a large number of transcriptions of recordings of that text against the original to check them for accuracy. If that's completely confusing, I'll try explaining it another way...

I have recordings of several different people reading a script that is a few sentences long. Each of those recordings has been transcribed back into text many times by other people. I need to take all of the transcriptions (hundreds of them) and compare them against the original script for accuracy.

I'm having trouble even conceptualizing the pseudocode, and I'm wondering whether someone can point me in the right direction. Is there an established algorithm I should consider? Levenshtein distance has been suggested, but it doesn't seem like it would hold up well over longer strings, given differences in punctuation choices, whitespace, and so on: a missing first word would throw off the entire algorithm, even if every other word were perfect. I'm open to anything, thanks!

EDIT:

Thanks for the tips, psyho. One of my biggest concerns, though, is a situation like this:

Original text:

I would've taken that course if I'd known it was available!

Transcription:

I would have taken that course if I'd known it was available!

Even with a word-wise comparison of tokens, this transcription will be marked as quite errant, even though it's almost perfect, and this is hardly an edge-case! "would've" and "would have" are commonly pronounced extremely similarly, especially in this part of the world. Is there a way to make the approach you suggest robust enough to deal with this? I've thought about running a word-wise comparison both forward and backward and building a sort of composite score, but this would fall apart with a transcription like this:

I would have taken that course if I had known it was available!

Any ideas?


3 Answers


Simple version:

  1. Tokenize your input into words (convert a string containing words, punctuation, etc. into an array of lowercase words, without punctuation).
  2. Use the Levenshtein distance (word-wise) to compare the original array with each transcription array (a sketch follows this list).
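
A minimal sketch of those two steps might look something like this (the file names are just placeholders, and the Levenshtein implementation is a plain hand-rolled one rather than any particular gem):

    # Turn a string into an array of lowercase words, punctuation stripped.
    def tokenize(text)
      text.downcase.scan(/[a-z']+/)
    end

    # Standard dynamic-programming Levenshtein distance. Because it receives
    # arrays, it operates word-wise here rather than character-wise.
    def levenshtein(a, b)
      rows = Array.new(a.size + 1) { |i| [i] + [0] * b.size }
      (0..b.size).each { |j| rows[0][j] = j }

      (1..a.size).each do |i|
        (1..b.size).each do |j|
          cost         = a[i - 1] == b[j - 1] ? 0 : 1
          deletion     = rows[i - 1][j] + 1
          insertion    = rows[i][j - 1] + 1
          substitution = rows[i - 1][j - 1] + cost
          rows[i][j]   = [deletion, insertion, substitution].min
        end
      end
      rows[a.size][b.size]
    end

    original      = tokenize(File.read("original.txt"))      # placeholder path
    transcription = tokenize(File.read("transcription.txt")) # placeholder path
    puts levenshtein(original, transcription)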

Possible improvements:

  1. You could introduce tokens for punctuation (or replace them all with a simple token like '.').
  2. The Levenshtein distance algorithm can be modified so that replacing a character with one that is close to it on the keyboard generates a smaller distance. You could potentially apply the same idea here: when comparing individual words, use the Levenshtein distance (normalized so that its value ranges from 0 to 1, for example by dividing it by the length of the longer of the two words), and then use that value in the "outer" distance calculation (a sketch of this follows the list).
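
One way to read that suggestion is to use the normalized word-to-word distance as the substitution cost in the outer, word-level calculation. The sketch below assumes that reading and reuses the levenshtein helper from the previous sketch:

    # Character-level Levenshtein between two words, normalized to 0..1.
    def word_cost(w1, w2)
      return 0.0 if w1 == w2
      levenshtein(w1.chars, w2.chars).to_f / [w1.length, w2.length].max
    end

    # Word-level distance where substituting a similar-looking word is cheaper
    # than substituting a completely different one.
    def weighted_distance(a, b)
      rows = Array.new(a.size + 1) { |i| [i.to_f] + [0.0] * b.size }
      (0..b.size).each { |j| rows[0][j] = j.to_f }

      (1..a.size).each do |i|
        (1..b.size).each do |j|
          deletion     = rows[i - 1][j] + 1
          insertion    = rows[i][j - 1] + 1
          substitution = rows[i - 1][j - 1] + word_cost(a[i - 1], b[j - 1])
          rows[i][j]   = [deletion, insertion, substitution].min
        end
      end
      rows[a.size][b.size]
    end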

It's hard to say what algorithm will work best with your data. My tip is: make sure you have some automated way of visualizing or testing your solution. This way you can quickly iterate and experiment with your solution and see how your changes affect the end result.
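
For example, a small harness along these lines (the directory layout is a placeholder, and weighted_distance stands in for whatever scoring function you end up with) lets you re-score every transcription after each tweak and eyeball the outliers:

    original = tokenize(File.read("original.txt"))  # placeholder path

    scores = Dir.glob("transcriptions/*.txt").map do |path|
      [path, weighted_distance(original, tokenize(File.read(path)))]
    end

    # Worst matches first, so the outliers are easy to spot-check by hand.
    scores.sort_by { |_, score| -score }.each do |path, score|
      puts format("%8.2f  %s", score, path)
    end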

EDIT: In response to your concerns:

The easiest way would be to start with normalizing the shorter forms (using gsub):

    str.gsub("n't", " not").gsub("'ve", " have").gsub("'d", " had").gsub("'re", " are")

Note that you can even expand "'s" to " is", even though it's not always grammatically correct: if "John's" means "John is", you will get it right, and if it means "owned by John", then most likely both texts will contain the same form, so expanding both "incorrectly" will not increase the distance. The other case is when it should mean "John has", but then "'s" will probably be followed by "got", so you can handle that easily as well.

You will probably also want to deal with numeric values (1st = first, etc.). Generally, you can improve the result quite a bit with some preprocessing. Don't worry if it's not always 100% correct; it just needs to be correct enough. :)
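
Pulling those ideas together, one possible normalize helper might look like the sketch below; the contraction and number tables are only examples and would need to grow to fit your data:

    # Expansion tables -- order matters ("won't"/"can't" before the generic "n't").
    CONTRACTIONS = {
      "won't" => "will not",
      "can't" => "can not",
      "n't"   => " not",
      "'ve"   => " have",
      "'d"    => " had",
      "'re"   => " are",
      "'ll"   => " will",
      "'s"    => " is"
    }.freeze

    NUMBERS = { "1st" => "first", "2nd" => "second", "3rd" => "third" }.freeze

    def normalize(text)
      out = text.downcase
      CONTRACTIONS.each { |from, to| out = out.gsub(from, to) }
      NUMBERS.each      { |from, to| out = out.gsub(from, to) }
      out
    end

    normalize("I would've taken that course if I'd known it was available!")
    # => "i would have taken that course if i had known it was available!"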

answered 2011-10-14 at 10:43

Ultimately you are trying to compare how the various transcribers dealt with how the passage sounds, so you could try comparing the texts with a phonetic algorithm such as Metaphone.
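
For example, assuming the text gem (which bundles Text::Metaphone and Text::Levenshtein), a rough sketch could encode each word phonetically and only then measure the distance:

    require "text"  # gem install text

    # Reduce each word to its Metaphone code so that different spellings of
    # similar-sounding words are more likely to collapse to the same token.
    def phonetic_tokens(text)
      text.downcase.scan(/[a-z]+/).map { |word| Text::Metaphone.metaphone(word) }
    end

    original      = phonetic_tokens("I would've taken that course")
    transcription = phonetic_tokens("I would have taken that course")

    # Compare the phonetic codes with a word-level edit distance, or simply
    # with the character-level Levenshtein the gem bundles:
    puts Text::Levenshtein.distance(original.join(" "), transcription.join(" "))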

answered 2011-10-14 at 11:36

After experimenting with the problems I pointed out in this question, I found that the Levenshtein distance actually does take them into account. I don't fully understand how or why, but my experiments show that this is the case.
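
For what it's worth, a quick check (here with the text gem's Text::Levenshtein, though any character-level implementation would do) shows why: the problem sentences from the question differ by only a few character edits:

    require "text"  # gem install text

    original      = "I would've taken that course if I'd known it was available!"
    transcription = "I would have taken that course if I'd known it was available!"

    # Only a handful of edits in a roughly 60-character sentence, so a
    # normalized similarity score stays close to perfect.
    distance = Text::Levenshtein.distance(original, transcription)
    puts distance  # => 3
    puts 1 - distance.to_f / [original.length, transcription.length].max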

answered 2011-10-25 at 07:07