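I am training a new language that does not exist in the Google Code repository, and I am at the stage of creating the word list, but the documentation does not explain what the list is. Is it a list of the words in the training TIFF images, or a list of the words of the whole language?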

1 Answer

From the documentation:

Tesseract uses up to 8 dictionary files for each language. These are all optional, and help Tesseract to decide the likelihood of different possible character combinations.

There are various kinds of dictionaries, but you can ignore them at the beginning.

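One of the dictionaries is supposed to contain almost all the words of the language, while one of the others is supposed to contain only the most frequent words. So the word list is about the language as a whole, not about the words that happen to appear in your training images. The remaining dictionaries cover other things, such as punctuation and number patterns.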
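If you do decide to build the two lists later, here is a minimal sketch of how they could be derived from a plain-text corpus of the language. The function names, the `top_n` cutoff, and the lowercasing step are my own assumptions for illustration, not something the Tesseract documentation prescribes; lowercasing in particular may be wrong for your language.

```python
import re
from collections import Counter


def build_word_lists(corpus_text, top_n=1000):
    """Return (all_words, frequent_words) from a text corpus.

    all_words      : every distinct word, sorted alphabetically
    frequent_words : the top_n most common words, most common first
    """
    # Runs of letters only; digits, underscores and punctuation are skipped.
    # Lowercasing is an assumption and may not suit every script.
    words = re.findall(r"[^\W\d_]+", corpus_text.lower())
    counts = Counter(words)
    all_words = sorted(counts)
    frequent_words = [word for word, _ in counts.most_common(top_n)]
    return all_words, frequent_words


def write_word_list(path, words):
    """Write one word per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(word + "\n")
```

Each list would then be written to its own UTF-8 file, one word per line. As far as I recall, the Tesseract 3.x training documentation converts such files into dictionary data with the `wordlist2dawg` tool; check the documentation for your version for the exact invocation and file names.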

If I knew what language you are creating training data for, I could give some pointers.

But to reiterate: you don't need any of them.

See the relevant part of the documentation

answered 2013-12-10T00:01:59.037