lucene - Lucene spellchecker 3.6 charset

Question

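I need help setting the charset for the Lucene spellchecker (version 3.6 for both Lucene core and the spellchecker). My dictionary ("D:\dictionary.txt") contains both English and Russian words. My code works well for English text: for example, it returns the correct spelling for the word "hello". It does not work for Russian, though. When I misspell a Russian word, the program throws an exception (Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0) and cannot find any suggestions for Russian words.

Here is my code: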

        RAMDirectory spellCheckerDir = new RAMDirectory();
        SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
        InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("D:\\dictionary.txt")), "UTF-8");
        PlainTextDictionary dictionary = new PlainTextDictionary(isr);
        spellChecker.indexDictionary(dictionary, config, true);
        String[] suggestions = spellChecker.suggestSimilar("hwllo", 1); // 'hello' misspelled as 'hwllo'

score 0 · Accepted Answer

This is the best option I can come up with, based on your code (it was helpful, thanks). I loaded two dictionaries separately; it should work with a combined file as well.

    RAMDirectory spellCheckerDir = new RAMDirectory();
    SpellChecker spellChecker = new SpellChecker(spellCheckerDir);
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_44);
    IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_44, analyzer);
    InputStreamReader isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/English/words.english")), "UTF-8");
    PlainTextDictionary dictionary = new PlainTextDictionary(isr);
    spellChecker.indexDictionary(dictionary, config, true);
    isr = new InputStreamReader(new FileInputStream(new File("d:/dictionaries/Swedish/words.swedish")), "UTF-8");
    PlainTextDictionary swdictionary = new PlainTextDictionary(isr);
    spellChecker.indexDictionary(swdictionary, config, true);
    String wordForSuggestions = "hwllo"; // 'hello' misspelled as 'hwllo'
    int suggestionsNumber = 5;

    String[] suggestions = spellChecker.suggestSimilar(wordForSuggestions, suggestionsNumber);
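
A side note on the exception in the question: `ArrayIndexOutOfBoundsException: 0` is what Java reports when element 0 of an empty array is read. One possible cause is that the calling code reads `suggestions[0]` when `suggestSimilar` found nothing. This is an assumption, because the question does not show the failing line. Below is a minimal, stdlib-only sketch of two checks that might help narrow it down: guarding against an empty result, and confirming that the dictionary bytes decode to the expected Cyrillic words when read as UTF-8. The class and method names (`SuggestionGuard`, `firstOrFallback`, `readWords`) are hypothetical and not part of Lucene, and the in-memory byte array stands in for the real dictionary file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of Lucene.
public class SuggestionGuard {

    // Returns the first suggestion, or the fallback when the array is null or empty.
    public static String firstOrFallback(String[] suggestions, String fallback) {
        if (suggestions == null || suggestions.length == 0) {
            return fallback;
        }
        return suggestions[0];
    }

    // Reads one word per line, decoding the bytes explicitly as UTF-8.
    // Blank lines are skipped.
    public static List<String> readWords(InputStream in) throws IOException {
        List<String> words = new ArrayList<String>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim();
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the dictionary file: one English and one Russian word as UTF-8 bytes.
        byte[] bytes = "hello\n\u043f\u0440\u0438\u0432\u0435\u0442\n"
                .getBytes(StandardCharsets.UTF_8);
        List<String> words = readWords(new ByteArrayInputStream(bytes));
        System.out.println(words.size() + " words read");

        // Stand-in for an empty result from suggestSimilar.
        String[] noSuggestions = new String[0];
        System.out.println(firstOrFallback(noSuggestions, "(no suggestion)"));
    }
}
```

If the words read back from the real file look garbled, the file is probably not saved as UTF-8. If they look fine and the suggestion array is still empty, the problem lies somewhere other than decoding.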
