java - How to search for a word in a file using Java

Question

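I am writing a Java program that searches for a word in a text file containing a dictionary word list. The file holds about 300,000 words. I came up with a program that iterates over the words, comparing each one with the input word (the word being searched for). The problem is that this process takes a long time to find a word when it starts with one of the last letters of the alphabet, such as x, y, or z. I need something more efficient that can find a word almost instantly. Here is my code: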

import java.io.IOException;
import java.io.InputStreamReader;

public class ReadFile
{
public static void main(String[] args) throws IOException
{
    ReadFile rf = new ReadFile();
    rf.searchWord(args[0]);
}

private void searchWord(String token) throws IOException
{
    InputStreamReader reader = new InputStreamReader(
            getClass().getResourceAsStream("sowpods.txt"));
    String line = null;
    // Read a single line from the file. null represents the EOF.
    while((line = readLine(reader)) != null && !line.equals(token))
    {
        System.out.println(line);
    }

    // The loop above stops either on a match or at EOF (line == null),
    // so a non-null line here always equals the token.
    if(line != null)
    {
        System.out.println(token + " WAS FOUND.");
    }
    else
    {
        System.out.println(token + " WAS NOT FOUND.");
    }
    reader.close();
}

private String readLine(InputStreamReader reader) throws IOException
{
    // Test whether the end of file has been reached. If so, return null.
    int readChar = reader.read();
    if(readChar == -1)
    {
        return null;
    }
    StringBuffer string = new StringBuffer("");
    // Read until end of file or new line
    while(readChar != -1 && readChar != '\n')
    {
        // Append the read character to the string. Some operating systems
        // such as Microsoft Windows prepend newline character ('\n') with
        // carriage return ('\r'). This is part of the newline character
        // and therefore an exception that should not be appended to the
        // string.
        if(readChar != '\r')
        {
            string.append((char) readChar);
        }
        // Read the next character
        readChar = reader.read();
    }
    return string.toString();
}

}

I would also like to use this program in a Java ME environment. Thanks, Jevison7x.

score 1 · Accepted Answer

You can use fgrep, which is equivalent to running grep with the -F option (see the Linux man page for fgrep):

grep -F -f dictionary.txt inputfile.txt

The dictionary file must contain one word per line.
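For readers who want the same behaviour from Java, here is a minimal sketch of what the command above does: it keeps every line of the input that contains at least one dictionary entry as a fixed substring. The class and method names are hypothetical, the comparison is deliberately naive, and it assumes Java SE 9 or later (it will not run on Java ME as written).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the matching rule of `grep -F -f`; not how grep is implemented.
class FixedStringGrepSketch {

    // Returns the lines that contain at least one pattern as a literal substring.
    static List<String> matchingLines(List<String> patterns, List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            for (String pattern : patterns) {
                if (line.contains(pattern)) {
                    result.add(line);
                    break; // one hit is enough to keep the line
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data standing in for dictionary.txt and inputfile.txt.
        List<String> dictionary = List.of("cat", "dog");
        List<String> input = List.of("a cat sat", "bird", "hotdog");
        for (String line : matchingLines(dictionary, input)) {
            System.out.println(line);
        }
    }
}
```

This loop costs time proportional to the number of patterns for every line, which is exactly the cost that multi-pattern algorithms are designed to avoid.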

I'm not sure whether this is still accurate, but the Wikipedia article on grep mentions that fgrep uses the Aho-Corasick algorithm, which builds an automaton from a fixed dictionary so that strings can be matched quickly.
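To make the idea concrete, here is a small sketch of an Aho-Corasick automaton in Java. It is an illustration under assumptions, not the code grep uses: the class name AhoCorasickSketch and its methods are hypothetical, and it relies on Java SE 9+ collections, so it would need reworking for Java ME. It builds a trie from the dictionary, adds failure links with a breadth-first pass, and then scans the text once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical illustration class; not taken from grep or any library.
class AhoCorasickSketch {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;                                       // longest proper suffix that is also a trie path
        final List<String> outputs = new ArrayList<>();  // patterns that end at this state
    }

    private final Node root = new Node();

    AhoCorasickSketch(List<String> patterns) {
        // Step 1: insert every pattern into a trie.
        for (String pattern : patterns) {
            if (pattern.isEmpty()) continue;
            Node node = root;
            for (char c : pattern.toCharArray()) {
                node = node.next.computeIfAbsent(c, k -> new Node());
            }
            node.outputs.add(pattern);
        }
        // Step 2: breadth-first pass that sets failure links level by level.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node current = queue.remove();
            for (Map.Entry<Character, Node> entry : current.next.entrySet()) {
                char c = entry.getKey();
                Node child = entry.getValue();
                Node f = current.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Inherit patterns that end at the failure state (they are suffixes of this path).
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Returns every pattern occurrence in the text, in the order the matches end.
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Node state = root;
        for (char c : text.toCharArray()) {
            while (state != root && !state.next.containsKey(c)) {
                state = state.fail;
            }
            Node nextState = state.next.get(c);
            state = (nextState != null) ? nextState : root;
            found.addAll(state.outputs);
        }
        return found;
    }

    public static void main(String[] args) {
        AhoCorasickSketch ac = new AhoCorasickSketch(List.of("he", "she", "his", "hers"));
        System.out.println(ac.findAll("ushers")); // prints [she, he, hers]
    }
}
```

The automaton is built once from the dictionary and can then be reused to scan any amount of text in a single pass, independent of how many words the dictionary contains.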

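In any case, you can look at Wikipedia's list of string-searching algorithms that work with a finite set of patterns. These are the more efficient options to use when searching for words in a dictionary.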
