java - How can I determine a file's size in characters?

Question

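I am reading files on Windows using Java and jcifs. I need to determine the size of a file that contains both multi-byte and ASCII characters.

How can I do this efficiently, or is there an existing Java API I can use?

Thanks,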

score 2 · Accepted Answer

Without a doubt, to get the exact character count you have to read the file with the proper encoding. The question is how to read it efficiently. Java NIO is the fastest way I know of to do that.

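// Assumes the file fits in a single array (smaller than 2 GB).
byte[] barray = new byte[(int) f.length()];
ByteBuffer bb = ByteBuffer.wrap(barray);
try (FileChannel fChannel = new FileInputStream(f).getChannel()) {
    // A single read() may not fill the buffer, so loop until it is full or EOF.
    while (bb.hasRemaining() && fChannel.read(bb) != -1) { }
}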

Then:

String str = new String(barray, charsetName);
int charCount = str.length(); // length in Java chars (UTF-16 code units)

Reading into the byte buffer runs at close to the maximum available speed (for me it was about 60 Mb/sec, while a disk speed test gave about 70-75 Mb/sec).
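For reference, here is a minimal self-contained sketch of the same idea, which also shows why the byte size and the character count differ for multi-byte text. The class name `NioCharCount`, the method `countChars`, and the temporary file are hypothetical and exist only for illustration. It assumes the file fits in memory and uses `Charset.decode` instead of building a `String`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class NioCharCount {
    // Hypothetical helper: reads the whole file and returns its length in Java chars.
    static int countChars(Path file, Charset charset) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            if (size > Integer.MAX_VALUE - 8) {
                throw new IOException("File too large to buffer in one array: " + size);
            }
            ByteBuffer bb = ByteBuffer.allocate((int) size);
            // A single read() may return fewer bytes than requested, so loop.
            while (bb.hasRemaining() && channel.read(bb) != -1) { }
            bb.flip();
            // decode() turns the bytes into chars; malformed input is replaced, not rejected.
            return charset.decode(bb).length();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charcount", ".txt");
        try {
            // Three ASCII characters followed by two hiragana (3 bytes each in UTF-8).
            Files.write(tmp, "abc\u3042\u3044".getBytes(StandardCharsets.UTF_8));
            System.out.println("bytes = " + Files.size(tmp));                        // bytes = 9
            System.out.println("chars = " + countChars(tmp, StandardCharsets.UTF_8)); // chars = 5
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Buffering the whole file keeps the code short but costs memory proportional to the file size, so it is probably only suitable for files of moderate size.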

score 1 · Accepted Answer

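To get the character count you have to read the file. Specifying the correct file encoding ensures that Java reads each character in the file correctly.

BufferedReader.read() returns the Unicode character that was read (as an int in the range 0 to 65535). So a simple way to do it looks like this: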

int countCharsSimple(File f, String charsetName) throws IOException {
    // try-with-resources closes the reader even if read() throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        // read() returns -1 at end of stream
        while (reader.read() != -1) {
            charCount++;
        }
        return charCount;
    }
}

Using Reader.read(char[]) gives better performance:

int countCharsBuffer(File f, String charsetName) throws IOException {
    // try-with-resources closes the reader even if read() throws
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), charsetName))) {
        int charCount = 0;
        char[] cbuf = new char[1024];
        int read;
        // read(char[]) returns the number of chars read, or -1 at end of stream
        while ((read = reader.read(cbuf)) != -1) {
            charCount += read;
        }
        return charCount;
    }
}

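Interestingly, I benchmarked these two along with the NIO version suggested in Andrey's answer, and found the second example above (countCharsBuffer) to be the fastest.

(Note that in all of these examples the count includes line separator characters.)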
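If line separators should not be counted, or if characters outside the Basic Multilingual Plane should count as one each rather than as two UTF-16 code units, the buffered loop can be extended. The following is only a sketch under those assumptions: the class `StreamCharCount`, its `Counts` holder, and the field names are hypothetical. It takes a plain `InputStream` rather than a `File`, so a stream from another source (for example one obtained through jcifs) could presumably be passed in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class StreamCharCount {
    // Hypothetical result holder for the three counts.
    static final class Counts {
        long codeUnits;                  // what String.length() would report
        long codePoints;                 // a surrogate pair counts once
        long codeUnitsWithoutLineBreaks; // code units excluding '\r' and '\n'
    }

    static Counts count(InputStream in, Charset charset) throws IOException {
        Counts c = new Counts();
        char[] cbuf = new char[8192];
        try (Reader reader = new InputStreamReader(in, charset)) {
            int read;
            while ((read = reader.read(cbuf)) != -1) {
                c.codeUnits += read;
                for (int i = 0; i < read; i++) {
                    char ch = cbuf[i];
                    // Skipping low surrogates counts each pair once, even when
                    // the pair is split across two buffer fills.
                    if (!Character.isLowSurrogate(ch)) {
                        c.codePoints++;
                    }
                    if (ch != '\r' && ch != '\n') {
                        c.codeUnitsWithoutLineBreaks++;
                    }
                }
            }
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        // 'a', one hiragana, CR LF, and one emoji (a surrogate pair in UTF-16).
        byte[] data = "a\u3042\r\n\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        Counts c = count(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(c.codeUnits);                  // 6
        System.out.println(c.codePoints);                 // 5
        System.out.println(c.codeUnitsWithoutLineBreaks); // 4
    }
}
```

The per-character loop adds some work compared with countCharsBuffer, so it is only worth using when the plain count of Java chars is not the number you need.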
