java - Java parallel file processing

Question

I have the following code:

import java.io.*;
import java.util.concurrent.* ;
public class Example{
public static void main(String args[]) {
    try {
        FileOutputStream fos = new FileOutputStream("1.dat");
        DataOutputStream dos = new DataOutputStream(fos);

        for (int i = 0; i < 200000; i++) {
            dos.writeInt(i);
        }
        dos.close();                                                         // Two sample files created

        FileOutputStream fos1 = new FileOutputStream("2.dat");
        DataOutputStream dos1 = new DataOutputStream(fos1);

        for (int i = 200000; i < 400000; i++) {
            dos1.writeInt(i);
        }
        dos1.close();

        Exampless.createArray(200000); //Create a shared array
        Exampless ex1 = new Exampless("1.dat");
        Exampless ex2 = new Exampless("2.dat");
        ExecutorService executor = Executors.newFixedThreadPool(2); // Executed in parallel to count the number of matches in the two files
        long startTime = System.nanoTime();
        long endTime;
        Future<Integer> future1 = executor.submit(ex1);
        Future<Integer> future2 = executor.submit(ex2);
        int count1 = future1.get();
        int count2 = future2.get();
        endTime = System.nanoTime();
        long duration = endTime - startTime;
        System.out.println("duration with threads:"+duration);
        executor.shutdown();
        System.out.println("Matches: " + (count1 + count2));

        startTime = System.nanoTime();
        ex1.call();
        ex2.call();
        endTime = System.nanoTime();
        duration = endTime - startTime;
        System.out.println("duration without threads:"+duration);

    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
}
}

class Exampless implements Callable<Integer> {

public static int[] arr = new int[20000];
public String _name;

public Exampless(String name) {
    this._name = name;
}

static void createArray(int z) {
    for (int i = z; i < z + 20000; i++) { // fill the shared array with z .. z + 19999
        arr[i - z] = i;
    }
}

public Integer call() {
    int cnt = 0;
    // Read the first 20000 ints of the file and count the matches; the stream is closed automatically
    try (DataInputStream din = new DataInputStream(new FileInputStream(_name))) {
        for (int i = 0; i < 20000; i++) {
            int c = din.readInt();
            if (c == arr[i]) {
                cnt++;
            }
        }
        return cnt;
    } catch (Exception e) {
        System.err.println("Error: " + e.getMessage());
    }
    return -1;
}

}

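I am trying to count the number of matches between an array and two files. I am currently running this on two threads, but the code does not perform well, because:

(time to read file 1 + file 2 in a single thread) < (time to read file 1 || file 2 with multiple threads).

Can anyone tell me how to solve this? (I am using a 2-core CPU, and the file size is approximately 1.5 GB.)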

score 7 · Accepted Answer

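In the first case you are reading one file sequentially, byte by byte, block by block. This is as fast as disk I/O can be, provided the file is not heavily fragmented. When you are done with the first file, the disk/OS finds the beginning of the second file and continues with a very efficient, linear read of the disk.

In the second case you are constantly switching between the first and the second file, forcing the disk to seek from one place to another. This extra seek time (approximately 10 ms) is the root of your confusion.

Oh, and you do know that disk access is single-threaded and that your task is I/O bound, so splitting it across multiple threads cannot help as long as you are reading from the same physical disk? Your approach could only be justified if:

each thread, apart from reading from a file, also performed some CPU-intensive or blocking operations that are an order of magnitude slower than the I/O
the files are on different physical drives (a different partition is not enough) or on some RAID configurations
you are using an SSD drive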

score 1 · Accepted Answer

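As Tomasz pointed out, you will not gain anything from multithreading the reading of data from disk. You might see some speedup if you multithread the checks instead, i.e. load the data from the files into arrays sequentially and then have threads perform the checks in parallel. However, given the small size of your files (~80 KB) and the fact that you are only comparing ints, I doubt the performance gain would be worth the effort.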
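A minimal sketch of that idea, assuming the same file layout as in the question (big-endian ints written with DataOutputStream). The class name SequentialLoadParallelCheck and the helpers load, countMatches and countAll are hypothetical names chosen for illustration, not part of the original code. The files are read one after another on the calling thread, and only the comparison is handed to the thread pool:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SequentialLoadParallelCheck {

    // Reads the first `count` ints of a file on the calling thread.
    static int[] load(String fileName, int count) throws IOException {
        int[] data = new int[count];
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(fileName)))) {
            for (int i = 0; i < count; i++) {
                data[i] = in.readInt();
            }
        }
        return data;
    }

    // Counts positions where data[i] == reference[i]; pure CPU work, no I/O.
    static int countMatches(int[] data, int[] reference) {
        int n = Math.min(data.length, reference.length);
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] == reference[i]) {
                matches++;
            }
        }
        return matches;
    }

    // Loads the files one at a time, then compares the loaded arrays in parallel.
    static int countAll(List<String> fileNames, int[] reference) throws Exception {
        List<int[]> loaded = new ArrayList<>();
        for (String name : fileNames) {
            loaded.add(load(name, reference.length)); // sequential disk access
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int[] data : loaded) {
                Callable<Integer> task = () -> countMatches(data, reference);
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes 1.dat and 2.dat from the question already exist in the working directory.
        int[] reference = new int[20000];
        for (int i = 0; i < reference.length; i++) {
            reference[i] = 200000 + i; // same contents as createArray(200000)
        }
        System.out.println("Matches: " + countAll(Arrays.asList("1.dat", "2.dat"), reference));
    }
}
```

With only 20000 int comparisons per file, the cost of starting the pool will probably outweigh any gain; the structure only starts to pay off when the per-element work is much heavier than a single comparison.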

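Something that will definitely improve your execution speed is not using readInt(). Since you know you are comparing 20000 ints, you should read all 20000 ints of each file into an array at once (or at least in blocks), rather than calling readInt() 20000 times.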
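One way this could look, as a sketch rather than the only option: read the raw bytes with a single readFully call and reinterpret them as ints through a ByteBuffer. The class BulkIntReader and its method readInts are hypothetical names for illustration. The sketch relies on ByteBuffer defaulting to big-endian byte order, which is the order DataOutputStream.writeInt uses.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class BulkIntReader {

    // Reads the first `count` big-endian ints of a file with one bulk read.
    static int[] readInts(String fileName, int count) throws IOException {
        byte[] raw = new byte[count * Integer.BYTES];
        try (DataInputStream in = new DataInputStream(new FileInputStream(fileName))) {
            in.readFully(raw); // throws EOFException if the file holds fewer bytes
        }
        int[] values = new int[count];
        // ByteBuffer is big-endian by default, matching DataOutputStream.writeInt.
        ByteBuffer.wrap(raw).asIntBuffer().get(values);
        return values;
    }

    public static void main(String[] args) throws IOException {
        // Assumes 2.dat from the question already exists in the working directory.
        int[] values = readInts("2.dat", 20000);
        int matches = 0;
        for (int i = 0; i < values.length; i++) {
            if (values[i] == 200000 + i) { // same contents as createArray(200000)
                matches++;
            }
        }
        System.out.println("Matches: " + matches);
    }
}
```

For files too large to hold in memory, the same approach can be applied block by block with a fixed-size byte array that is refilled in a loop.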
