java - Processing a large file for HTTP calls in Java

Question

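I have a file with a few million lines that I need to process. Each line of the file will result in an HTTP call. I'm trying to figure out the best way to attack the problem.

I could obviously just read the file and make the calls sequentially, but that would be incredibly slow. I'd like to parallelize the calls, but I'm not sure whether I should read the entire file into memory (something I'm not a big fan of) or try to parallelize the reading of the file as well (which I'm not sure would make sense).

I'm just looking for some thoughts on the best way to attack the problem. If there is an existing framework or library that does something similar, I'm happy to use that as well.

Thanks.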

score 5 · Accepted Answer

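"I'd like to parallelize the calls, but I'm not sure if I should read the entire file into memory"

You should use an ExecutorService with a bounded BlockingQueue. As you read in your million lines, you submit jobs to the thread pool until the BlockingQueue is full, at which point the reader blocks. This way you can keep nThreads HTTP requests running at once, with up to 100 more waiting in the queue (tune both numbers to whatever is optimal), without having to read all of the lines of the file beforehand.

You'll need to set a RejectedExecutionHandler that blocks when the queue is full. This is better than a caller-runs handler, because with caller-runs the reading thread gets tied up executing a request itself and cannot keep the queue fed while it does.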

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// bounded queue: at most 100 jobs wait here while the workers are busy
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<Runnable>(100);
// NOTE: you want the min and max thread numbers here to be the same value
ThreadPoolExecutor threadPool =
    new ThreadPoolExecutor(nThreads, nThreads, 0L, TimeUnit.MILLISECONDS, queue);
// we need our RejectedExecutionHandler to block if the queue is full
threadPool.setRejectedExecutionHandler(new RejectedExecutionHandler() {
    @Override
    public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
        try {
            // this will block the producer until there's room in the queue
            executor.getQueue().put(r);
        } catch (InterruptedException e) {
            // restore the interrupt flag before reporting the failure
            Thread.currentThread().interrupt();
            throw new RejectedExecutionException(
                "Unexpected InterruptedException", e);
        }
    }
});

// now read in the urls (urlReader is assumed to be a BufferedReader over the file)
String url;
while ((url = urlReader.readLine()) != null) {
    // submit them to the thread-pool.  this may block.
    threadPool.submit(new DownloadUrlRunnable(url));
}
// after we submit we have to shutdown the pool
threadPool.shutdown();
// wait for them to complete
threadPool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);

...
private class DownloadUrlRunnable implements Runnable {
    private final String url;
    public DownloadUrlRunnable(String url) {
       this.url = url;
    }
    public void run() {
       // download the URL
    }
}
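
The body of run() is left open above. As a rough sketch of what it might look like, assuming Java 11+ and the built-in java.net.http.HttpClient: the class name HttpDownloadTask, the timeout values, and the choice to record only the status code are all my own assumptions, not part of the answer's code. Adapt the error handling to whatever your job needs (retries, logging, writing results somewhere).

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical stand-in for DownloadUrlRunnable; names and timeouts are assumptions.
class HttpDownloadTask implements Runnable {
    // HttpClient is thread-safe, so one instance is shared by all tasks
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    private final String url;
    // -1 means "no response received" (bad URL, I/O error, or interrupted)
    private volatile int statusCode = -1;

    HttpDownloadTask(String url) {
        this.url = url;
    }

    @Override
    public void run() {
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                    .timeout(Duration.ofSeconds(30))
                    .GET()
                    .build();
            // the response body is discarded; only the status code is kept
            HttpResponse<Void> response =
                    CLIENT.send(request, HttpResponse.BodyHandlers.discarding());
            statusCode = response.statusCode();
        } catch (IllegalArgumentException | IOException e) {
            // malformed or non-HTTP URL, or a network failure: log it and move on
            System.err.println("Request failed for " + url + ": " + e);
        } catch (InterruptedException e) {
            // keep the interrupt flag set so the pool can shut down promptly
            Thread.currentThread().interrupt();
        }
    }

    int statusCode() {
        return statusCode;
    }
}
```

Catching the exceptions inside run() matters here: with submit(), an uncaught exception is stored in the returned Future, and since the loop above never looks at those futures it would be lost silently.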

score 0 · Accepted Answer

Gray's approach seems good. Another approach I'd suggest is to split the file into chunks (you will have to write the logic for that) and process those chunks with multiple threads.
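
A minimal sketch of the chunking idea, assuming a "chunk" is simply a fixed number of consecutive lines read by a single reader thread. The names ChunkedLineProcessor, processInChunks, chunkSize and lineHandler are made up for illustration, and the Semaphore that caps the number of pending chunks is my own addition so that the whole file does not end up queued in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: reads lines in chunks and hands each chunk to a worker thread.
class ChunkedLineProcessor {

    static void processInChunks(BufferedReader reader, int chunkSize, int nThreads,
                                Consumer<String> lineHandler)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        // at most 2 * nThreads chunks are queued or running at any time
        Semaphore inFlight = new Semaphore(2 * nThreads);
        try {
            List<String> chunk = new ArrayList<>(chunkSize);
            String line;
            while ((line = reader.readLine()) != null) {
                chunk.add(line);
                if (chunk.size() == chunkSize) {
                    submit(pool, inFlight, chunk, lineHandler);
                    chunk = new ArrayList<>(chunkSize);
                }
            }
            // the last chunk is usually smaller than chunkSize
            if (!chunk.isEmpty()) {
                submit(pool, inFlight, chunk, lineHandler);
            }
        } finally {
            pool.shutdown();
        }
        // wait for the chunks that are still being processed
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
    }

    private static void submit(ExecutorService pool, Semaphore inFlight,
                               List<String> chunk, Consumer<String> lineHandler)
            throws InterruptedException {
        // blocks the reader while too many chunks are pending
        inFlight.acquire();
        pool.execute(() -> {
            try {
                for (String l : chunk) {
                    lineHandler.accept(l);
                }
            } finally {
                inFlight.release();
            }
        });
    }
}
```

The lineHandler is where the HTTP call for one line would go. Since each HTTP call is likely far slower than reading a line, a single reader thread should be enough; the chunk size mostly trades scheduling overhead against how evenly work is spread across threads.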
