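I'm trying to develop a multithreaded Java program that splits a large text file into smaller text files. Each smaller file must contain a fixed number of lines. For example, if the input file has 100 lines and the input number is 10, my program splits the input file into 10 files. I have already developed a single-threaded version of my program: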

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

public class TextFileSingleThreaded {

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Invalid Input!");
            // stop here: without this return, args[0] below throws ArrayIndexOutOfBoundsException
            return;
        }

        //first argument is the file path
        File file = new File(args[0]);

        //second argument is the number of lines per chunk
        //In particular the smaller files will have numLinesPerChunk lines
        int numLinesPerChunk = Integer.parseInt(args[1]);

        long start = System.currentTimeMillis();

        // try-with-resources closes the reader and every chunk's writer, even on failure
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String line = reader.readLine();
            for (int i = 1; line != null; i++) {
                // one writer per output file, closed before the next file is opened
                try (PrintWriter writer = new PrintWriter(new FileWriter(args[0] + "_part" + i + ".txt"))) {
                    for (int j = 0; j < numLinesPerChunk && line != null; j++) {
                        writer.println(line);
                        line = reader.readLine();
                    }
                }
            }
        } catch (FileNotFoundException e) {
            System.err.println("Input file not found: " + file);
        } catch (IOException e) {
            e.printStackTrace();
        }

        long end = System.currentTimeMillis();

        System.out.println("Taken time[sec]:");
        System.out.println((end - start) / 1000);

    }

}

I want to write a multithreaded version of this program but I don't know how to read a file beginning from a specified line. Help me please. :(

2 Answers

I want to write a multithreaded version of this program but I don't know how to read a file beginning from a specified line. Help me please. :(

I would not, as this implies, have each thread read from the beginning of the file, ignoring lines until it reaches its portion of the input file. This is highly inefficient. As you imply, a reader has to read all of the prior lines if the file is going to be divided into chunks by line. That means a lot of duplicate read I/O, which will result in a much slower application.

You could instead have 1 reader and N writers. The reader will be adding the lines to be written to some sort of BlockingQueue per writer. The problem with this is that chances are you won't get any concurrency. Only one writer will most likely be working at one time while the rest of the writers wait for the reader to reach their part of the input file. Also, if the reader is faster than the writer (which is likely) then you could easily run out of memory queueing up all of the lines in memory if the file to be divided is large. You could use a size limited blocking queue which means the reader may block waiting for the writers but again, multiple writers will most likely not be running at the same time.
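A minimal sketch of the one-reader, N-writers arrangement, under some assumptions of mine: the class name `QueueSplitter`, the `Chunk` holder, and the choice to queue one whole chunk per entry (rather than single lines) are all hypothetical simplifications. Each writer gets its own size-limited `ArrayBlockingQueue`, so the reader blocks instead of exhausting memory when the writers fall behind.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class QueueSplitter {

    // One unit of work: the lines of one output file. lines == null marks end of input.
    static final class Chunk {
        final int index;
        final List<String> lines;
        Chunk(int index, List<String> lines) { this.index = index; this.lines = lines; }
    }

    /** Splits input into files of linesPerChunk lines; returns the number of files written. */
    static int split(Path input, int linesPerChunk, int writerCount, int queueCapacity)
            throws IOException, InterruptedException {
        List<BlockingQueue<Chunk>> queues = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        for (int w = 0; w < writerCount; w++) {
            BlockingQueue<Chunk> queue = new ArrayBlockingQueue<>(queueCapacity);
            Thread writer = new Thread(() -> drain(input, queue));
            queues.add(queue);
            writers.add(writer);
            writer.start();
        }
        int chunkCount = 0;
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> lines = new ArrayList<>();
            for (String line = reader.readLine(); line != null; ) {
                lines.add(line);
                line = reader.readLine();
                // Hand the chunk over when it is full or the input is exhausted.
                if (lines.size() == linesPerChunk || line == null) {
                    chunkCount++;
                    // put() blocks while the target queue is full (back-pressure on the reader).
                    queues.get(chunkCount % writerCount).put(new Chunk(chunkCount, lines));
                    lines = new ArrayList<>();
                }
            }
        } finally {
            // Always send the end marker so the writer threads terminate.
            for (BlockingQueue<Chunk> queue : queues) {
                queue.put(new Chunk(-1, null));
            }
        }
        for (Thread writer : writers) {
            writer.join();
        }
        return chunkCount;
    }

    // Writer loop: take chunks from this writer's queue until the end marker arrives.
    private static void drain(Path input, BlockingQueue<Chunk> queue) {
        try {
            for (Chunk c = queue.take(); c.lines != null; c = queue.take()) {
                Path out = Paths.get(input + "_part" + c.index + ".txt");
                try {
                    Files.write(out, c.lines, StandardCharsets.UTF_8);
                } catch (IOException e) {
                    // Report and keep draining so the reader never blocks on a dead writer.
                    System.err.println("Could not write " + out + ": " + e);
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

A call such as `QueueSplitter.split(Paths.get("big.txt"), 10, 4, 8)` would use four writer threads with at most eight pending chunks each. As the answer notes, the single reader is still the bottleneck, so this is unlikely to beat the single-threaded version.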

As mentioned in the comments, the most efficient way of doing this is single threaded because of these restrictions. If you are doing this as an exercise then it sounds like you will need to read the file through one time, note the start and end positions in the file for each of the output files and then fork the threads with those locations so they can re-read the file and write it into their separate output files in parallel without a lot of line buffering.
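The two-pass idea could look roughly like the sketch below. It is only an illustration under assumptions of mine: the class and method names (`OffsetSplitter`, `chunkOffsets`, `copyRange`) are hypothetical, lines are assumed to end in `\n`, and each chunk is copied as raw bytes with `FileChannel.transferTo`, so the worker threads never buffer individual lines.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OffsetSplitter {

    /** Pass 1: byte offsets at which each chunk starts, followed by the end of the last chunk. */
    static List<Long> chunkOffsets(Path input, int linesPerChunk) throws IOException {
        List<Long> offsets = new ArrayList<>();
        offsets.add(0L);
        long position = 0;
        int linesInChunk = 0;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(input))) {
            int b;
            while ((b = in.read()) != -1) {
                position++;
                // A chunk boundary sits right after the newline of its last line.
                if (b == '\n' && ++linesInChunk == linesPerChunk) {
                    offsets.add(position);
                    linesInChunk = 0;
                }
            }
        }
        // Close a trailing partial chunk, unless the file ended exactly on a boundary.
        if (offsets.get(offsets.size() - 1) != position) {
            offsets.add(position);
        }
        return offsets;
    }

    /** Pass 2: copy every byte range to its own file in parallel; returns the file count. */
    static int split(Path input, int linesPerChunk, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Long> offsets = chunkOffsets(input, linesPerChunk);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i + 1 < offsets.size(); i++) {
                long start = offsets.get(i);
                long end = offsets.get(i + 1);
                Path out = Paths.get(input + "_part" + (i + 1) + ".txt");
                pending.add(pool.submit(() -> {
                    copyRange(input, out, start, end);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return offsets.size() - 1;
    }

    private static void copyRange(Path input, Path out, long start, long end) throws IOException {
        try (FileChannel src = FileChannel.open(input, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            long position = start;
            while (position < end) {
                // transferTo may move fewer bytes than requested, so loop until done.
                position += src.transferTo(position, end - position, dst);
            }
        }
    }
}
```

Note that the file is still read twice in total (once to find the offsets, once to copy), so any speed-up depends on how well the underlying storage handles parallel reads and writes.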

answered 2013-07-29T15:53:45.637

You should read the file only once and store it in a List.

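import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;

// read the file once, keeping every line in memory
List<String> list = new ArrayList<String>();
try (BufferedReader br = new BufferedReader(new FileReader("yourfile"))) {
    String line;
    // for each line of your file
    while ((line = br.readLine()) != null) {
        list.add(line);
    }
}

// then you can split your list into different parts of linesPerPart lines each
int linesPerPart = 10;
List<List<String>> parts = new ArrayList<List<String>>();
for (int from = 0; from < list.size(); from += linesPerPart) {
    // the last part may be shorter if the line count is not a multiple of linesPerPart
    int to = Math.min(from + linesPerPart, list.size());
    parts.add(new ArrayList<String>(list.subList(from, to)));
}
// for a 100-line file you now have 10 lists which each contain 10 lines
// you still need to create a thread pool, where each thread writes one list to a file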

To learn more about thread pools, read this.
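The remaining thread-pool step might be sketched as follows. The class `ListSplitter`, its method names, and the `_part<N>.txt` naming (borrowed from the question) are assumptions for illustration; the pool is a plain fixed-size `ExecutorService`, and the whole file is assumed to fit in memory, as this answer requires.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ListSplitter {

    /** Cuts lines into consecutive sublists of at most linesPerChunk entries. */
    static List<List<String>> partition(List<String> lines, int linesPerChunk) {
        List<List<String>> parts = new ArrayList<>();
        for (int from = 0; from < lines.size(); from += linesPerChunk) {
            int to = Math.min(from + linesPerChunk, lines.size());
            // subList is a read-only view here; the source list must not change afterwards.
            parts.add(lines.subList(from, to));
        }
        return parts;
    }

    /** Writes every part to prefix_part<N>.txt, one pool task per output file. */
    static void writeParts(List<List<String>> parts, String prefix, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Path>> tasks = new ArrayList<>();
            for (int i = 0; i < parts.size(); i++) {
                List<String> part = parts.get(i);
                Path out = Paths.get(prefix + "_part" + (i + 1) + ".txt");
                tasks.add(() -> Files.write(out, part, StandardCharsets.UTF_8));
            }
            // invokeAll waits for all tasks; get() rethrows any write failure.
            for (Future<Path> result : pool.invokeAll(tasks)) {
                result.get();
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

With the list read as above, `ListSplitter.writeParts(ListSplitter.partition(list, 10), "yourfile", 4)` would write the parts using four threads.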

answered 2013-07-29T15:16:15.693