hadoop - Writing text output from the map function in Hadoop

Question

Input:

a,b,c,d,e

q,w,34,r,e

1,2,3,4,e

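In the mapper, I want to take the value of the last field of every line as the key and emit (e,(a,b,c,d)), that is, (key, (the remaining fields of the line)).

Any help would be appreciated.

Current code: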

public static class Map extends Mapper<LongWritable, Text, Text, Text> {
   private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString(); // reads the input line by line   
        String[] attr = line.split(","); // extract each attribute values from the csv record
         context.write(attr[argno-1],line); // gives error seems to like only integer? how to override this?
    }
}
 public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    public void reduce(Text key, Iterable<IntWritable> values, Context context) 
      throws IOException, InterruptedException {
        // further process , loads the chunk into 2d arraylist object for processing
    }
 }

 public static void main(String[] args) throws Exception {
    String line; 
    String arguements[];
    Configuration conf = new Configuration();

        // compute the total number of attributes in the file
    FileReader infile = new FileReader(args[0]);
    BufferedReader bufread = new BufferedReader(infile);
    line = bufread.readLine();
    arguements = line.split(","); // split the fields separated by comma
    conf.setInt("argno", arguements.length); // saving that attribute value 
    Job job = new Job(conf, "nb");
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class); /* The method setMapperClass(Class<? extends Mapper>) in the type Job is not applicable for the arguments (Class<Map>) */
    job.setReducerClass(Reduce.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.waitForCompletion(true);
 }

Note the errors (see the comments in the code).

score 4 · Accepted Answer

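This is simple. First parse the string to get the key, then pass the rest of the line as the value (the example below simply passes the whole line). Then use an identity reducer, which will combine all values that share a key into a list as your output. It should be in the same format.

So your map function outputs: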

e, (a,b,c,d,e)

e, (q,w,34,r,e)

e, (1,2,3,4,e)

Then, after the identity reduce, the output should be:

e, {a,b,c,d,e; q,w,34,r,e; 1,2,3,4,e}
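
The data flow described above can be tried without a cluster. The following is a minimal plain-Java sketch, not Hadoop code: the names `GroupByLastField`, `groupByLastField` and `format` are made up for illustration, and a `TreeMap` stands in for the shuffle that groups map output by key. It joins each key's values with "; " only to mirror the output written above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the map and shuffle phases; no Hadoop classes are used.
class GroupByLastField {

    // "Map" step: key each line by its last comma-separated field.
    // "Shuffle" step: collect all lines that share a key, in input order.
    static Map<String, List<String>> groupByLastField(List<String> lines) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] attr = line.split(",");
            String key = attr[attr.length - 1];
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return grouped;
    }

    // Render one group in the same shape as the expected output above.
    static String format(String key, List<String> values) {
        return key + ", {" + String.join("; ", values) + "}";
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a,b,c,d,e", "q,w,34,r,e", "1,2,3,4,e");
        groupByLastField(input).forEach((k, v) -> System.out.println(format(k, v)));
    }
}
```

Two caveats for a real job: Hadoop does not guarantee the order in which values for a key reach the reducer, and the default `Reducer` writes each value as its own output record, so getting all values on one line as shown would likely need a reducer that concatenates them.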

public static class Map extends Mapper<LongWritable, Text, Text, Text> {
    private final Text outKey = new Text(); // reused for every record

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString(); // one CSV record per call
        String[] attr = line.split(","); // split the record into its fields
        // context.write takes the declared output types (Text, Text), not String,
        // so wrap the last field in a Text before emitting it as the key
        outKey.set(attr[attr.length - 1]);
        context.write(outKey, value); // emit (last field, whole line)
    }
}

    public static void main(String[] args) throws Exception {
        String line; 
        String arguements[];
        Configuration conf = new Configuration();

        // compute the total number of attributes in the file
        FileReader infile = new FileReader(args[0]);
        BufferedReader bufread = new BufferedReader(infile);
        line = bufread.readLine();
        arguements = line.split(","); // split the fields separated by comma
        conf.setInt("argno", arguements.length); // saving that attribute value 
        Job job = new Job(conf, "nb");
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(Map.class); 
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
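
The mapper emits the whole line as the value, whereas the question asked for (e,(a,b,c,d)), that is, the value without the key field. Below is a minimal plain-Java sketch of that split, assuming the fields never contain quoted or embedded commas. `KeyAndRest` and `splitLast` are hypothetical names, not part of Hadoop. Inside the mapper, the two parts would still have to be wrapped in `Text` before being passed to `context.write`.

```java
// Hypothetical helper: splits a CSV record into (last field, remaining fields).
class KeyAndRest {
    final String key;
    final String rest;

    KeyAndRest(String key, String rest) {
        this.key = key;
        this.rest = rest;
    }

    // Assumes plain comma-separated fields with no quoting or escaping.
    static KeyAndRest splitLast(String line) {
        int cut = line.lastIndexOf(',');
        if (cut < 0) {
            // Single-field record: the only field is the key, nothing remains.
            return new KeyAndRest(line, "");
        }
        return new KeyAndRest(line.substring(cut + 1), line.substring(0, cut));
    }

    public static void main(String[] args) {
        KeyAndRest kr = splitLast("q,w,34,r,e");
        System.out.println("(" + kr.key + ",(" + kr.rest + "))"); // prints (e,(q,w,34,r))
    }
}
```

Cutting at the last comma avoids needing the field count, so the `argno` value stored in the job configuration would not be required for this step.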

score -5 · Accepted Answer

Alternate logic found. Implemented, tested, and verified.

Answered 2013-01-09T21:47:12.050
