sorting - Sorting in MapReduce generates extra values

Question

I am trying to sort a series of integers that are given in the following form:

A    2
B    9
C    4
....
....
Z    42

Below is my Mapper and Reducer code.

public static class MapClass extends MapReduceBase implements Mapper<Text, Text, IntWritable, Text>
{
    public void map(Text key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException
    {
        output.collect(new IntWritable(Integer.parseInt(value.toString())), key);
    }
}

public static class Reduce extends MapReduceBase implements Reducer<IntWritable, Text, IntWritable, Text>
{
    public void reduce(IntWritable key, Iterator<Text> values, OutputCollector<IntWritable, Text> output, Reporter reporter) throws IOException
    {
        output.collect(key, new Text(""));
    }
}

However, the output contains many extra integers. Can anyone tell me what is wrong with the code?

Also, if possible, please point me to a good example of sorting integers with MapReduce.

Edit:

job.setInputFormat(KeyValueTextInputFormat.class);
job.setOutputFormat(TextOutputFormat.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);

score 0 · Accepted Answer

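I tried your logic, but with the new API. The result is correct.

Note: the second parameter of the reduce(...) function is Iterable<Text>. With the new API, a reduce method declared with Iterator<Text> would not override Reducer.reduce, so the default identity implementation would run instead and write one record per value.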
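To see why the parameter type matters, here is a minimal plain-Java sketch with no Hadoop dependency. `BaseReducer` is a hypothetical stand-in for the new-API `Reducer`, whose default `reduce` passes every value through; all class and method names here are illustrative assumptions, not Hadoop's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the new-API Reducer: the default reduce is an identity pass-through.
class BaseReducer {
    final List<String> out = new ArrayList<>();

    public void reduce(int key, Iterable<String> values) {
        for (String v : values) {
            out.add(key + "\t" + v);   // one output record per input value
        }
    }
}

// Wrong parameter type: this is an overload, not an override, so it is never called below.
class IteratorReducer extends BaseReducer {
    public void reduce(int key, Iterator<String> values) {
        out.add(key + "\t");
    }
}

// Matching parameter type: @Override compiles, so this really replaces the default.
class IterableReducer extends BaseReducer {
    @Override
    public void reduce(int key, Iterable<String> values) {
        out.add(key + "\t");           // one output record per key
    }
}

class OverrideSketch {
    // Calls reduce the way a framework would: through the base type, with an Iterable.
    static List<String> run(BaseReducer reducer) {
        Iterable<String> values = Arrays.asList("A", "B", "C");
        reducer.reduce(2, values);
        return reducer.out;
    }

    public static void main(String[] args) {
        System.out.println(run(new IteratorReducer()).size());  // prints 3
        System.out.println(run(new IterableReducer()).size());  // prints 1
    }
}
```

Adding `@Override` to `IteratorReducer.reduce` would turn the mistake into a compile error, which is a cheap way to catch it early.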

package stackoverflow;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class q18076708 extends Configured implements Tool {
    static class MapClass extends Mapper<Text, Text, IntWritable, Text> {
        // Swap key and value so that the integer becomes the sort key.
        @Override
        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new IntWritable(Integer.parseInt(value.toString())),
                    key);
        }

    }

    static class Reduce extends Reducer<IntWritable, Text, IntWritable, Text> {
        // Keys arrive in sorted order; emit each key once and drop its values.
        @Override
        public void reduce(IntWritable key, Iterable<Text> values,
                Context context) throws IOException, InterruptedException {
            context.write(key, new Text(""));
        }

    }

    public int run(String[] args) throws Exception {

        // Run in local mode against the local file system (no cluster needed).
        getConf().set("fs.default.name", "file:///");
        getConf().set("mapred.job.tracker", "local");
        Job job = new Job(getConf(), "Logging job");
        job.setJarByClass(getClass());

        FileInputFormat.addInputPath(job, new Path("src/test/resources/testinput.txt"));
        FileSystem.get(getConf()).delete(new Path("target/out"), true);
        FileOutputFormat.setOutputPath(job, new Path("target/out"));

        job.setMapperClass(MapClass.class);
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);

        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(Text.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {

        int exitCode = ToolRunner.run(new q18076708(), args);
        System.exit(exitCode);
    }
}
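The job above can be mimicked in plain Java to show why the output comes out sorted: the mapper turns each (letter, number) pair around so the number becomes the key, the shuffle hands keys to the reducer in sorted order, and the reducer writes each key once. This is only a sketch under those assumptions; `LocalSortSketch` and its methods are hypothetical names, a `TreeMap` stands in for the shuffle, and the sample rows are the ones shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LocalSortSketch {
    // "Map" phase: parse "letter<whitespace>number" lines and key each record by the number.
    // The TreeMap plays the role of the shuffle, which groups values by key and sorts the keys.
    static TreeMap<Integer, List<String>> mapAndShuffle(List<String> lines) {
        TreeMap<Integer, List<String>> grouped = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\\s+");
            int number = Integer.parseInt(parts[1]);
            grouped.computeIfAbsent(number, k -> new ArrayList<>()).add(parts[0]);
        }
        return grouped;
    }

    // "Reduce" phase: emit each key exactly once and ignore its values.
    static List<Integer> reduce(TreeMap<Integer, List<String>> grouped) {
        List<Integer> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : grouped.entrySet()) {
            out.add(entry.getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("A\t2", "B\t9", "C\t4", "Z\t42");
        System.out.println(reduce(mapAndShuffle(input)));  // prints [2, 4, 9, 42]
    }
}
```

Because the reducer emits one record per key, duplicate numbers collapse into a single line. To keep duplicates (or the letters), the reducer would have to iterate over the values and write one record for each.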

Input:

Output:
