java - Hadoop MapReduce to generate substrings of different lengths

Question

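I am writing code that uses Hadoop MapReduce to get substrings of various lengths. For example, given the string "ZYXCBA" and the length 3 (supplied through a text file as the line "3 ZYXCBA"), my code should return all possible substrings of length 3 ("ZYX", "YXC", "XCB", "CBA"), of length 4 ("ZYXC", "YXCB", "XCBA"), and finally of length 5 ("ZYXCB", "YXCBA").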

In the map phase I did the following:

key = the required substring length

value = "ZYXCBA"

So the mapper output is:

3,"ZYXCBA"
4,"ZYXCBA"
5,"ZYXCBA"
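The map step described above can be sketched in plain Java, without any Hadoop types, to check which key/value pairs it should produce. This is only an illustration: the class name `MapSketch` and the helper `emit` are hypothetical and not part of the posted job, and the sketch assumes each input line has the form `<length> <string>`.

```java
import java.util.ArrayList;
import java.util.List;

class MapSketch {
    // Hypothetical helper that mirrors the map step without Hadoop types.
    // Assumed input line format: "<minLength> <string>", e.g. "3 ZYXCBA".
    static List<String> emit(String line) {
        String[] parts = line.trim().split("\\s+");
        int minLength = Integer.parseInt(parts[0]);
        String text = parts[1];
        List<String> pairs = new ArrayList<>();
        // Keys run from minLength up to text.length() - 1,
        // so the full string itself is never requested.
        for (int len = minLength; len <= text.length() - 1; len++) {
            pairs.add(len + ",\"" + text + "\"");
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints one key,value pair per line for the sample input.
        for (String pair : emit("3 ZYXCBA")) {
            System.out.println(pair);
        }
    }
}
```

For the sample line "3 ZYXCBA" this yields the keys 3, 4 and 5, each paired with the whole string, which matches the mapper output listed above.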

In reduce, I take the string ("ZYXCBA") and the key 3 to get all substrings of length 3. The same happens for keys 4 and 5. The results are concatenated into a single string. So the reduce output should be:

3 "ZYX YXC XCB CBA"
4 "ZYXC YXCB XCBA"
5 "ZYXCB YXCBA"
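The sliding-window logic that the reduce step is meant to perform can also be tried on its own. The following is a minimal sketch in plain Java; `WindowSketch` and `windows` are hypothetical names introduced here for illustration. It uses `String.join` rather than repeated string concatenation, which also avoids a leading space in the result.

```java
import java.util.ArrayList;
import java.util.List;

class WindowSketch {
    // Hypothetical helper: returns every substring of the given length,
    // in order of starting position, joined by single spaces.
    static String windows(String text, int length) {
        List<String> parts = new ArrayList<>();
        // The last valid start index is text.length() - length.
        for (int start = 0; start + length <= text.length(); start++) {
            parts.add(text.substring(start, start + length));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        // Prints the key followed by the joined substrings for keys 3, 4 and 5.
        for (int len = 3; len <= 5; len++) {
            System.out.println(len + " " + windows("ZYXCBA", len));
        }
    }
}
```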

I am running my code with the following command:

hduser@Ganesh:~/Documents$ hadoop jar Saishingles.jar hadoopshingles.Saishingles Behara/Shingles/input Behara/Shingles/output

My code is as follows:

package hadoopshingles;

import java.io.IOException;
//import java.util.ArrayList;

import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;


public class Saishingles{

public static class shinglesmapper extends Mapper<Object, Text, IntWritable, Text>{

        public void map(Object key, Text value, Context context
                ) throws IOException, InterruptedException {

            String str = new String(value.toString());
            String[] list = str.split(" ");
            int x = Integer.parseInt(list[0]);
            String val = list[1];
            int M = val.length();
            int X = M-1;


            for(int z = x; z <= X; z++)
            {
                context.write(new IntWritable(z), new Text(val));
            }

        }

     }


public static class shinglesreducer extends Reducer<IntWritable,Text,IntWritable,Text> {


    public void reduce(IntWritable key, Text value, Context context
            ) throws IOException, InterruptedException {
        int z = key.get();
        String str = new String(value.toString());
        int M = str.length();
        int Tz = M - z;
        String newvalue = "";
        for(int position = 0; position <= Tz; position++)
        {
            newvalue = newvalue + " " + str.substring(position,position + z);   
        }

        context.write(new IntWritable(z),new Text(newvalue));
    }
}




public static void main(String[] args) throws Exception {
      GenericOptionsParser parser = new GenericOptionsParser(args);
      Configuration conf = parser.getConfiguration();
      String[] otherArgs = parser.getRemainingArgs();

        if (otherArgs.length != 2) 
        {
          System.err.println("Usage: Saishingles <inputFile> <outputDir>");
          System.exit(2);
        }
      Job job = Job.getInstance(conf, "Saishingles");
      job.setJarByClass(hadoopshingles.Saishingles.class);
      job.setMapperClass(shinglesmapper.class);
      //job.setCombinerClass(shinglesreducer.class);
      job.setReducerClass(shinglesreducer.class);
      //job.setMapOutputKeyClass(IntWritable.class);
      //job.setMapOutputValueClass(Text.class);
      job.setOutputKeyClass(IntWritable.class);
      job.setOutputValueClass(Text.class);
      FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
      FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
      System.exit(job.waitForCompletion(true) ? 0 : 1);

}

}

Instead of the expected reduce output

3 "ZYX YXC XCB CBA"
4 "ZYXC YXCB XCBA"
5 "ZYXCB YXCBA"

I am getting

3 "ZYXCBA"
4 "ZYXCBA"
5 "ZYXCBA"

That is, I get the same output as the mapper. I don't know why this is happening. Please help me resolve this. Thanks for your help.
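One plausible explanation, offered as a reading of the posted code rather than a confirmed diagnosis: Hadoop's `Reducer.reduce` takes an `Iterable` of values (`reduce(KEYIN key, Iterable<VALUEIN> values, Context context)`), while the posted method takes a single `Text value`. A method with a different parameter list is an overload, not an override, so the framework would keep calling the inherited default, which writes each value through unchanged. The sketch below reproduces that effect with a simplified stand-in base class; `BaseReducer`, `SingleValueReducer`, `IterableReducer` and `windows` are hypothetical names and are not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.List;

class OverloadSketch {
    // Simplified stand-in for a framework base class (NOT Hadoop's real Reducer).
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        // Default behaviour: pass every value through unchanged.
        void reduce(int key, Iterable<String> values) {
            for (String v : values) {
                output.add(key + " " + v);
            }
        }

        // The "framework" only ever calls the Iterable version.
        void run(int key, List<String> values) {
            reduce(key, values);
        }
    }

    // Takes a single String: this is an overload, so run() never reaches it.
    static class SingleValueReducer extends BaseReducer {
        void reduce(int key, String value) {
            output.add(key + " " + windows(value, key));
        }
    }

    // Matches the base signature; @Override makes the compiler verify that.
    static class IterableReducer extends BaseReducer {
        @Override
        void reduce(int key, Iterable<String> values) {
            for (String v : values) {
                output.add(key + " " + windows(v, key));
            }
        }
    }

    // All substrings of the given length, joined by single spaces.
    static String windows(String text, int length) {
        List<String> parts = new ArrayList<>();
        for (int start = 0; start + length <= text.length(); start++) {
            parts.add(text.substring(start, start + length));
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        BaseReducer overloaded = new SingleValueReducer();
        overloaded.run(3, List.of("ZYXCBA"));
        System.out.println(overloaded.output); // value passed through untouched

        BaseReducer overridden = new IterableReducer();
        overridden.run(3, List.of("ZYXCBA"));
        System.out.println(overridden.output); // substrings of length 3
    }
}
```

If this is indeed the cause, changing the posted method to accept `Iterable<Text> values`, looping over the values, and annotating it with `@Override` would let the compiler flag any remaining signature mismatch.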
