hadoop - Hadoop distributed cache does not return the file contents

Question

Instead of the data from the file I want to use as a distributed cache, I am getting what looks like a garbage value.

The job configuration is as follows:

Configuration config5 = new Configuration();
JobConf conf5 = new JobConf(config5, Job5.class);
conf5.setJobName("Job5");
conf5.setOutputKeyClass(Text.class);
conf5.setOutputValueClass(Text.class);
conf5.setMapperClass(MapThree4c.class);
conf5.setReducerClass(ReduceThree5.class);
conf5.setInputFormat(TextInputFormat.class);
conf5.setOutputFormat(TextOutputFormat.class);


DistributedCache.addCacheFile(new URI("/home/users/mlakshm/ap1228"), conf5);
FileInputFormat.setInputPaths(conf5, new Path(other_args.get(5)));
FileOutputFormat.setOutputPath(conf5, new Path(other_args.get(6)));

JobClient.runJob(conf5);

The mapper contains the following code:

public class MapThree4c extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    private Set<String> prefixCandidates = new HashSet<String>();

    Text a = new Text();

    public void configure(JobConf conf5) {
        Path[] dates = new Path[0];
        try {
            dates = DistributedCache.getLocalCacheFiles(conf5);
            System.out.println("candidates: "+candidates);
            String astr = dates.toString();
            a = new Text(astr);
        } catch (IOException ioe) {
            System.err.println("Caught exception while getting cached files: " +
                StringUtils.stringifyException(ioe));
        }
    }

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer st = new StringTokenizer(line);
        st.nextToken();
        String t = st.nextToken();
        String uidi = st.nextToken();
        String uidj = st.nextToken();

        String check = null;

        output.collect(new Text(line), a);
    }
}

The output value I get from this mapper is [Lorg.apache.hadoop.fs.Path;@786c1a82, not the contents of the distributed cache file.

score 1 · Accepted Answer

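That looks like what you get when you call toString() on an array, and according to the javadocs for DistributedCache.getLocalCacheFiles(), an array (Path[]) is exactly what it returns. If you need to actually read the contents of a file in the cache, you can open and read it with the standard Java API.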
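
As a rough illustration of the "standard Java API" route, the sketch below reads a local file line by line into a set. CacheFileReader and readLines are hypothetical names, not part of Hadoop, and the sketch assumes the cached file is a UTF-8 text file with one entry per line. In the mapper, the argument would be the string form of one array element, for example dates[0].toString(), rather than the array itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not a Hadoop class): loads a localized cache file into a set.
class CacheFileReader {

    // localPath is a path on the local file system, e.g. the string form of
    // one element of the array returned by getLocalCacheFiles().
    // Assumption: the file is UTF-8 text with one entry per line.
    static Set<String> readLines(String localPath) throws IOException {
        Set<String> lines = new HashSet<String>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localPath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line.trim());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the localized cache file: a temporary file with two lines.
        Path tmp = Files.createTempFile("cache-demo", ".txt");
        Files.write(tmp, "alpha\nbeta\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(tmp.toString()));
        Files.delete(tmp);
    }
}
```

Inside configure(), the returned set could be assigned to a field such as prefixCandidates, so that map() works with the file's contents instead of the array's toString() value.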

score 0 · Accepted Answer

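From your code:

Path[] dates = DistributedCache.getLocalCacheFiles(conf5);

dates is an array, which means:

String astr = dates.toString(); // yields [Lorg.apache.hadoop.fs.Path;@786c1a82

To see the actual paths, iterate over the array instead (this assumes dates is stored in a field so that map() can access it):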

for(Path cacheFile: dates){

    output.collect(new Text(line), new Text(cacheFile.getName()));

}
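
The difference between calling toString() on an array and rendering its elements can be reproduced without Hadoop. The following is a minimal sketch using a String[] in place of Path[]; ArrayToStringDemo and describe are hypothetical names chosen for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class: contrasts the two ways of turning an array into a string.
class ArrayToStringDemo {

    // Renders the elements of the array, e.g. "[ap1228]".
    static String describe(Object[] items) {
        return Arrays.toString(items);
    }

    public static void main(String[] args) {
        String[] files = {"ap1228"};

        // Object.toString() on an array prints a type tag and an identity hash,
        // i.e. something of the form [Ljava.lang.String;@<hex>.
        System.out.println(files.toString());

        // Arrays.toString() prints the elements themselves: [ap1228]
        System.out.println(describe(files));
    }
}
```

The same applies to Path[]: either loop over the elements, as above, or pass the array to Arrays.toString() when only a readable listing of the paths is needed.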
