hadoop - MapReduce chaining

Question

Possible duplicate:
Sharing data in Hadoop MapReduce chaining

I have a MapReduce chain like the one below.

Job1 (Map1 -> Reduce1) -> Job2 (Map2, Reduce2), with Job1.waitForCompletion(true)

Map2 needs a value (assume an int a produced by Reduce1).

How can I do this? Please share your thoughts.

score 1 · Accepted Answer

You can use ChainMapper and ChainReducer. Here is some sample code to help.

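// Old API: ChainMapper and ChainReducer live in org.apache.hadoop.mapred.lib.
Configuration conf = getConf();
JobConf job = new JobConf(conf);

// Map1: (LongWritable, Text) -> (Text, Text)
JobConf map1Conf = new JobConf(false);
ChainMapper.addMapper(job, Map1.class,
        LongWritable.class, Text.class,   // input key/value types
        Text.class, Text.class,           // output key/value types
        true, map1Conf);

// Reduce1: (Text, Text) -> (Text, Text)
JobConf reduce1Conf = new JobConf(false);
ChainReducer.setReducer(job, Reduce1.class,
        Text.class, Text.class,
        Text.class, Text.class,
        true, reduce1Conf);

// Map2 runs after Reduce1 in the same job, so it is added through ChainReducer.
JobConf map2Conf = new JobConf(false);
ChainReducer.addMapper(job, Map2.class,
        Text.class, Text.class,
        Text.class, Text.class,
        true, map2Conf);

// A chain has the form [MAP+ / REDUCE MAP*], so one job can hold only one Reducer.
// Reduce2 cannot be set on this job as well; it has to run in a separate job.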

Note:

The key-value output types determine which Mapper or Reducer can be called next. The output types of Map1 must be the same as the input types of Reduce1, the output types of Reduce1 must be the same as the input types of Map2, and the output types of Map2 must be the same as the input types of Reduce2.
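The type-matching rule in the note can be checked mechanically. Below is a minimal plain-Java sketch, not Hadoop code: ChainTypeCheck, Stage and firstMismatch are hypothetical names made up for illustration, and String and Long stand in for Text and LongWritable so that it runs without Hadoop on the classpath.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): checks that adjacent stages agree on key/value types. */
final class ChainTypeCheck {

    /** One stage of the chain with its declared input and output key/value classes. */
    static final class Stage {
        final String name;
        final Class<?> inKey, inValue, outKey, outValue;

        Stage(String name, Class<?> inKey, Class<?> inValue,
              Class<?> outKey, Class<?> outValue) {
            this.name = name;
            this.inKey = inKey;
            this.inValue = inValue;
            this.outKey = outKey;
            this.outValue = outValue;
        }
    }

    /** Returns the name of the first stage whose input types differ from the previous stage's output types, or null. */
    static String firstMismatch(List<Stage> chain) {
        for (int i = 1; i < chain.size(); i++) {
            Stage prev = chain.get(i - 1);
            Stage next = chain.get(i);
            if (!prev.outKey.equals(next.inKey) || !prev.outValue.equals(next.inValue)) {
                return next.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Map2 wrongly declares a Long input key although Reduce1 emits String keys.
        List<Stage> chain = Arrays.asList(
                new Stage("Map1", Long.class, String.class, String.class, String.class),
                new Stage("Reduce1", String.class, String.class, String.class, String.class),
                new Stage("Map2", Long.class, String.class, String.class, String.class));
        System.out.println(firstMismatch(chain)); // prints Map2
    }
}
```

In a real job the same comparison would be made between the class arguments passed for each stage of the chain.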

score 0 · Accepted Answer

----------

Another answer

Save the output of Reduce1 to a flat file (HDFS) and read that file while setting up the driver (Job) for the second job. Then set the variable in the job configuration.

// Read the reducer output from the file and store it under the "name" key.
Configuration conf = getConf();
JobConf job = new JobConf(conf);
job.setInt("name", 0);  // placeholder: replace 0 with the value read from the Reduce1 output
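The "read reducer output from file" step is only a comment in the snippet above. Here is a minimal sketch of the parsing part, assuming Reduce1 writes its result with the default TextOutputFormat, which puts one key<TAB>value record on each line. ReducerOutputParser and parseIntValue are hypothetical names; opening the part file on HDFS is left out so the sketch runs without Hadoop.

```java
/** Hypothetical helper: extracts the int value from one line of reducer output. */
final class ReducerOutputParser {

    /**
     * Parses a line of the form "key<TAB>value" and returns the value as an int.
     * Throws IllegalArgumentException if the line has no tab or the value is not an int.
     */
    static int parseIntValue(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected key<TAB>value but got: " + line);
        }
        // NumberFormatException is a subclass of IllegalArgumentException.
        return Integer.parseInt(line.substring(tab + 1).trim());
    }

    public static void main(String[] args) {
        // In a real driver this line would be read from the Reduce1 part file on HDFS.
        String firstLine = "a\t42";
        System.out.println(parseIntValue(firstLine)); // prints 42
    }
}
```

The returned int is what the driver would then pass to setInt("name", ...) in place of the placeholder.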

In the mapper (Map2):

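import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class Map2 extends MapReduceBase
        implements Mapper<Text, Text, LongWritable, Text> {

    // Named sharedValue so that it is not shadowed by the "value" parameter of map().
    private int sharedValue;

    @Override
    public void configure(JobConf job) {
        // Read the value that the driver stored under "name" (0 if it is missing).
        sharedValue = job.getInt("name", 0);
    }

    @Override
    public void map(Text key, Text value,
            OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        // sharedValue is available here.
    }
}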

score 0 · Accepted Answer

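Using counters to solve the problem

You can use a counter in Reduce1 of Job1 to get the value out of Job1 and pass it to Job2. Here is sample code for the flow you need to implement.

1. Sample code that sets the value using a counter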

// Declared inside the Reduce1 class:
public static enum COUNTER {
    INTVALUE
}

// Inside reduce():

// Old API (org.apache.hadoop.mapred)
reporter.incrCounter(COUNTER.INTVALUE, 1);

// New API (org.apache.hadoop.mapreduce)
context.getCounter(COUNTER.INTVALUE).increment(1);
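One thing to keep in mind: counters are additive, so incrementing by 1 in every reduce() call yields the number of calls, not an arbitrary value. To carry a value such as a, one option is to increment by a exactly once. The sketch below only imitates that behaviour in plain Java. CounterSketch is a hypothetical stand-in, not the Hadoop Counters class, and 42 is just an example value.

```java
import java.util.EnumMap;
import java.util.Map;

/** Plain-Java stand-in for additive counters (hypothetical, for illustration only). */
final class CounterSketch {

    enum COUNTER { INTVALUE }

    private final Map<COUNTER, Long> totals = new EnumMap<>(COUNTER.class);

    /** Adds amount to the running total, mirroring how counter increments accumulate. */
    void increment(COUNTER counter, long amount) {
        totals.merge(counter, amount, Long::sum);
    }

    /** Returns the accumulated total, or 0 if the counter was never incremented. */
    long get(COUNTER counter) {
        return totals.getOrDefault(counter, 0L);
    }

    public static void main(String[] args) {
        CounterSketch perCall = new CounterSketch();
        for (int call = 0; call < 3; call++) {
            perCall.increment(COUNTER.INTVALUE, 1); // one increment per simulated reduce() call
        }
        System.out.println(perCall.get(COUNTER.INTVALUE)); // prints 3

        CounterSketch once = new CounterSketch();
        int a = 42; // example value that Reduce1 wants to hand over
        once.increment(COUNTER.INTVALUE, a); // a single increment by the value itself
        System.out.println(once.get(COUNTER.INTVALUE)); // prints 42
    }
}
```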

2. Get the counter that Job1 set and store it in the JobConf of Job2, so that the mapper can read the same value.

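public static void main(String[] args) throws Exception {
    // ... build job1Conf, the JobConf for Job1 ...

    // runJob submits the job and blocks until it completes, so no separate submit call is needed.
    RunningJob job1 = JobClient.runJob(job1Conf);
    Counters counters = job1.getCounters();
    // getCounter returns a long; the cast assumes the value fits in an int.
    int value = (int) counters.getCounter(COUNTER.INTVALUE);

    // Now set the value in Job2's configuration.
    JobConf job2Conf = new JobConf(job1Conf);
    job2Conf.setInt("name", value);
    // ... finish configuring job2Conf and run Job2 ...
}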
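Counter values are longs, while setInt takes an int, so the value has to be narrowed somewhere in the driver. A minimal sketch of a narrowing that fails loudly instead of wrapping around silently is shown here. CounterValues and toConfigInt are hypothetical names. Storing the value with setLong and reading it with getLong on the configuration would avoid the narrowing altogether.

```java
/** Hypothetical helper: narrows a long counter value to the int that setInt expects. */
final class CounterValues {

    /** Converts the counter value to an int, throwing ArithmeticException if it does not fit. */
    static int toConfigInt(long counterValue) {
        // Math.toIntExact refuses values outside the int range instead of truncating them.
        return Math.toIntExact(counterValue);
    }

    public static void main(String[] args) {
        System.out.println(toConfigInt(42L)); // prints 42
        try {
            toConfigInt(Integer.MAX_VALUE + 1L);
        } catch (ArithmeticException e) {
            System.out.println("counter value does not fit in an int");
        }
    }
}
```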

3. Map2 reads the value that came from the Job1 counter out of the JobConf of Job2.

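// Old API: Mapper, MapReduceBase, JobConf, OutputCollector and Reporter come from org.apache.hadoop.mapred.
public class Map2 extends MapReduceBase
        implements Mapper<Text, Text, LongWritable, Text> {

    // Named sharedValue so that it is not shadowed by the "value" parameter of map().
    private int sharedValue;

    @Override
    public void configure(JobConf job) {
        // Read the counter value that the driver stored under "name" (0 if it is missing).
        sharedValue = job.getInt("name", 0);
    }

    @Override
    public void map(Text key, Text value,
            OutputCollector<LongWritable, Text> output, Reporter reporter)
            throws IOException {
        // sharedValue is available here.
    }
}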
