hadoop - Hadoop Combiners, Reducers, and Ecosystem Projects

Question

What do you think the answer to question 4 mentioned on this site should be?

Is the given answer right or wrong?

Question: 4

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer: A

and

Question: 3

What happens in a MapReduce job when you set the number of reducers to one?

A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A

Based on my own understanding, my answers to the questions above are:

Question 4: D
Question 3: B

Update

You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B

For the updated question, I am unsure whether the answer is B or C.

Edit

Correct answer: Sqoop.

score 5 · Accepted Answer

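As I understand it, both of the given answers are wrong.

I haven't worked much with Combiner, but everywhere I have seen it, it operates on the output of the Mapper. The answer to Question 4 should be D.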
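The effect described in option D can be sketched with a small plain-Python simulation. This is not Hadoop code: the function names (`map_words`, `combine`, `reduce_counts`) and the two sample input splits are made up for illustration, and the combiner is assumed to run once per mapper over that mapper's whole output.

```python
from collections import Counter

def map_words(split):
    """Mapper: emit one (word, 1) pair per word in the input split."""
    return [(word, 1) for line in split for word in line.split()]

def combine(pairs):
    """Combiner: sum counts per word locally, on one mapper's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())

def reduce_counts(shuffled_pairs):
    """Reducer: sum all counts received for each word."""
    totals = Counter()
    for word, count in shuffled_pairs:
        totals[word] += count
    return dict(totals)

# Two hypothetical input splits, one per mapper.
splits = [
    ["the cat sat on the mat", "the cat ran"],
    ["the dog sat", "the dog ran and the cat ran"],
]

# Pairs that would cross the network to the reducers in each setup.
without_combiner = [pair for split in splits for pair in map_words(split)]
with_combiner = [pair for split in splits for pair in combine(map_words(split))]

print(len(without_combiner), "pairs shuffled without a combiner")
print(len(with_combiner), "pairs shuffled with a combiner")
```

The mappers do the same work in both cases, and the final counts are identical. Only the number of pairs handed to the shuffle changes, which is why A and B do not describe the benefit.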

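Again, from practical experience, I have found that the number of output files is always equal to the number of Reducers. So the answer to Question 3 is B. This may not hold when MultipleOutputs is used, but that is not common.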
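The relationship between reducer count and output files can be illustrated with another plain-Python sketch. It assumes each reducer writes exactly one file, with names modeled on the `part-r-NNNNN` convention; `partition` and `run_reduce_phase` are hypothetical helpers, and the CRC32-based partitioning is only a stand-in for hash partitioning, not Hadoop's actual implementation.

```python
import zlib

def partition(key, num_reducers):
    """Assign a key to a reducer (stand-in for hash partitioning)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def run_reduce_phase(pairs, num_reducers):
    """Return {output file name: {word: total}}, one entry per reducer."""
    outputs = {f"part-r-{i:05d}": {} for i in range(num_reducers)}
    for word, count in pairs:
        name = f"part-r-{partition(word, num_reducers):05d}"
        outputs[name][word] = outputs[name].get(word, 0) + count
    return outputs

# Hypothetical map output collected from several mappers.
map_output = [("the", 3), ("cat", 2), ("the", 3), ("dog", 2), ("cat", 1)]

single = run_reduce_phase(map_output, num_reducers=1)
triple = run_reduce_phase(map_output, num_reducers=3)

print(sorted(single))  # one file name, regardless of how many mappers ran
print(sorted(triple))  # three file names, one per reducer
```

In this model the number of mappers never appears in the output naming at all, which matches the observation that the file count follows the reducer count.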

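Finally, I don't think Apache would lie about MapReduce (though exceptions do happen :). The answers to both questions are available on the wiki page. Take a look.

By the way, I liked the "100% Pass Guaranteed or Your Money Back!!!" quote from the link you provided ;-)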

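Edit
I have very little knowledge of Pig and Sqoop, so I am not sure about the question in the update section. However, the same thing can be achieved with Hive by creating an external table over the HDFS data and then joining.

UPDATE
After comments from user milk3422 and the question owner, I realized that my assumption that Hive is the answer to the last question was wrong, because a separate OLTP database is involved. Sqoop is designed to transfer data between HDFS and relational databases, so the appropriate answer is C.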

score 0 · Accepted Answer

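The given answers to questions 4 and 3 seem correct to me. For question 4, when a combiner is used, the map output is kept in a collection and processed first, and the buffer is flushed when it is full. To justify this, I am adding the following link: http://wiki.apache.org/hadoop/HadoopMapReduce

It clearly shows why a combiner speeds up the process.

I also think the answer to question 3 is generally correct, since that is the basic configuration, after which the defaults apply. To justify this, I am adding another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types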
