hadoop - Processing data in CouchDB using Hadoop + MapReduce

Question

I have a very large amount of data in CouchDB, but I only recently discovered how crippled Couch's MapReduce functions are (no chaining).

So I had this idea: use Hadoop to run MapReduce queries over a CouchDB database and store the final results in another CouchDB database.

Is this too crazy? I know I could set up HBase to do this, but I don't want to migrate my data from CouchDB to HBase. And I love Couch as a data store.
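To make "chaining" concrete: it means feeding the output of one MapReduce pass into a second pass, which a single CouchDB view cannot do. The following is a minimal in-memory Python sketch of the idea, not Hadoop or CouchDB code; run_mapreduce and the sample documents are hypothetical.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """One in-memory MapReduce pass: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sort by key so the output order is deterministic, like sorted reducer input in Hadoop.
    return [(key, reducer(key, values)) for key, values in sorted(groups.items())]


# Hypothetical documents shaped like CouchDB docs.
docs = [
    {"_id": "a", "user": "alice", "tags": ["couchdb", "hadoop"]},
    {"_id": "b", "user": "bob", "tags": ["couchdb"]},
    {"_id": "c", "user": "alice", "tags": ["hbase", "couchdb"]},
]

# Job 1: count how often each tag is used.
tag_counts = run_mapreduce(
    docs,
    mapper=lambda doc: [(tag, 1) for tag in doc["tags"]],
    reducer=lambda tag, ones: sum(ones),
)

# Job 2 is "chained": its input is job 1's output, regrouped by count.
tags_by_count = run_mapreduce(
    tag_counts,
    mapper=lambda pair: [(pair[1], pair[0])],
    reducer=lambda count, tags: sorted(tags),
)

print(tag_counts)
print(tags_by_count)
```

In Hadoop, each pass would be a separate job whose output directory is the next job's input; the structure is the same.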

score 0 · Accepted Answer

The MapReduce functions in CouchDB are constrained to simplify caching of the results. Rather than having to search for views that are impacted by a change, views were designed to be self-contained.
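A rough illustration of why self-contained views make caching cheap: if a map function depends only on the document it receives, the index can remember which rows each document emitted and, when that document changes, rerun map on it alone. This is a simplified Python model of the idea, not CouchDB's actual indexer; the class and method names are hypothetical.

```python
class IncrementalView:
    """Toy view index that remembers the rows each document emitted.

    Because map_fn sees only one document at a time, updating a document
    means re-running map_fn on that document alone; no other rows change.
    """

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows_by_doc = {}  # doc id -> list of (key, value) pairs it emitted

    def update(self, doc):
        # Recompute only this document's rows.
        self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))

    def delete(self, doc_id):
        # Drop the deleted document's rows; other documents are untouched.
        self.rows_by_doc.pop(doc_id, None)

    def query(self):
        # All rows ordered by (key, doc id), loosely like a view query result.
        rows = [
            (key, doc_id, value)
            for doc_id, emitted in self.rows_by_doc.items()
            for key, value in emitted
        ]
        return sorted(rows, key=lambda row: (row[0], row[1]))


# Usage: a hypothetical view that indexes documents by their "type" field.
view = IncrementalView(lambda doc: [(doc["type"], 1)])
view.update({"_id": "a", "type": "post"})
view.update({"_id": "b", "type": "comment"})
view.update({"_id": "a", "type": "comment"})  # edit to "a": only "a" is re-mapped
print(view.query())
```

A chained view, where one view reads another view's output, would break this property: a single document change could alter arbitrary rows of the first view, and every downstream row derived from them would have to be located and recomputed.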

Because of this, if you have complex MapReduce code, you can use a tool like CouchApp to embed shared functions within a MapReduce function. I'm having trouble finding the reference for this, but you can use the !code macro to embed JavaScript functions in views. See the question "Using require() or // !json, !code in CouchDB?"
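To show what such a macro does conceptually: before the design document is uploaded, a build step replaces each "// !code path" line in a view with the contents of that file, so the function CouchDB finally stores is self-contained. The sketch below is a hypothetical Python reimplementation of just that substitution, not CouchApp's own code; the helper file name and its contents are made up.

```python
import re

# Matches a whole line such as "  // !code lib/normalize.js" and captures the path.
CODE_MACRO = re.compile(r"^[ \t]*// !code[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def expand_code_macros(source, files):
    """Replace each '// !code <path>' line with the text of files[<path>].

    `files` maps paths to source text (a stand-in for reading the app directory).
    Raises KeyError if a referenced path is missing.
    """
    return CODE_MACRO.sub(lambda match: files[match.group(1)], source)


# Hypothetical shared helper and a map function that pulls it in.
files = {"lib/normalize.js": "function normalize(s) { return s.trim().toLowerCase(); }"}
map_src = """function(doc) {
  // !code lib/normalize.js
  emit(normalize(doc.name), 1);
}"""

expanded = expand_code_macros(map_src, files)
print(expanded)
```

Several views can reference the same helper this way, so the shared logic lives in one file even though each stored view carries its own copy.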

This could help to get some of the productivity benefit of chaining without chaining, by putting most of the code in shared functions, and merely calling the function in the different views. For the performance benefit of chaining, if that's what you're after, you may be better off just moving to HBase.

score 0 · Accepted Answer

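Apparently CouchDB is supposed to be able to stream data into Hadoop via Sqoop, but I couldn't find any information beyond that link. In the worst case, you could write your own input reader that reads from CouchDB, or periodically export the data, throw it into HDFS, and run your jobs from there.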
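The "periodic export" route could look roughly like this: page through the database's _all_docs endpoint with include_docs=true and write one JSON document per line, a format that is easy to load into HDFS and process with Hadoop. This is a minimal sketch under assumptions: the database URL is hypothetical, there is no authentication or retry handling, and the page fetcher is passed in as a function so it can be swapped or tested offline.

```python
import json
import urllib.parse
import urllib.request


def http_fetch_page(db_url, startkey, limit):
    """Fetch up to `limit` _all_docs rows (with docs), starting at doc id `startkey`."""
    params = {"include_docs": "true", "limit": str(limit)}
    if startkey is not None:
        params["startkey"] = json.dumps(startkey)  # CouchDB expects JSON-encoded keys
    url = f"{db_url}/_all_docs?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["rows"]


def export_all_docs(fetch_page, out, page_size=1000):
    """Write every non-design document to `out` as newline-delimited JSON.

    fetch_page(startkey, limit) must return _all_docs-style rows ({"id", "doc"})
    in id order, starting at startkey inclusive. Returns the number of docs written.
    """
    startkey = None
    written = 0
    while True:
        # Ask for one extra row: if it exists, its id is where the next page starts.
        rows = fetch_page(startkey, page_size + 1)
        page, extra = rows[:page_size], rows[page_size:]
        for row in page:
            if row["id"].startswith("_design/"):
                continue  # design documents hold view code, not data
            out.write(json.dumps(row["doc"]) + "\n")
            written += 1
        if not extra:
            return written
        startkey = extra[0]["id"]
```

For example, export_all_docs(lambda key, n: http_fetch_page("http://localhost:5984/mydb", key, n), f) with f opened for writing, followed by hdfs dfs -put on the resulting file. Writing a true Hadoop InputFormat that reads CouchDB directly avoids the intermediate file but is considerably more work.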
