hadoop - Hive unix_timestamp() UDF giving multiple values

Question

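I am using HQL to extract some data from a Hive table while adding an extra column containing the current time.

Something like: select col1, col2, col3, unix_timestamp() from myTable

I expected all records to have the same value in the 4th column.

I was expecting something like this: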

col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT
col1Value, col2Value, col3Value, col4Value, timeT

But I am getting something like this:

col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT1
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT2
col1Value, col2Value, col3Value, col4Value, timeT3
col1Value, col2Value, col3Value, col4Value, timeT3

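The dataset is not very large, and only a single mapper is used. So my question is:

On a single machine, is unix_timestamp() evaluated for every selected row (each row handled by the Hive mapper), or is it evaluated once and that one value used for all rows?

I am using MapR M5 / Hive 0.9.0.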

score 1 · Accepted Answer

According to the LanguageManual, "the context of a UDF's evaluate method is one row at a time". I think this means that the unix_timestamp() call is evaluated during the map phase, once for each record emitted.
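To make the per-row behaviour concrete, here is a small Python simulation. It is not Hive code. The helper names (make_fake_clock, select_with_per_row_timestamp) and the clock readings are made up for illustration. It only shows why calling a clock function once per row can produce several distinct values when the second rolls over during the scan.

```python
def make_fake_clock(ticks):
    """Return a zero-argument function that hands out the given readings in order."""
    readings = iter(ticks)
    return lambda: next(readings)


def select_with_per_row_timestamp(rows, now):
    # Mimics "select col1, col2, col3, unix_timestamp() from myTable"
    # under the assumption that the function is called once for every row.
    return [row + (now(),) for row in rows]


rows = [("col1Value", "col2Value", "col3Value")] * 5

# Hypothetical clock readings: the second changes twice while the rows are scanned.
clock = make_fake_clock([100, 100, 101, 101, 102])

result = select_with_per_row_timestamp(rows, clock)
distinct_timestamps = sorted({row[-1] for row in result})
print(distinct_timestamps)  # [100, 101, 102]
```

The three distinct values correspond to timeT1, timeT2 and timeT3 in the question's output.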

Perhaps you could use a subquery to evaluate unix_timestamp() once and join its result to your original query?
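The idea of "evaluate once, then join" can be sketched in Python as follows. This is only a model of the logic, not HiveQL, and evaluate_once_and_join is a hypothetical name. Whether a given Hive version accepts an equivalent subquery and join is something to check against that version's documentation.

```python
import time


def evaluate_once_and_join(rows, now=time.time):
    # "Subquery": read the clock a single time, giving a one-row result.
    timestamp = int(now())
    # "Join": attach that single value to every row of the original query.
    return [row + (timestamp,) for row in rows]


rows = [
    ("col1Value", "col2Value", "col3Value"),
    ("col1Value", "col2Value", "col3Value"),
    ("col1Value", "col2Value", "col3Value"),
]

joined = evaluate_once_and_join(rows)

# Every row carries the same timestamp, however long the scan takes.
timestamps = {row[-1] for row in joined}
print(len(timestamps))  # 1
```

Because the clock is read before the rows are processed, the value cannot drift between rows.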
