hadoop - HBase numeric values not recognized by Hive

Question

I have a Hive/HBase integration table defined as follows:

create table user_c(user_id int, c_name string, c_kind string, c_industry string,
c_jobtitle string, c_workyear int, c_title string, c_company string)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry,cf1:c_jobtitle,cf1:c_workyear,cf1:c_title,cf1:c_company")
TBLPROPERTIES ("hbase.table.name" = "user_c");

In my Java code, I create a Put and fill it with values read from the db. The code looks like this:

final Put to = new Put(getByte(from, keyColumn));
for (final IColumn column : table.getColumns()) {
    if (column.equals(keyColumn)) continue;
    to.add(Bytes.toBytes(column.getColumnFamily()), Bytes.toBytes(column.getDestName()), getByte(from, column));
}
return to;

getByte is a method that converts a value to byte[]. It looks like this:

byte[] getByte(final Map<String, Object> map, IColumn column) {
    final Object val = map.get(column.getName());
    if (val instanceof Integer) {
        return Bytes.toBytes((Integer) val);
    }
    ...
}

Then I put it into HBase.

I can scan the record from the hbase shell:

hbase(main):001:0> scan 'user_c'
ROW                                COLUMN+CELL                                                                                      
\x00\x0A\x07\x0D                  column=cf1:c_workyear, timestamp=1350298280554, value=\x00\x00\x07\xD8                         
\x00\x0A\x07\x0D                  column=cf1:c_industry, timestamp=1350298280554, value=120
...

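The row key is of type Integer, which should be automatically unboxed to the primitive int type when it is processed by the getByte method. Not only the row key but also the other numeric column (cf1:c_workyear) is displayed as a byte array, like \x00\x0A\x07\x0D.

Meanwhile, the String column (cf1:c_industry) shows its value as readable text.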
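As a sanity check on the scan output, here is a minimal sketch of how an int becomes four big-endian bytes and back. It uses java.nio.ByteBuffer from the standard library as a stand-in for HBase's Bytes.toBytes(int), which produces the same 4-byte big-endian layout; the class and method names (IntBytesDemo, encode, decode, toHex) are made up for illustration.

```java
import java.nio.ByteBuffer;

class IntBytesDemo {
    // Encode an int as 4 big-endian bytes (ByteBuffer's default byte order).
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Decode 4 big-endian bytes back into an int.
    static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    // Render every byte as a \xNN escape. The hbase shell differs slightly:
    // it prints printable ASCII bytes literally, which is why "120" is readable.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The c_workyear cell from the scan: \x00\x00\x07\xD8
        System.out.println(toHex(encode(2008)));
        // The row key from the scan: \x00\x0A\x07\x0D
        System.out.println(decode(new byte[] {0x00, 0x0A, 0x07, 0x0D}));
    }
}
```

Under these assumptions, the cell value \x00\x00\x07\xD8 decodes to 2008 and the row key \x00\x0A\x07\x0D decodes to 657165, so the shell output is consistent with ints stored in binary form.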

Is this OK?

And when I query the record from Hive, it shows NULL instead of the value of the numeric column:

hive> select c_industry, c_workyear from user_c limit 1;
Total MapReduce CPU Time Spent: 10 seconds 370 msec
OK
120     NULL
Time taken: 46.063 seconds

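It seems that the value of c_workyear is not recognized by Hive. I guess that is because its type is not right. But shouldn't the byte array of an int be stored as an int value, rather than as a plain byte array?

Does anyone know how to fix this?

Thanks a lot.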

score 5 · Accepted Answer

Try this in your table definition:

"hbase.columns.mapping" = ":key,cf1:c_name,cf1:c_kind,cf1:c_industry#b,cf1:c_jobtitle,cf1:c_workyear#b,cf1:c_title,cf1:c_company"

Note the use of #b after the binary fields. We have been using this successfully for quite a while.
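To illustrate why the #b suffix matters, here is a rough model in plain Java of the two ways a cell can be read. Without a suffix, Hive's HBase mapping treats cell values as strings by default, so a numeric column is expected to hold decimal text. This sketch is not Hive's actual SerDe code; the class and method names (CellEncodingDemo, asBinary, asText, readAsText, readAsBinary) are hypothetical, and ByteBuffer again stands in for HBase's Bytes utility.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class CellEncodingDemo {
    // Binary encoding: 4 big-endian bytes, as the question's getByte writes for an Integer.
    static byte[] asBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Text encoding: the decimal digits as UTF-8 bytes.
    static byte[] asText(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    // Simplified model of a string-mapped read: parse the cell as decimal text,
    // returning null (like Hive's NULL) when the bytes are not a valid number.
    static Integer readAsText(byte[] cell) {
        try {
            return Integer.valueOf(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Simplified model of a binary-mapped (#b) read: interpret 4 bytes as a big-endian int.
    static Integer readAsBinary(byte[] cell) {
        return cell.length == 4 ? ByteBuffer.wrap(cell).getInt() : null;
    }

    public static void main(String[] args) {
        System.out.println(readAsText(asBinary(2008)));   // binary cell read as text
        System.out.println(readAsBinary(asBinary(2008))); // binary cell read as binary
        System.out.println(readAsText(asText(2008)));     // text cell read as text
    }
}
```

In this model, a binary cell read as text yields null, which matches the NULL seen in the Hive query. That leaves two options: keep the binary encoding and mark the column with #b as above, or keep the original mapping and write the decimal text of the number from the Java side instead.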
