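There may be another way to do this, but yes, you may need to transform your data a bit in order to query it with Drill.
This looks like a situation where you would want to use KVGEN. KVGEN will give you the kind of columns Chris Matta is describing, but KVGEN operates on a column, and in this case you don't really have a column to use it on: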
0: jdbc:drill:zk=local> select t.* from dfs.`/Users/vince/data/stackoverflow/users.json` t;
+---+---+
| 3 | 4 |
+---+---+
| {"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""} | {"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":""} |
+---+---+
1 row selected (0.133 seconds)
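Because these columns are dynamic and sit at the "top level" of the JSON object, we can't use KVGEN here. If we transform the data a bit, however, we can. I used this invocation of the most excellent tool jq to transform the data into a format KVGEN can use: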
$ jq '.| { "user": . }' < users.json > users_kv.json
This just takes the input and wraps the JSON object in another map. That gives us the "static" column we need in order to do the following:
0: jdbc:drill:zk=local> select kvgen(t.`user`) from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t;
+--------+
| EXPR$0 |
+--------+
| [{"key":"3","value":{"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"},"-JxAQpWGzY-gOzej7Xis":{}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""}},{"key":"4","value":{"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-Jx6fpaJHvKPHc8CylPd":{},"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{},"-JxWuZ-ePcx0XqYRmzc6":{}}}}] |
+--------+
1 row selected (1.774 seconds)
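If it helps to see what those two steps do to the data, here is a rough Python sketch. It is not Drill code: wrap and kvgen are illustrative names of my own, the sample data is trimmed down from the file above, and the sketch only mimics the shape of the result, not Drill's actual implementation.

```python
import json

# Trimmed-down stand-in for users.json: the top-level keys are the user ids.
users = {
    "3": {"fullname": "Mario Test", "email": "mario.giambanco@domain.com"},
    "4": {"fullname": "mario test", "email": "mariogiambanco@domain.com"},
}

def wrap(doc):
    """Same effect as the jq filter above: nest the document under one fixed key."""
    return {"user": doc}

def kvgen(mapping):
    """Mimic the shape KVGEN returns: a list of {"key": ..., "value": ...} maps."""
    return [{"key": k, "value": v} for k, v in mapping.items()]

wrapped = wrap(users)            # now there is a single, predictable column: "user"
pairs = kvgen(wrapped["user"])   # the dynamic keys become data instead of column names
print(json.dumps(pairs, indent=2))
```

The point of the wrapping is that the column name no longer depends on the data, so the query can name it.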
We still can't query this the way we want to, because the column holds a list. So use FLATTEN:
0: jdbc:drill:zk=local> select flatten(kvgen(t.`user`)) as `user` from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t;
+------+
| user |
+------+
| {"key":"3","value":{"company":"","d_year":"","email":"mario.giambanco@domain.com","facebook":"","fullname":"Mario Test","google":"","igoto":"","image":"","notifications":{"-Jx6fpaJHvKPHc8CylPd":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"3","posteddate":1440016723546,"type":"system"},"-JxAQpWGzY-gOzej7Xis":{}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{"title":"Anesthesia"},"-JxWuZ-ePcx0XqYRmzc6":{"title":"Bridges"}},"twitter":""}} |
| {"key":"4","value":{"company":"","d_year":"","email":"mariogiambanco@domain.com","fullname":"mario test","igoto":"","image":"img/a0.jpg","notifications":{"-Jx6fpaJHvKPHc8CylPd":{},"-JxAQpWGzY-gOzej7Xis":{"from":"System","image":"/img/system_icon.jpg","msg":"System:","param":"4","posteddate":1440079641420,"type":"system"}},"school":"","school_year":"","tags":{"-JxWuEPs183UEwsI-XNb":{},"-JxWuZ-ePcx0XqYRmzc6":{}}}} |
+------+
2 rows selected (0.257 seconds)
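As a rough picture of what FLATTEN is doing here, the following Python sketch turns one row holding a list into one row per list element. The flatten function and the trimmed sample rows are my own illustration, not Drill's implementation.

```python
# One input row whose single column holds a list, like the KVGEN output above.
rows = [
    {"user": [
        {"key": "3", "value": {"fullname": "Mario Test"}},
        {"key": "4", "value": {"fullname": "mario test"}},
    ]}
]

def flatten(rows, column):
    """Emit one output row per element of the list stored in `column`."""
    out = []
    for row in rows:
        for element in row[column]:
            new_row = dict(row)        # shallow copy keeps any other columns as they were
            new_row[column] = element  # replace the list with a single element
            out.append(new_row)
    return out

flat = flatten(rows, "user")
print(len(flat))
```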
Two rows - much better. Now we're ready to do what you wanted to do in the first place (note the subquery, as well as the backticks around the reserved words user and value):
0: jdbc:drill:zk=local> select u.`user`.`key` as userid, u.`user`.`value`.fullname as fullname, u.`user`.`value`.email as email from (select flatten(kvgen(t.`user`)) as `user` from dfs.`/Users/vince/data/stackoverflow/users_kv.json` t) u where u.`user`.`value`.fullname = 'Mario Test';
+---------+-------------+-----------------------------+
| userid | fullname | email |
+---------+-------------+-----------------------------+
| 3 | Mario Test | mario.giambanco@domain.com |
+---------+-------------+-----------------------------+
1 row selected (0.22 seconds)
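For comparison, the outer SELECT and WHERE amount to the following projection and filter over the flattened rows. This is a Python sketch with a hypothetical select_users helper and trimmed sample data. The comparison is an exact, case-sensitive string match, which is consistent with only user 3 coming back in the result above.

```python
# Flattened rows, trimmed to the fields the final query touches.
flat = [
    {"user": {"key": "3", "value": {"fullname": "Mario Test",
                                    "email": "mario.giambanco@domain.com"}}},
    {"user": {"key": "4", "value": {"fullname": "mario test",
                                    "email": "mariogiambanco@domain.com"}}},
]

def select_users(rows, fullname):
    """Project userid, fullname and email for rows whose fullname matches exactly."""
    return [
        {
            "userid": r["user"]["key"],
            "fullname": r["user"]["value"]["fullname"],
            "email": r["user"]["value"]["email"],
        }
        for r in rows
        if r["user"]["value"]["fullname"] == fullname
    ]

result = select_users(flat, "Mario Test")
print(result)
```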