cassandra - Time series data, selecting a range with maxTimeuuid/minTimeuuid in cassandra

Question

I recently created a keyspace and a column family in cassandra. I have the following:

CREATE TABLE reports (
  id timeuuid PRIMARY KEY,
  report varchar
);

I want to select reports by a time range. My query is as follows:

select dateOf(id), id 
from keyspace.reports 
where token(id) > token(maxTimeuuid('2013-07-16 16:10:48+0300'));

It returns:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

So the result is wrong.

When I try the following CQL:

select dateOf(id), id from keyspace.reports 
where token(id) > token(minTimeuuid('2013-07-16 16:12:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b3f6d00-ee19-11e2-8734-8d331d938752
 2013-07-16 16:10:13+0300 | 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

select dateOf(id), id from keyspace.reports
where token(id) > token(minTimeuuid('2013-07-16 16:13:48+0300'));

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 16:10:37+0300 | 1b275870-ee19-11e2-b3f3-af3e3057c60f
 2013-07-16 16:10:48+0300 | 21f9a390-ee19-11e2-89a2-97143e6cae9e

Is it random? Why am I not getting meaningful output?

What is the best solution for this in cassandra?

score 3 · Accepted Answer

You are using the token function, which isn't really useful in your context (querying between times using mintimeuuid and maxtimeuuid) and is generating random-looking, incorrect output.

From the CQL documentation:

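The TOKEN function can be used with a condition operator on the partition key column to query. The query selects rows based on the token of their partition key rather than on their value. The token of a key depends on the partitioner in use. The RandomPartitioner and Murmur3Partitioner do not yield a meaningful order.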
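
One way to see this for yourself is to select the token next to each row's timestamp. The following is a minimal sketch against the table from the question, assuming the keyspace has already been selected with USE and that your Cassandra version allows token() in the select list. Newer versions also offer toTimestamp() as a replacement for dateOf().

```cql
-- Sketch: show each row's partition token next to its timestamp.
-- Rows come back in token order, which under Murmur3Partitioner or
-- RandomPartitioner has no relation to the time encoded in the id.
SELECT token(id), dateOf(id), id
FROM reports;
```

Because the tokens are hashes of the ids, a condition such as token(id) > token(...) picks out an arbitrary slice of the rows rather than a time range.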

If you want to retrieve all records between two dates, it might make more sense to model your data as a wide row, with one record per column rather than one record per row, for example by creating the table:

CREATE TABLE reports2 (
  reportname text,
  id timeuuid,
  report text,
  PRIMARY KEY (reportname, id)
);

then populating it with data:

insert into reports2(reportname,id,report) VALUES ('report', 1b3f6d00-ee19-11e2-8734-8d331d938752, 'a');
insert into reports2(reportname,id,report) VALUES ('report', 0d4b20e0-ee19-11e2-bbb3-e3eef18ad51b, 'b');
insert into reports2(reportname,id,report) VALUES ('report', 1b275870-ee19-11e2-b3f3-af3e3057c60f, 'c');
insert into reports2(reportname,id,report) VALUES ('report', 21f9a390-ee19-11e2-89a2-97143e6cae9e, 'd');

and querying it (without the token calls!):

select dateOf(id),id from reports2 where reportname='report' and id>maxtimeuuid('2013-07-16 16:10:48+0300');

which returns the expected result:

 dateOf(id)               | id
--------------------------+--------------------------------------
 2013-07-16 14:10:48+0100 | 21f9a390-ee19-11e2-89a2-97143e6cae9e
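
The range condition works here because, within a single partition, rows are stored sorted by the clustering column (id), and timeuuid values sort by their embedded timestamp. As a small sketch of the same property, assuming the reports2 table and sample rows above, the newest entries of a report can be read directly:

```cql
-- Sketch: newest two reports in the 'report' partition.
-- ORDER BY on the clustering column requires the partition key
-- to be restricted with = (or IN), as it is here.
SELECT dateOf(id), id
FROM reports2
WHERE reportname = 'report'
ORDER BY id DESC
LIMIT 2;
```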

The downside is that all of the reports are in one row. On the other hand, you can now store many different reports (keyed by report name here). To get all reports called mynewreport in August 2013, you could query using:

select dateOf(id),id from reports2 where reportname='mynewreport' and id>=mintimeuuid('2013-08-01+0300') and id<mintimeuuid('2013-09-01+0300');
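
If a single report name accumulates too many entries for one row, a common way to limit row size is to add a time bucket to the partition key. The sketch below is an assumption, not part of the schema above: the table name reports_by_month and the month column are hypothetical, and the application is responsible for computing the bucket from the same timestamp it uses for id.

```cql
-- Hypothetical table: one partition per report name per month.
CREATE TABLE reports_by_month (
  reportname text,
  month text,        -- bucket such as '2013-08', set by the application
  id timeuuid,
  report text,
  PRIMARY KEY ((reportname, month), id)
);

-- Range query inside one bucket; both partition key columns must be given.
SELECT dateOf(id), id
FROM reports_by_month
WHERE reportname = 'mynewreport'
  AND month = '2013-08'
  AND id >= minTimeuuid('2013-08-10+0300')
  AND id < minTimeuuid('2013-08-11+0300');
```

A range that spans several months then needs one query per bucket, which is the trade-off for keeping each partition bounded.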
