mongodb - Performing a case-statement in the MongoDB aggregation framework

Question

We currently run on SQL Server and are evaluating how well the MongoDB aggregation framework fits our needs. I'm having trouble with one particular query:

Suppose we have the following pseudo-records (modeled as columns in a SQL table, and as complete documents in a MongoDB collection):

{
   name: 'A',
   timespent: 100,
},
{
   name: 'B',
   timespent: 200,
},
{
   name: 'C',
   timespent: 300,
},
{
   name: 'D',
   timespent: 400,
},
{
   name: 'E',
   timespent: 500,
}

I want to group the timespent field into ranges and count the occurrences, to get for example the following pseudo-records:

results{
   0-250: 2,
   250-450: 2,
   450-650: 1
}

Note that these range boundaries (250, 450, and 650) are dynamic and may be changed by the user over time. In SQL, we extracted the results like this:

select range, COUNT(*) as total from (
  select case when Timespent <= 250 then '0-250'
              when Timespent <= 450 then '250-450'
              else '450-650' end as range
  from TestTable) as r
group by r.range

Again, note that this SQL is built dynamically by our app to fit whichever ranges are available at the time.

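I'm struggling to find the right constructs in the MongoDB aggregation framework to perform such a query. I can get the result for a single range by inserting a $match into the pipeline, but I can't work out how to extract all the ranges and their counts in a single pipeline query.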
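For illustration, the single-range query described above might look like the following sketch. The helper name `singleRangePipeline` and the choice of an exclusive lower bound and inclusive upper bound are assumptions. It would have to be run once per range, which is the limitation the question is about.

```javascript
// Hypothetical helper: counts documents whose timespent falls in (lower, upper].
function singleRangePipeline(lower, upper) {
  return [
    { $match: { timespent: { $gt: lower, $lte: upper } } },
    // A constant _id puts every matched document into one group.
    { $group: { _id: lower + "-" + upper, count: { $sum: 1 } } }
  ];
}

var onePipeline = singleRangePipeline(250, 450);

// In the mongo shell this would be run as: db.collection.aggregate(onePipeline)
console.log(JSON.stringify(onePipeline));
```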

score 33 · Accepted Answer

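The counterpart of the SQL "case" statement in the aggregation framework is the $cond operator (see the manual). $cond statements can be nested to simulate "when-then" and "else", but I have chosen another approach because it is easier to read (and to generate, see below): I use the $concat operator to write the range string, which then serves as the grouping key.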
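For comparison, the nested `$cond` form mentioned above could be generated as in the sketch below. This is not part of the original answer: the helper name `buildNestedCond`, the upper-inclusive comparison (`$lte`, mirroring the SQL `case`), and the fallback label `"other"` are assumptions made for illustration.

```javascript
// Hypothetical helper: builds nested $cond expressions from ascending boundaries.
// Like the SQL "case", the first matching "when" wins, so each level only
// needs to test the upper bound.
function buildNestedCond(field, ranges, fallback) {
  var expr = fallback; // innermost "else"
  // Walk from the last range to the first so the first range ends up outermost.
  for (var i = ranges.length - 1; i >= 1; i--) {
    expr = {
      $cond: [
        { $lte: [field, ranges[i]] },
        ranges[i - 1] + "-" + ranges[i],
        expr
      ]
    };
  }
  return expr;
}

var nestedProj = buildNestedCond("$timespent", [0, 250, 450, 650], "other");

var nestedPipeline = [
  { $project: { _id: 0, range: nestedProj } },
  { $group: { _id: "$range", count: { $sum: 1 } } },
  { $sort: { _id: 1 } }
];

// In the mongo shell this would be run as: db.xx.aggregate(nestedPipeline)
console.log(JSON.stringify(nestedPipeline, null, 2));
```

The nesting depth grows with the number of ranges, which is one reason the flat `$concat` form is easier to read.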

So, for the given collection:

db.xx.find()
{ "_id" : ObjectId("514919fb23700b41723f94dc"), "name" : "A", "timespent" : 100 }
{ "_id" : ObjectId("514919fb23700b41723f94dd"), "name" : "B", "timespent" : 200 }
{ "_id" : ObjectId("514919fb23700b41723f94de"), "name" : "C", "timespent" : 300 }
{ "_id" : ObjectId("514919fb23700b41723f94df"), "name" : "D", "timespent" : 400 }
{ "_id" : ObjectId("514919fb23700b41723f94e0"), "name" : "E", "timespent" : 500 }

the aggregation (hard-coded) looks like this:

db.xx.aggregate([
  { $project: {
    "_id": 0,
    "range": {
      $concat: [{
        $cond: [ { $lte: ["$timespent", 250] }, "range 0-250", "" ]
      }, {
        $cond: [ { $and: [
          { $gt:  ["$timespent", 250] },
          { $lte: ["$timespent", 450] }
        ] }, "range 251-450", "" ]
      }, {
        $cond: [ { $and: [
          { $gt:  ["$timespent", 450] },
          { $lte: ["$timespent", 650] }
        ] }, "range 450-650", "" ]
      }]
    }
  }},
  { $group: { _id: "$range", count: { $sum: 1 } } },
  { $sort: { "_id": 1 } },
]);

The result is:

{
    "result" : [
        {
            "_id" : "range 0-250",
            "count" : 2
        },
        {
            "_id" : "range 251-450",
            "count" : 2
        },
        {
            "_id" : "range 450-650",
            "count" : 1
        }
    ],
    "ok" : 1
}
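The `$concat` trick works because at most one `$cond` yields a non-empty label for a given value, so the concatenation is just that label. The plain-JavaScript emulation below is not MongoDB code. The names `branches` and `rangeLabel` are hypothetical, and the upper-inclusive bounds are an assumption. The labels are copied from the output above. It applies the same logic to the sample values and shows that a value outside every range ends up with an empty-string group key.

```javascript
// Plain-JavaScript emulation of the $concat/$cond projection (not MongoDB code).
var branches = [
  { label: "range 0-250",   test: function (t) { return t <= 250; } },
  { label: "range 251-450", test: function (t) { return t > 250 && t <= 450; } },
  { label: "range 450-650", test: function (t) { return t > 450 && t <= 650; } }
];

function rangeLabel(timespent) {
  // Each branch contributes its label or "", like one $cond inside $concat.
  return branches.map(function (b) {
    return b.test(timespent) ? b.label : "";
  }).join("");
}

// Emulates the $group stage: count documents per label.
var counts = {};
[100, 200, 300, 400, 500].forEach(function (t) {
  var key = rangeLabel(t);
  counts[key] = (counts[key] || 0) + 1;
});

console.log(counts);
console.log(JSON.stringify(rangeLabel(700))); // out of range: empty-string key
```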

To generate the aggregation command, you have to build the "range" projection as a JSON object (or you could generate a string and then use JSON.parse(string)).

The generator looks like this:

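// Range boundaries in ascending order; supplied dynamically by the application.
var ranges = [ 0, 250, 450, 650 ];
var rangeProj = {
  "$concat": []
};

// One $cond per range: lower bound inclusive, upper bound exclusive.
for (var i = 1; i < ranges.length; i++) {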
  rangeProj.$concat.push({
    $cond: {
      if: {
        $and: [{
          $gte: [ "$timespent", ranges[i-1] ]
        }, {
          $lt: [ "$timespent", ranges[i] ]
        }]
      },
      then: "range " + ranges[i-1] + "-" + ranges[i],
      else: ""
    }
  })
}

db.xx.aggregate([{
  $project: { "_id": 0, "range": rangeProj }
}, {
  $group: { _id: "$range", count: { $sum: 1 } }
}, {
  $sort: { "_id": 1 }
}]);

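This returns the same counts as above; the labels are generated from the boundaries (for example "range 250-450").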
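The `JSON.parse(string)` alternative mentioned above could look like the following sketch. The function name `buildRangeProjJson` is hypothetical, and the sketch uses the array form of `$cond` to keep the text short. Building the object directly, as in the generator above, is usually less error-prone than assembling strings.

```javascript
// Hypothetical helper: assembles the "range" projection as JSON text.
function buildRangeProjJson(ranges) {
  var parts = [];
  for (var i = 1; i < ranges.length; i++) {
    var label = "range " + ranges[i - 1] + "-" + ranges[i];
    parts.push(
      '{"$cond":[{"$and":[' +
        '{"$gte":["$timespent",' + ranges[i - 1] + ']},' +
        '{"$lt":["$timespent",' + ranges[i] + ']}' +
      ']},' + JSON.stringify(label) + ',""]}'
    );
  }
  return '{"$concat":[' + parts.join(",") + ']}';
}

var rangeProjText = buildRangeProjJson([0, 250, 450, 650]);
var rangeProjFromJson = JSON.parse(rangeProjText); // throws if the text is not valid JSON

console.log(JSON.stringify(rangeProjFromJson, null, 2));
```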

score 9 · Accepted Answer

Starting in MongoDB 3.4, you can use the $switch operator to perform a multi-switch statement in the $project stage.

The $group pipeline operator groups the documents by "range" and returns the "count" for each group using the $sum accumulator operator.

db.collection.aggregate(
    [  
        { "$project": { 
            "range": { 
                "$switch": { 
                    "branches": [ 
                        { 
                            "case": { "$lte": [ "$timespent", 250 ] }, 
                            "then": "0-250" 
                        }, 
                        { 
                            "case": { 
                                "$and": [ 
                                    { "$gt": [ "$timespent", 250 ] }, 
                                    { "$lte": [ "$timespent", 450 ] } 
                                ] 
                            }, 
                            "then": "251-450" 
                        }, 
                        { 
                            "case": { 
                                "$and": [ 
                                    { "$gt": [ "$timespent", 450 ] }, 
                                    { "$lte": [ "$timespent", 650 ] } 
                                ] 
                            }, 
                            "then": "451-650" 
                        } 
                    ], 
                    "default": "650+" 
                } 
            } 
        }}, 
        { "$group": { 
            "_id": "$range", 
            "count": { "$sum": 1 } 
        }}
    ]
)

With the following documents in our collection,

{ "_id" : ObjectId("514919fb23700b41723f94dc"), "name" : "A", "timespent" : 100 },
{ "_id" : ObjectId("514919fb23700b41723f94dd"), "name" : "B", "timespent" : 200 },
{ "_id" : ObjectId("514919fb23700b41723f94de"), "name" : "C", "timespent" : 300 },
{ "_id" : ObjectId("514919fb23700b41723f94df"), "name" : "D", "timespent" : 400 },
{ "_id" : ObjectId("514919fb23700b41723f94e0"), "name" : "E", "timespent" : 500 }

our query yields:

{ "_id" : "451-650", "count" : 1 }
{ "_id" : "251-450", "count" : 2 }
{ "_id" : "0-250", "count" : 2 }
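Because the ranges in the question are dynamic, the `branches` array can be generated rather than hard-coded. Below is a minimal sketch, assuming ascending boundaries and integer values. The helper name `buildSwitchProjection` and its label format are assumptions chosen to reproduce the hard-coded pipeline above.

```javascript
// Hypothetical helper: builds a $switch expression from ascending boundaries,
// e.g. [0, 250, 450, 650]. Labels assume integer values (lower bound shown as +1).
function buildSwitchProjection(field, ranges) {
  var branches = [];
  for (var i = 1; i < ranges.length; i++) {
    var lower = ranges[i - 1];
    var upper = ranges[i];
    var first = (i === 1);
    branches.push({
      // The first range has no lower-bound test, as in the hard-coded version.
      case: first
        ? { $lte: [field, upper] }
        : { $and: [{ $gt: [field, lower] }, { $lte: [field, upper] }] },
      then: (first ? lower : lower + 1) + "-" + upper
    });
  }
  return {
    $switch: {
      branches: branches,
      // Anything above the last boundary falls through to the default label.
      default: ranges[ranges.length - 1] + "+"
    }
  };
}

var switchPipeline = [
  { $project: { range: buildSwitchProjection("$timespent", [0, 250, 450, 650]) } },
  { $group: { _id: "$range", count: { $sum: 1 } } }
];

// In the mongo shell (MongoDB 3.4+): db.collection.aggregate(switchPipeline)
console.log(JSON.stringify(switchPipeline, null, 2));
```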

You may want to add a $sort stage to the pipeline to sort the documents by range, but because "range" is a string, this will only sort the documents in lexicographic order.
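Lexicographic order happens to match numeric order for the labels above, but it breaks once boundaries have different digit counts. One possible workaround, sketched below under the assumption that sorting in application code is acceptable, is to sort the result documents by the number at the start of each label. The sample results with a 1000 boundary are hypothetical.

```javascript
// Hypothetical aggregation results using boundaries 0, 250, 450, 1000.
var results = [
  { _id: "451-1000", count: 1 },
  { _id: "0-250", count: 2 },
  { _id: "1000+", count: 3 },
  { _id: "251-450", count: 2 }
];

// Plain string sort, which for these ASCII labels gives the same order
// as { $sort: { _id: 1 } } would.
var lexicographic = results.map(function (r) { return r._id; }).sort();

// Numeric sort on the leading number of each label; parseInt stops at "-" or "+".
var numeric = results.slice().sort(function (a, b) {
  return parseInt(a._id, 10) - parseInt(b._id, 10);
}).map(function (r) { return r._id; });

console.log(lexicographic); // "1000+" lands before "251-450"
console.log(numeric);       // ranges in ascending numeric order
```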
