python - How to use distinct in a MongoDB pipeline using Python

Question

I have data like this:

{ "_id": "1234gbrghr",
  "Device" : "samsung",
  "UserId" : "12654",
  "Month" : "july"
},

{ "_id": "1278gbrghr",
  "Device" : "nokia",
  "UserId" : "87654",
  "Month" : "july"
},

{ "_id": "1234gbrghr",
  "Device" : "samsung",
  "UserId" : "12654",
  "Month" : "july"
}

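I need to get the number of distinct users per device for July. For example, if a user (UserId) used a samsung device two or more times in July, that user should be counted only once for samsung.

For this, I used the following query, which gives the total number of records per device in July. But I need the distinct user count.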

pipeline1 = [
    # Keep only July records, then count records per device.
    {'$match': {'Month': 'july'}},
    {'$group': {'_id': '$Device', 'count': {'$sum': 1}}}
]
data = db.command('aggregate', 'collection', pipeline=pipeline1)

score 1 · Accepted Answer

Instead, you need to group by device and user first. You can do that with the following pipeline operator:

{'$group':{'_id' : { d: '$Device', u: '$UserId' } } }

Next, you need to count the number of users per device (as you already did, but slightly modified):

{ '$group': { '_id' : '$_id.d', 'count': { '$sum' : 1 } } }
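To see why two grouping stages yield a distinct count, here is a minimal plain-Python sketch that mimics the stages on the three records from the question. It does not talk to MongoDB; the function name `distinct_users_per_device` is made up for illustration.

```python
from collections import Counter

# Sample records from the question (only the fields the pipeline uses).
docs = [
    {"Device": "samsung", "UserId": "12654", "Month": "july"},
    {"Device": "nokia", "UserId": "87654", "Month": "july"},
    {"Device": "samsung", "UserId": "12654", "Month": "july"},
]


def distinct_users_per_device(records, month):
    """Emulate $match followed by the two $group stages (hypothetical helper)."""
    # $match: keep only the requested month.
    matched = [r for r in records if r["Month"] == month]
    # First $group: collapse duplicates into unique (Device, UserId) pairs.
    pairs = {(r["Device"], r["UserId"]) for r in matched}
    # Second $group: count the unique pairs per device.
    return dict(Counter(device for device, _user in pairs))


counts = distinct_users_per_device(docs, "july")
print(counts)
```

The repeated samsung record for user 12654 collapses into a single pair in the first step, so samsung and nokia each end up with a count of 1.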

With the following dataset:

{ "_id" : "1234gbrghr", "Device" : "samsung", "UserId" : "12654", "Month" : "july" }
{ "_id" : "1278gbrghr", "Device" : "nokia", "UserId" : "87654", "Month" : "july" }
{ "_id" : "1239gbrghr", "Device" : "samsung", "UserId" : "12654", "Month" : "july" }
{ "_id" : "1238gbrghr", "Device" : "samsung", "UserId" : "12653", "Month" : "july" }

And the following aggregate command:

db.so.aggregate( [
    { '$match' : {'Month' : 'july' } },
    { '$group' : {
        '_id' : { d: '$Device', u: '$UserId' },
        'count' : { '$sum' : 1 }
    } }, 
    { '$group': {
        '_id' : '$_id.d',
        'count': { '$sum' : 1 }
    } }
] );

This outputs:

{
    "result" : [
        {
            "_id" : "nokia",
            "count" : 1
        },
        {
            "_id" : "samsung",
            "count" : 2
        }
    ],
    "ok" : 1
}
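Since the question asks about Python, the shell command above could be translated roughly as follows. This is a sketch, not tested against a live server: it assumes PyMongo 3.0 or later, where `Collection.aggregate` returns a cursor of result documents rather than the `result` wrapper shown above. The helper names `build_pipeline` and `distinct_user_counts` and the database name `test` are hypothetical. Note that in Python the keys `d` and `u` must be quoted strings.

```python
def build_pipeline(month):
    """Return the two-stage distinct-count pipeline as plain Python dicts."""
    return [
        {"$match": {"Month": month}},
        # One document per unique (device, user) pair.
        {"$group": {"_id": {"d": "$Device", "u": "$UserId"}}},
        # Count the unique pairs per device.
        {"$group": {"_id": "$_id.d", "count": {"$sum": 1}}},
    ]


def distinct_user_counts(collection, month):
    """Run the pipeline on a PyMongo collection and return {device: count}."""
    rows = collection.aggregate(build_pipeline(month))
    return {row["_id"]: row["count"] for row in rows}


pipeline = build_pipeline("july")

# Usage against a running server (connection details are assumptions):
# from pymongo import MongoClient
# db = MongoClient()["test"]
# print(distinct_user_counts(db["so"], "july"))
```

Building the pipeline in a separate function keeps the month a parameter, so the same stages can be reused for other months.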
