mongodb - MongoDB 2dsphere index $geoWithin performance

Question

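I have a collection with coordinate data in GeoJSON Point form, from which I need to query the latest 10 entries within an area. It currently holds 1,000,000 entries, and that will grow to about 10 times as many.

My problem is that when there are many entries within the desired area, query performance drops sharply (case 3). The test data I have now is random, but the real data won't be, so picking a different index (as in case 4) based purely on the dimensions of the area isn't possible.

What should I do to make it perform predictably regardless of the area?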
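As a minimal sketch of the query shape used in the cases below, the rectangular area can be built from its bounds. The helper names `boxPolygon` and `geoWithinFilter` are hypothetical and not part of MongoDB; the field name `position` is taken from the index listing in this question.

```javascript
// Hypothetical helper: builds a closed GeoJSON Polygon ring for a lon/lat box.
// The vertex order mirrors the polygon used in the queries in this question.
function boxPolygon(minLon, minLat, maxLon, maxLat) {
  return {
    type: "Polygon",
    coordinates: [[
      [minLon, minLat],
      [minLon, maxLat],
      [maxLon, maxLat],
      [maxLon, minLat],
      [minLon, minLat] // a GeoJSON ring must end on its first vertex
    ]]
  };
}

// Hypothetical helper: wraps a polygon in a $geoWithin filter on one field.
function geoWithinFilter(field, polygon) {
  var filter = {};
  filter[field] = { $geoWithin: { $geometry: polygon } };
  return filter;
}

// Assumed usage in the mongo shell:
// db.randomcoordinates.find(geoWithinFilter("position", boxPolygon(1, 1, 180, 90)))
//     .sort({ timestamp: -1 }).limit(10)
```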

1. The collection statistics:

> db.randomcoordinates.stats()
{
    "ns" : "test.randomcoordinates",
    "count" : 1000000,
    "size" : 224000000,
    "avgObjSize" : 224,
    "storageSize" : 315006976,
    "numExtents" : 15,
    "nindexes" : 3,
    "lastExtentSize" : 84426752,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 120416128,
    "indexSizes" : {
        "_id_" : 32458720,
        "position_2dsphere_timestamp_-1" : 55629504,
        "timestamp_-1" : 32327904
    },
    "ok" : 1
}

2. The indexes:

> db.randomcoordinates.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.randomcoordinates",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "position" : "2dsphere",
            "timestamp" : -1
        },
        "ns" : "test.randomcoordinates",
        "name" : "position_2dsphere_timestamp_-1"
    },
    {
        "v" : 1,
        "key" : {
            "timestamp" : -1
        },
        "ns" : "test.randomcoordinates",
        "name" : "timestamp_-1"
    }
]

3. Find using the 2dsphere compound index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
    "cursor" : "S2Cursor",
    "isMultiKey" : true,
    "n" : 10,
    "nscannedObjects" : 116775,
    "nscanned" : 283424,
    "nscannedObjectsAllPlans" : 116775,
    "nscannedAllPlans" : 283424,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 4,
    "nChunkSkips" : 0,
    "millis" : 3876,
    "indexBounds" : {

    },
    "nscanned" : 283424,
    "matchTested" : NumberLong(166649),
    "geoTested" : NumberLong(166649),
    "cellsInCover" : NumberLong(14),
    "server" : "chan:27017"
}

4. Find using the timestamp index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
    "cursor" : "BtreeCursor timestamp_-1",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 63,
    "nscanned" : 63,
    "nscannedObjectsAllPlans" : 63,
    "nscannedAllPlans" : 63,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "timestamp" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ]
    },
    "server" : "chan:27017"
}

Some have suggested using a {timestamp: -1, position: "2dsphere"} index, so I tried that as well, but it doesn't seem to perform well enough.

5. Find using the timestamp + 2dsphere compound index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1_position_2dsphere").explain()
{
    "cursor" : "S2Cursor",
    "isMultiKey" : true,
    "n" : 10,
    "nscannedObjects" : 116953,
    "nscanned" : 286513,
    "nscannedObjectsAllPlans" : 116953,
    "nscannedAllPlans" : 286513,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 4,
    "nChunkSkips" : 0,
    "millis" : 4597,
    "indexBounds" : {

    },
    "nscanned" : 286513,
    "matchTested" : NumberLong(169560),
    "geoTested" : NumberLong(169560),
    "cellsInCover" : NumberLong(14),
    "server" : "chan:27017"
}

score 1 · Accepted Answer

Have you tried using the aggregation framework on your dataset?

The query you want would look something like this:

db.randomcoordinates.aggregate([
    { $match: {position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}},
    { $sort: { timestamp: -1 } },
    { $limit: 10 }
]);

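Unfortunately, the aggregation framework doesn't have explain in a production build yet, so the only way to tell whether it helps is to see if it makes a big difference in run time. If you're fine with building from source, it looks like explain may have been available there since late last month: https://jira.mongodb.org/browse/SERVER-4504. It also looks like it will be in dev build 2.5.3, which is scheduled for release next Tuesday (October 15, 2013).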
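Without explain, one rough way to compare the two approaches is plain wall-clock timing. The sketch below assumes a hypothetical helper named `timeMillis` (not a MongoDB function); a single run is noisy, so repeating the measurement a few times is advisable.

```javascript
// Hypothetical helper (not part of MongoDB): runs fn once and reports
// its return value together with the elapsed wall-clock time.
function timeMillis(fn) {
  var start = Date.now();
  var result = fn();
  return { result: result, millis: Date.now() - start };
}

// Assumed usage in the mongo shell, with `pipeline` holding the
// $match, $sort and $limit stages from this answer:
// var timed = timeMillis(function () {
//   return db.randomcoordinates.aggregate(pipeline);
// });
// print(timed.millis + " ms");
// On versions where aggregate returns a cursor, drain it inside fn
// (for example with toArray()) so the work is actually measured.
```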

score 1 · Accepted Answer

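"What should I do to make it perform predictably regardless of the area?"

$geoWithin simply does not operate at Θ(1) efficiency. As I understand it, it operates at Θ(n) average-case efficiency (given the algorithm, it has to check at most n points and at least 10).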
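To make the "at most n, at least 10" bound concrete, here is a small in-memory sketch of a newest-first scan that stops once it has enough matches. It is not how MongoDB is implemented: `inBox` is a simplified flat bounding-box test standing in for $geoWithin, and both function names are hypothetical.

```javascript
// Simplified stand-in for $geoWithin: an axis-aligned box test on [lon, lat].
function inBox(point, box) {
  return point[0] >= box.minLon && point[0] <= box.maxLon &&
         point[1] >= box.minLat && point[1] <= box.maxLat;
}

// Walks documents that are assumed to be sorted newest first and stops
// as soon as `limit` matches are found. `scanned` counts inspected docs.
function newestInBox(docsNewestFirst, box, limit) {
  var matches = [];
  var scanned = 0;
  for (var i = 0; i < docsNewestFirst.length && matches.length < limit; i++) {
    scanned++;
    if (inBox(docsNewestFirst[i].position.coordinates, box)) {
      matches.push(docsNewestFirst[i]);
    }
  }
  return { matches: matches, scanned: scanned };
}
```

When every recent document lies inside the area, the scan stops after `limit` documents; when none do, it inspects all n. This mirrors why the timestamp index in case 4 of the question touched only 63 entries for a very large area.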

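However, I would definitely do some preprocessing on the coordinates collection so that the most recently added coordinates are processed first, which gives you a better chance of getting Θ(10) efficiency (and it sounds like that, in addition to using position_2dsphere_timestamp_-1, would be the way to go)!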
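One possible reading of "process the most recent coordinates first" is to query a short recent time window and widen it only when too few results come back. This is an assumption about what the preprocessing could look like, not something the answer spells out. The function `latestWithWindows` and the callback `findSince` are hypothetical, and timestamps are assumed to be numeric milliseconds. Whether this helps depends on the data and the query planner, so it should be measured.

```javascript
// Hypothetical sketch: findSince(since, limit) is assumed to return up to
// `limit` matching documents with timestamp >= since, newest first.
// Passing since === null means "no lower bound on timestamp".
function latestWithWindows(findSince, now, windowsMillis, limit) {
  for (var i = 0; i < windowsMillis.length; i++) {
    var docs = findSince(now - windowsMillis[i], limit);
    if (docs.length >= limit) {
      return docs; // enough recent matches, no need to look further back
    }
  }
  return findSince(null, limit); // fall back to the unbounded query
}

// Assumed mongo shell implementation of findSince, with `polygon` being
// the GeoJSON area of interest:
// function findSince(since, limit) {
//   var filter = { position: { $geoWithin: { $geometry: polygon } } };
//   if (since !== null) filter.timestamp = { $gte: since };
//   return db.randomcoordinates.find(filter)
//       .sort({ timestamp: -1 }).limit(limit).toArray();
// }
```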

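"Some have suggested using a {timestamp: -1, position: "2dsphere"} index, so I tried that as well, but it doesn't seem to perform well enough."

(See my response to the first question.)

Additionally, the following may prove useful:

Optimization Strategies for MongoDB

Hope this helps!

TL;DR: You can play around with indexes all you want, but you won't get any more efficiency out of $geoWithin unless you rewrite it.

That said, you can always focus on optimizing index performance and rewrite the function if you would like!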
