mysql - Speeding up self-joins in MySQL

Question

I have a table of connections between different objects, and I am basically attempting a graph traversal using self-joins. My table is defined as follows:

CREATE TABLE `connections` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `position` int(11) NOT NULL,
  `dId` bigint(20) NOT NULL,
  `sourceId` bigint(20) NOT NULL,
  `targetId` bigint(20) NOT NULL,
  `type` bigint(20) NOT NULL,
  `weight` float NOT NULL DEFAULT '1',
  `refId` bigint(20) NOT NULL,
  `ts` bigint(20) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `sourcetype` (`type`,`sourceId`,`targetId`),
  KEY `targettype` (`type`,`targetId`,`sourceId`),
  KEY `complete` (`dId`,`sourceId`,`targetId`,`type`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

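The table contains about 3M entries (about 1K of type 1, 1M of type 2, and 2M of type 3).

Queries over 2 or 3 hops are actually quite fast (although receiving all the results takes a while, of course), but getting the count for a 3-hop query is very slow (> 30 seconds).

Here is the query (it returns 2M):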

SELECT
  count(*)
FROM
  `connections` AS `t0`
JOIN
  `connections` AS `t1` ON `t1`.`targetid`=`t0`.`sourceid`
JOIN
  `connections` AS `t2` ON `t2`.`targetid`=`t1`.`sourceid`
WHERE
  `t2`.dId = 1
  AND
  `t2`.`sourceid` = 1
  AND
  `t2`.`type` = 1
  AND
  `t1`.`type` = 2
  AND
  `t0`.`type` = 3;

Here is the corresponding EXPLAIN:

id  select_type  table  type  possible_keys                   key         key_len  ref                         rows  Extra  
1   SIMPLE       t2     ref   targettype,complete,sourcetype  complete    16       const,const                  100  Using where; Using index
1   SIMPLE       t1     ref   targettype,sourcetype           targettype   8       const                       2964  Using where; Using index
1   SIMPLE       t0     ref   targettype,sourcetype           sourcetype  16       const,travtest.t1.targetId  2964  Using index

Edit: this is the EXPLAIN after adding an index on the type column:

id  select_type  table  type  possible_keys                        key         key_len  ref                         rows  Extra     
1   SIMPLE       t2     ref   type,complete,sourcetype,targettype  complete    16       const,const                 100   Using where; Using index
1   SIMPLE       t1     ref   type,sourcetype,targettype           sourcetype  16       const,travtest.t2.targetId    2   Using index
1   SIMPLE       t0     ref   type,sourcetype,targettype           sourcetype  16       const,travtest.t1.targetId    2   Using index

Is there any way to improve this?

2nd edit:

EXPLAIN EXTENDED:
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+
| id | select_type | table | type | possible_keys                       | key        | key_len | ref                        | rows | filtered | Extra                    |
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+
|  1 | SIMPLE      | t2    | ref  | type,complete,sourcetype,targettype | complete   | 16      | const,const                |  100 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | t1    | ref  | type,sourcetype,targettype          | sourcetype | 16      | const,travtest.t2.targetId |    1 |   100.00 | Using index              |
|  1 | SIMPLE      | t0    | ref  | type,sourcetype,targettype          | sourcetype | 16      | const,travtest.t1.targetId |    1 |   100.00 | Using index              |
+----+-------------+-------+------+-------------------------------------+------------+---------+----------------------------+------+----------+--------------------------+

SHOW WARNINGS;
+-------+------+--------------------------------------------------------------------------------------------+
| Level | Code | Message                                                                                    |
+-------+------+--------------------------------------------------------------------------------------------+
| Note  | 1003 | /* select#1 */ select count(0) AS `count(*)` from `travtest`.`connections` `t0`            |
|       |      | join `travtest`.`connections` `t1` join `travtest`.`connections` `t2`                      |
|       |      | where ((`travtest`.`t0`.`sourceId` = `travtest`.`t1`.`targetId`) and                       |
|       |      | (`travtest`.`t1`.`sourceId` = `travtest`.`t2`.`targetId`) and (`travtest`.`t0`.`type` = 3) |
|       |      | and (`travtest`.`t1`.`type` = 2) and (`travtest`.`t2`.`type` = 1) and                      |
|       |      | (`travtest`.`t2`.`sourceId` = 1) and (`travtest`.`t2`.`dId` = 1))                          |
+-------+------+--------------------------------------------------------------------------------------------+

score 0 · Accepted Answer

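A comment on your query returning 2M (I assume you mean 2 million): is that the final count, or just the number of rows it passes through? Your query appears to look specifically for a single t2 dId, source and type joined to the other tables, yet it starts from connection t0.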
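One way to answer that question is to count each hop separately and see where the rows multiply. The queries below are only a sketch that reuses the table, columns and constants from the question; the join direction (each row's targetId feeds the next row's sourceId) is taken from the question's query.

```sql
-- Hop 1: rows matching the starting criteria
-- (the EXPLAIN in the question estimates about 100).
SELECT COUNT(*)
FROM `connections` t2
WHERE t2.dId = 1
  AND t2.sourceId = 1
  AND t2.`type` = 1;

-- Hops 1-2: rows after following type-2 connections from those targets.
SELECT COUNT(*)
FROM `connections` t2
JOIN `connections` t1
  ON t1.sourceId = t2.targetId
 AND t1.`type` = 2
WHERE t2.dId = 1
  AND t2.sourceId = 1
  AND t2.`type` = 1;
```

If the second count is already large, most of the 30 seconds is probably spent enumerating the final hop, and the 2M is the real result size rather than intermediate work.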

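I would drop your existing indexes and create the one below, so the engine does not try to use the others and get confused about how to join. Keeping the target ID in the index (as you already did) also makes it a covering index, so the engine does not have to go to the actual row data on the page to check other criteria or fetch values.

The only index is based on your ultimate criteria, as in T2: source, type and ID. Because targetId (part of the index) is the source of the next link in the daisy chain, the same index serves the source and type lookups going up the chain. No confusion over which index to use.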

INDEX ON (sourceId, type, targetId)
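As a rough sketch, that index change could be written as the statement below. It assumes the middle column is the table's type column and that the existing index names are the ones shown in the CREATE TABLE; the name source_type_target is made up for illustration.

```sql
-- Sketch only: try this on a copy of the table first.
-- `source_type_target` is a hypothetical index name.
-- If the single-column `type` index from the question's edit exists,
-- add another clause: DROP INDEX `type`
ALTER TABLE `connections`
  DROP INDEX `sourcetype`,
  DROP INDEX `targettype`,
  DROP INDEX `complete`,
  ADD INDEX `source_type_target` (`sourceId`, `type`, `targetId`);
```

Note that dId is not part of this index, so the t2.dId = 1 check on the first table needs a lookup of the row itself. With only about 100 starting rows that should be cheap, but it is worth confirming with EXPLAIN.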

I would try reversing the join order, so that it starts from what is hopefully the smallest set and works outward...

SELECT
      COUNT(*)
   FROM
      `connections` t2
         JOIN `connections` t1
            ON t2.targetID = t1.sourceid
            AND t1.`type` = 2
            JOIN `connections` t0
               ON t1.targetid = t0.sourceid
               AND t0.`type` = 3
   where
          t2.sourceid = 1
      AND t2.type = 1
      AND t2.dID = 1
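MySQL's optimizer is free to reorder inner joins, so listing t2 first does not by itself guarantee that it is read first. If you want to pin the order while testing, a possible sketch is to add STRAIGHT_JOIN, which makes MySQL join the tables in the order written, and to prefix the query with EXPLAIN to see which index each step picks.

```sql
-- STRAIGHT_JOIN forces the join order t2 -> t1 -> t0 as written.
EXPLAIN
SELECT STRAIGHT_JOIN
      COUNT(*)
   FROM
      `connections` t2
         JOIN `connections` t1
            ON t2.targetId = t1.sourceId
            AND t1.`type` = 2
         JOIN `connections` t0
            ON t1.targetId = t0.sourceId
            AND t0.`type` = 3
   WHERE
          t2.sourceId = 1
      AND t2.`type` = 1
      AND t2.dId = 1;
```

The second EXPLAIN in the question already shows the order t2, t1, t0, so this may change nothing; it is mainly useful for ruling out join order as the cause.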
