mysql - Full table scan in a MySQL query when keys are available

Question

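I am trying to retrieve a large set of columns (~15-20) from several joined tables, so I put together two views that pull the information I need. On my local DB (only ~1k posts rows), joining these views worked fine. However, when I created the same views on the production DB (~30k posts rows) and tried to join them, I realized that the solution would not scale beyond the test data set.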

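I tried migrating these two views (category data such as categories.title, user data such as users.display_name, and so on) into a CTE named post_data. In theory, this would act as a keyed version of those views and let me retrieve all of the post data for the qualifying posts.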

To illustrate the table structure, I have put together a sample DBFiddle and some test data. The real data has many more columns, but this represents the joins needed to build the query.

table : posts
+-----+-----------+------------+------------------------------------------+----------------------------------------+
| id  | parent_id | created_by |                 message                  |              attachments               |
+-----+-----------+------------+------------------------------------------+----------------------------------------+
|  8  | NULL      |          8 | laptop for sale                          | [{"media_id": 1380}]                   |
|  9  | NULL      |          4 | NEW lamp shade up for grabs              | [{"media_id": 1442}, {"link_id": 103}] |
|  10 | 1         |          7 | Oooh I could be interested               |                                        |
|  11 | 1         |          7 | DMing you now! I've been looking for one |                                        |
+-----+-----------+------------+------------------------------------------+----------------------------------------+

table : users
+----+------------------+---------------------------+
| id |   display_name   |        created_at         |
+----+------------------+---------------------------+
|  1 | John Appleseed   | 2018-02-20T00:00:00+00:00 |
|  2 | Massimo Jenkins  | 2018-05-14T00:00:00+00:00 |
|  3 | Johanna Marionna | 2018-06-05T00:00:00+00:00 |
|  4 | Jackson Creek    | 2018-11-15T00:00:00+00:00 |
|  5 | Joe Schmoe       | 2019-01-09T00:00:00+00:00 |
|  6 | John Johnson     | 2019-02-14T00:00:00+00:00 |
|  7 | Donna Madison    | 2019-05-14T00:00:00+00:00 |
|  8 | Jenna Kaplan     | 2019-06-23T00:00:00+00:00 |
+----+------------------+---------------------------+

table : categories
+----+------------+------------+-------------------------------------------------------+
| id | created_by |   title    |                      description                      |
+----+------------+------------+-------------------------------------------------------+
|  1 |          2 | Technology | Anything tech; Consumer, business or education tools! |
|  2 |          2 | Home Goods | Anything for the home                                 |
+----+------------+------------+-------------------------------------------------------+

table : categories_posts
+---------+-------------+
| post_id | category_id |
+---------+-------------+
|       8 |           1 |
|       9 |           1 |
|      10 |           1 |
|      11 |           1 |
+---------+-------------+

table : users_categories
+---------+-------------+
| user_id | category_id |
+---------+-------------+
|       1 |           1 |
|       2 |           1 |
|       3 |           1 |
|       4 |           1 |
+---------+-------------+

table : posts_removed
+---------+----------------------+------------+
| post_id |      removed_at      | removed_by |
+---------+----------------------+------------+
|      10 |  2019-01-22 09:08:14 |          7 |
+---------+----------------------+------------+

In the query below, the qualifying posts are determined by the base SELECT. The post_data CTE is then joined to the result set (limited to 25 rows), and all of the CTE's columns are returned.

WITH post_data AS (
    SELECT posts.id,
           posts.parent_id,
           posts.created_by,
           posts.attachments,
           categories_posts.category_id,
           categories.title,
           categories.created_by AS category_created_by,
           creator.display_name AS creator_display_name,
           creator.created_at AS creator_created_at
           /* ... And a whole bunch of other fields from posts, categories_posts, users */
    FROM posts
    LEFT OUTER JOIN categories_posts
        ON categories_posts.post_id = posts.id
    LEFT OUTER JOIN categories
        ON categories.id = categories_posts.category_id
    LEFT OUTER JOIN users creator
        ON creator.id = posts.created_by
    /* ... And a whole bunch of other joins to facilitate the selected fields */
)
SELECT post_data.*
FROM posts
        /* Set up the criteria for the posts selected before getting their data from the CTE */
    LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
    LEFT OUTER JOIN users user_me ON user_me.id = "1"
    LEFT OUTER JOIN users_followed ON users_followed.user_id = posts.created_by
        AND users_followed.followed_by = user_me.id
    LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
    LEFT OUTER JOIN users_categories ON users_categories.category_id = categories_posts.category_id
    LEFT OUTER JOIN posts_removed pp_removed ON pp_removed.post_id = posts.parent_id
    /* Join our post_data on the post's ID */
    JOIN post_data ON post_data.id = posts.id
WHERE
(
    (
        users_categories.user_id = user_me.id AND users_categories.left_at IS NULL
    ) OR categories_posts.category_id IS NULL
) AND (
    posts.created_by = user_me.id
    OR users_followed.followed_by = user_me.id
    OR categories_posts.category_id IS NOT NULL
) AND removed.removed_at IS NULL
    AND pp_removed.removed_at IS NULL
    AND (post_data.id = posts.id OR post_data.id = posts.parent_id)
ORDER BY posts.id DESC
LIMIT 25

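In theory, I thought this would work by selecting rows according to the base selection criteria and then performing an index scan of the CTE by post ID. However, the query optimizer seems to choose a full table scan of the posts table instead.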

EXPLAIN SELECT gave me the following information:

+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
| id | select_type |         table          |  type  |         possible_keys         |     key     | key_len |                     ref                     |  rows  | filtered |                       extra                        |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+
|  1 | PRIMARY     | posts                  | ALL    | PRIMARY,parent_id,created_by  |             |         |                                             |  33870 |      100 | Using temporary; Using filesort                    |
|  1 | PRIMARY     | removed                | eq_ref | PRIMARY                       | PRIMARY     |       8 | posts.id                                    |      1 |       19 | Using where                                        |
|  1 | PRIMARY     | user_me                | const  | PRIMARY                       | PRIMARY     |       8 | const                                       |      1 |      100 | Using where; Using index                           |
|  1 | PRIMARY     | categories_posts       | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 |                                                    |
|  1 | PRIMARY     | categories             | eq_ref | PRIMARY                       | PRIMARY     |       8 | categories_posts.category_id                |      1 |      100 | Using index                                        |
|  1 | PRIMARY     | users_categories       | eq_ref | user_id_2,user_id,category_id | user_id_2   |      16 | user_me.id,api.categories_posts.category_id |      1 |      100 | Using where                                        |
|  1 | PRIMARY     | users_followed         | eq_ref | user_id,followed_by           | user_id     |      16 | posts.created_by,api.user_me.id             |      1 |      100 | Using where; Using index                           |
|  1 | PRIMARY     | pp_removed             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.parent_id                         |      1 |       19 | Using where                                        |
|  1 | PRIMARY     | <derived2>             | ALL    |                               |             |         |                                             | 493911 |       19 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | posts                  | ALL    |                               |             |         |                                             |  33870 |      100 | Using temporary                                    |
|  2 | DERIVED     | categories_posts       | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 |                                                    |
|  2 | DERIVED     | categories             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.categories_posts.category_id            |      1 |      100 |                                                    |
|  2 | DERIVED     | posts_votes            | ref    | post_id                       | post_id     |       8 | api.posts.id                                |      1 |      100 | Using index                                        |
|  2 | DERIVED     | pp                     | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.parent_id                         |      1 |      100 |                                                    |
|  2 | DERIVED     | pp_removed             | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.pp.id                                   |      1 |      100 | Using index                                        |
|  2 | DERIVED     | removed                | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.id                                |      1 |      100 | Using index                                        |
|  2 | DERIVED     | creator                | eq_ref | PRIMARY                       | PRIMARY     |       8 | api.posts.created_by                        |      1 |      100 |                                                    |
|  2 | DERIVED     | usernames              | ref    | user_id                       | user_id     |       8 | api.creator.id                              |      1 |      100 |                                                    |
|  2 | DERIVED     | verifications          | ALL    |                               |             |         |                                             |      4 |      100 | Using where; Using join buffer (Block Nested Loop) |
|  2 | DERIVED     | categories_identifiers | ref    | category_id                   | category_id |       8 | api.categories.id                           |      1 |      100 |                                                    |
+----+-------------+------------------------+--------+-------------------------------+-------------+---------+---------------------------------------------+--------+----------+----------------------------------------------------+

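Beyond this, I have tried refactoring the query to force the use of keys on the posts table, for example by using FORCE INDEX(PRIMARY) in the select. I have also tried moving the CTE into the base query and adding the filter WHERE id IN ({the original base query}), but the optimizer still appears to perform a full table scan.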
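
For illustration, the two attempts described above might look roughly like the following. This is a sketch only: the exact statements are not shown here, the filter is reduced to the posts_removed check, and the alias base_ids is hypothetical. MySQL does not accept LIMIT directly inside an IN subquery, so the limited query is wrapped in a derived table.

```sql
-- Sketch only: simplified versions of the attempts, not the exact statements.

-- Attempt 1: hint the primary key on the base posts table.
SELECT posts.id
FROM posts FORCE INDEX (PRIMARY)
    LEFT OUTER JOIN posts_removed removed ON removed.post_id = posts.id
WHERE removed.removed_at IS NULL
ORDER BY posts.id DESC
LIMIT 25;

-- Attempt 2: restrict the detail query to the IDs produced by the base query.
-- LIMIT is not allowed directly inside IN (...), so the limited query is
-- wrapped in a derived table (base_ids is a hypothetical alias).
SELECT posts.id,
       posts.parent_id,
       creator.display_name AS creator_display_name
FROM posts
    LEFT OUTER JOIN users creator ON creator.id = posts.created_by
WHERE posts.id IN (
    SELECT base_ids.id
    FROM (
        SELECT p.id
        FROM posts p
        ORDER BY p.id DESC
        LIMIT 25
    ) AS base_ids
);
```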

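In case it helps decode what is happening in the query plan:

At the time of writing, there are 33,387 posts rows.
The query plan shows a full table scan returning 33,870 rows.
The query plan also shows that the derived table (<derived2>) has 493,911 rows.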
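
For context, the rows column of EXPLAIN is an estimate taken from table statistics, not an exact count, so a small gap between it and the real row count is expected on InnoDB. A minimal sketch of how the figures could be compared, assuming MySQL 8.0, an InnoDB posts table, and that the current default schema is the one holding it:

```sql
-- Exact number of rows currently in the table.
SELECT COUNT(*) AS exact_rows
FROM posts;

-- Row count recorded in table statistics (approximate for InnoDB,
-- and possibly cached, so it may lag behind the real value).
SELECT TABLE_ROWS AS estimated_rows
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'posts';

-- Refresh the statistics the optimizer relies on, then re-run EXPLAIN
-- to see whether the estimates change.
ANALYZE TABLE posts;
```

This only explains the small mismatch on posts. It does not by itself account for the 493,911 rows estimated for the derived table.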

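My main questions are:

Am I right in saying that the subquery should only need to run once for each row returned by the base select query? If so, shouldn't the CTE, which is also joined on posts.id, be able to use the table index?
Why does the query plan select 33,870 rows when there are only 33,387? And where do the 493,911 rows come from?
How do I prevent the full table scan in this case?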
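
One direction that may be worth testing for the last question, offered as an unverified sketch and not a confirmed fix: select the 25 qualifying post IDs in a small derived table first, then join the detail tables directly to that set, so that no CTE covering every post has to be materialized. The alias page is hypothetical, the filter is reduced to the posts_removed check for brevity, and the full WHERE criteria would need to move into the inner query.

```sql
-- Sketch only: narrow down to 25 IDs first, then look up the details.
SELECT posts.id,
       posts.parent_id,
       posts.created_by,
       posts.attachments,
       categories_posts.category_id,
       categories.title,
       categories.created_by AS category_created_by,
       creator.display_name AS creator_display_name,
       creator.created_at AS creator_created_at
FROM (
    -- Inner query: only the ID column, filtered and limited.
    SELECT p.id
    FROM posts p
        LEFT OUTER JOIN posts_removed removed ON removed.post_id = p.id
    WHERE removed.removed_at IS NULL
    ORDER BY p.id DESC
    LIMIT 25
) AS page
    -- Outer joins now run against at most 25 rows, each by primary key.
    JOIN posts ON posts.id = page.id
    LEFT OUTER JOIN categories_posts ON categories_posts.post_id = posts.id
    LEFT OUTER JOIN categories ON categories.id = categories_posts.category_id
    LEFT OUTER JOIN users creator ON creator.id = posts.created_by
ORDER BY posts.id DESC;
```

Whether the optimizer actually walks the primary key in descending order and stops after 25 rows depends on the real filter conditions, so the plan would need to be checked with EXPLAIN against the production data.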
