query-optimization - Question about optimizing a slow query (SQL included)

Question

SELECT DISTINCT "myapp_profile"."user_id", "myapp_profile"."name", 
  "myapp_profile"."age", "auth_user"."id", "auth_user"."username", 
  "auth_user"."first_name", "auth_user"."last_name", "auth_user"."email", 
  "auth_user"."password", "auth_user"."is_staff", "auth_user"."is_active", 
  "auth_user"."is_superuser", "auth_user"."last_login", "auth_user"."date_joined" 
FROM "myapp_profile" 
INNER JOIN "auth_user" ON ("myapp_profile"."user_id" = "auth_user"."id") 
LEFT OUTER JOIN "myapp_siterel" ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
LEFT OUTER JOIN "django_site" ON ("myapp_siterel"."site_id" = "django_site"."id") 
WHERE ("auth_user"."is_superuser" = false 
AND "auth_user"."is_staff" = false 
AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))) 
ORDER BY "myapp_profile"."user_id" 
DESC LIMIT 100

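The query above takes about 100 seconds to run against 2 million users/profiles. I'm not a DBA. Our DBAs are looking into what can be done, but since I will probably never see what changes they make (assuming it happens at the DB level), I'm curious how this query could be optimized. It obviously needs to run much faster than it does now, on the order of 5 seconds or less. If there is no way to optimize the SQL, are there indexes I could add or change to speed up the query, or is there something else I'm overlooking?

Postgres 9 is the DB, and Django's ORM generated this query.

Query plan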

Limit  (cost=1374.35..1383.10 rows=100 width=106)
  ->  Unique  (cost=1374.35..1391.24 rows=193 width=106)
        ->  Sort  (cost=1374.35..1374.83 rows=193 width=106)
              Sort Key: myapp_profile.user_id, myapp_profile.name, myapp_profile.age, auth_user.username, auth_user.first_name, auth_user.last_name, auth_user.email, auth_user.password, auth_user.is_staff, auth_user.is_active, auth_user.is_superuser, auth_user.last_login, auth_user.date_joined
              ->  Nested Loop  (cost=453.99..1367.02 rows=193 width=106)
                    ->  Hash Left Join  (cost=453.99..1302.53 rows=193 width=49)
                          Hash Cond: (myapp_siterel.site_id = django_site.id)
                          Filter: ((django_site.id IS NULL) OR (django_site.id = ANY ('{10080,10053}'::integer[])))
                          ->  Hash Left Join  (cost=448.50..1053.27 rows=15001 width=53)
                                Hash Cond: (myapp_profile.user_id = myapp_siterel.profile_id)
                                ->  Seq Scan on myapp_profile  (cost=0.00..286.01 rows=15001 width=49)
                                ->  Hash  (cost=261.00..261.00 rows=15000 width=8)
                                      ->  Seq Scan on myapp_siterel  (cost=0.00..261.00 rows=15000 width=8)
                          ->  Hash  (cost=3.55..3.55 rows=155 width=4)
                                ->  Seq Scan on django_site  (cost=0.00..3.55 rows=155 width=4)
                    ->  Index Scan using auth_user_pkey on auth_user  (cost=0.00..0.32 rows=1 width=57)
                          Index Cond: (auth_user.id = myapp_profile.user_id)
                          Filter: ((NOT auth_user.is_superuser) AND (NOT auth_user.is_staff))

Thanks

score 2 · Accepted Answer

I'm not that familiar with Postgres, so I'm not sure how good its query optimiser is, but it looks like everything in your WHERE clause could be join conditions instead. I'd hope Postgres is clever enough to work that out for itself. If it isn't, it will fetch all 2 million users with their related records from the other 3 tables and only then filter them with your WHERE.

The indexes already mentioned should also help if they don't already exist. Again, I'm more of an MSSQL person, but doesn't Postgres have a statistics profile or query plan you can look at?

Something along these lines

SELECT DISTINCT
    "myapp_profile"."user_id",
    "myapp_profile"."name", 
    "myapp_profile"."age",
    "auth_user"."id",
    "auth_user"."username", 
    "auth_user"."first_name",
    "auth_user"."last_name",
    "auth_user"."email", 
    "auth_user"."password",
    "auth_user"."is_staff",
    "auth_user"."is_active", 
    "auth_user"."is_superuser",
    "auth_user"."last_login",
    "auth_user"."date_joined" 
FROM "myapp_profile" 
    INNER JOIN "auth_user"
        ON ("myapp_profile"."user_id" = "auth_user"."id") 
        AND "auth_user"."is_superuser" = false
        AND "auth_user"."is_staff" = false 
    LEFT OUTER JOIN "myapp_siterel"
        ON ("myapp_profile"."user_id" = "myapp_siterel"."profile_id") 
    LEFT OUTER JOIN "django_site"
        ON ("myapp_siterel"."site_id" = "django_site"."id") 
        AND ("django_site"."id" IS NULL OR "django_site"."id" IN (15, 16))
ORDER BY "myapp_profile"."user_id" DESC
LIMIT 100

Also, do you need the distinct? That'll also slow it down somewhat.
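One caveat on the rewrite above: a condition in the ON clause of a LEFT JOIN does not remove rows from the left-hand table, so moving the django_site filter there changes which profiles come back. If the DISTINCT is only there to collapse the duplicates produced by the site joins, a possible alternative is to test for sites with EXISTS. This is only a sketch. It assumes myapp_profile has one row per user_id and that every myapp_siterel.site_id points at an existing django_site row, neither of which is stated in the question.

```sql
-- Sketch only. Assumes one myapp_profile row per user_id and that
-- myapp_siterel.site_id always references an existing django_site row.
SELECT p.user_id, p.name, p.age,
       u.id, u.username, u.first_name, u.last_name, u.email,
       u.password, u.is_staff, u.is_active, u.is_superuser,
       u.last_login, u.date_joined
FROM myapp_profile p
INNER JOIN auth_user u ON u.id = p.user_id
WHERE u.is_superuser = false
  AND u.is_staff = false
  AND (
        -- the profile is not linked to any site ...
        NOT EXISTS (SELECT 1
                    FROM myapp_siterel r
                    WHERE r.profile_id = p.user_id)
        -- ... or it is linked to site 15 or 16
        OR EXISTS (SELECT 1
                   FROM myapp_siterel r
                   WHERE r.profile_id = p.user_id
                     AND r.site_id IN (15, 16))
      )
ORDER BY p.user_id DESC
LIMIT 100;
```

Because no join multiplies the profile rows here, there is nothing for DISTINCT to remove. Whether this is actually faster depends on the data and the available indexes, so it is worth comparing the EXPLAIN ANALYZE output of both forms.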

score 1 · Accepted Answer

For the basics:

Make sure all of your user ID fields are indexed.

It also looks like an index on is_superuser and is_staff would help.
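To see which indexes already exist before adding new ones, the pg_indexes system view can be queried. A minimal example, using the table names from the question:

```sql
-- List the existing indexes on the tables involved in the query.
SELECT tablename, indexname, indexdef
FROM pg_indexes
WHERE tablename IN ('myapp_profile', 'auth_user', 'myapp_siterel', 'django_site')
ORDER BY tablename, indexname;
```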

score 1 · Accepted Answer

There is never a simple silver bullet for query optimization, but the obvious first step is to index the columns you are searching on. In your case, those are:

"auth_user"."is_superuser"
"auth_user"."is_staff"
"django_site"."id"
"myapp_profile"."user_id"
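As a rough sketch, statements like the following would create such indexes in Postgres. The index names are made up for illustration. django_site.id is left out because it is normally the primary key in Django and therefore already indexed. myapp_siterel.profile_id is not in the list above but is added here as an assumption, since the query joins on it.

```sql
-- Index names are hypothetical; skip any index that already exists.
CREATE INDEX myapp_profile_user_id_idx ON myapp_profile (user_id);
CREATE INDEX myapp_siterel_profile_id_idx ON myapp_siterel (profile_id);

-- One composite index covers both boolean filters on auth_user.
CREATE INDEX auth_user_superuser_staff_idx ON auth_user (is_superuser, is_staff);
```

Note that the planner may still choose a sequential scan for the boolean filters if most users are neither staff nor superusers, because an index on columns with so few distinct values narrows the search very little.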
