I'm currently writing an application that has to execute the same query many times. The query takes a (potentially large) array as a parameter, and looks like:
SELECT
m.a, SUM(m.b) as b, SUM(m.c) as c, SUM(m.d) as d
FROM table_m m JOIN table_k k ON (k.x IN %s AND k.id = m.y)
WHERE m.b > 0
GROUP BY m.a
I'm using psycopg2 with PostgreSQL 9.1. For each query I create a new cursor and execute() the query with a list of numbers as the parameter (the query is executed around 5000 times in my test case). The length of the input list varies anywhere from 1 to 5000 items.
On average the query takes slightly under 50ms to run, with the slowest execution taking around 500ms.
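In case it matters, the calling code is roughly the sketch below (simplified: the connection string is a placeholder, error handling is omitted, and ids is the input list of numbers):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # placeholder connection string

QUERY = """
    SELECT m.a, SUM(m.b) AS b, SUM(m.c) AS c, SUM(m.d) AS d
    FROM table_m m JOIN table_k k ON (k.x IN %s AND k.id = m.y)
    WHERE m.b > 0
    GROUP BY m.a
"""

def run_query(ids):
    # psycopg2 adapts a Python tuple to a parenthesized value list,
    # so "k.x IN %s" is rendered as "k.x IN (1, 2, 3, ...)"
    cur = conn.cursor()
    cur.execute(QUERY, (tuple(ids),))
    rows = cur.fetchall()
    cur.close()
    return rows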
I have two questions about this:
- Is there anything I can do to optimize this query?
- Is there any way to prepare the query once and execute it many times, or is psycopg2 already doing this internally? (A sketch of what I have in mind follows below.)
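To make the second question concrete: what I have in mind is roughly the sketch below, where the IN list is rewritten as = ANY(array) so the statement text stays the same regardless of the list length, and PREPARE is issued once per connection (sum_query is just a made-up name). I don't know whether this is the right approach, or whether psycopg2 already does something equivalent under the hood:

PREPARE_SQL = """
    PREPARE sum_query (bigint[]) AS
    SELECT m.a, SUM(m.b) AS b, SUM(m.c) AS c, SUM(m.d) AS d
    FROM table_m m JOIN table_k k ON (k.x = ANY($1) AND k.id = m.y)
    WHERE m.b > 0
    GROUP BY m.a
"""

cur = conn.cursor()
cur.execute(PREPARE_SQL)  # prepare once for this connection

def run_prepared(ids):
    # psycopg2 adapts a Python list to a PostgreSQL array literal, so this
    # runs as EXECUTE sum_query (ARRAY[...]::bigint[])
    cur.execute("EXECUTE sum_query (%s::bigint[])", (list(ids),))
    return cur.fetchall()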
Schema for table_k
 Column |  Type  | Modifiers
--------+--------+-----------
 id     | bigint | not null
 x      | bigint |
Indexes:
    "table_k_pkey" PRIMARY KEY, btree (id)
    "table_k_id_x_idx" btree (id, x)
    "table_k_x_idx" btree (x)
Schema for table_m
 Column |       Type       | Modifiers
--------+------------------+-----------
 id     | bigint           | not null
 y      | bigint           |
 a      | bigint           |
 b      | integer          |
 c      | integer          |
 d      | double precision |
Indexes:
    "table_m_pkey" PRIMARY KEY, btree (id)
    "table_m_y_idx" hash (y)
    "table_m_a_idx" btree (a)
    "table_m_b_idx" btree (b)
Hope this is enough information.