I have a query involving two tables: table A
has lots of rows, and contains a field called b_id
, which references a record from table B
, which has about 30 different rows. Table A
has an index on b_id
, and table B
has an index on the column name
.
My query looks something like this:
SELECT COUNT(A.id) FROM A INNER JOIN B ON B.id = A.b_id WHERE (B.name != 'dummy') AND <condition>;
With condition
being some random condition on table A
(I have lots of those, all exhibiting the same behavior).
This query is extremely slow (taking north of 2 seconds), and using explain, shows that query optimizer starts with table B
, coming up with about 29 rows, and then scans table A
. Doing a STRAIGHT_JOIN
, turned the order around and the query ran instantaneously.
I'm not a fan of black magic, so I decided to try something else: come up with the id for the record in B
that has the name dummy
, let's say 23, and then simplify the query to:
SELECT COUNT(A.id) FROM A WHERE (b_id != 23) AND <condition>;
To my surprise, this query was actually slower than the straight join, taking north of a second.
Any ideas on why the join would be faster than the simplified query?
UPDATE: following a request in the comments, the outputs from explain:
Straight join:
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
| 1 | SIMPLE | A | ALL | b_id | NULL | NULL | NULL | 200707 | Using where |
| 1 | SIMPLE | B | eq_ref | PRIMARY,id_name | PRIMARY | 4 | schema.A.b_id | 1 | Using where |
+----+-------------+-------+--------+-----------------+---------+---------+---------------+--------+-------------+
No join:
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
| 1 | SIMPLE | A | ALL | b_id | NULL | NULL | NULL | 200707 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+--------+-------------+
UPDATE 2: Tried another variant:
SELECT COUNT(A.id) FROM A WHERE b_id IN (<all the ids except for 23>) AND <condition>;
This runs faster than the no join, but still slower than the join, so it seems that the inequality operation is responsible for part of the performance hit, but not all.