10 things in MySQL (that won’t work as expected)
with 44 comments
(I just discovered cracked.com)
#10. Searching for a NULL
1
2
3
| SELECT * FROM a WHERE a. column = NULL |
In SQL, a
NULL
is never equal to anything, even another NULL
. This query won't return anything and in fact will be thrown out by the optimizer when building the plan.
When searching for
NULL
values, use this instead:
1
2
3
| SELECT * FROM a WHERE a. column IS NULL |
#9. LEFT JOIN with additional conditions
1
2
3
4
5
6
| SELECT * FROM a LEFT JOIN b ON b.a = a.id WHERE b. column = 'something' |
A
LEFT JOIN
is like INNER JOIN
except that it will return each record from a
at least once, substituting missing fields from b
with NULL
values, if there are no actual matching records.
The
WHERE
condition, however, is evaluated after the LEFT JOIN
so the query above checks column
after it had been joined. And as we learned earlier, no NULL
value can satisfy an equality condition, so the records from a
without corresponding record from b
will unavoidably be filtered out.
Essentially, this query is an
INNER JOIN
, only less efficient.
To match only the records with
b.column = 'something'
(while still returning all records from a
), this condition should be moved into ON
clause:
1
2
3
4
5
6
| SELECT * FROM a LEFT JOIN b ON b.a = a.id AND b. column = 'something' |
#8. Less than a value but not a NULL
Quite often I see the queries like this:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
| SELECT * FROM b WHERE b. column < 'something' AND b. column IS NOT NULL [/sourcecode] This is actually not an error: this query is valid and will do what 's intended. However, <code>IS NOT NULL</code> here is redundant. If <code>b.column</code> is a <code>NULL</code>, then <code>b.column < ' something '</code> will never be satisfied, since any comparison to <code>NULL</code> evaluates to a boolean <code>NULL</code> and does not pass the filter. It is interesting that this additional <code>NULL</code> check is never used for <q>greater than</q> queries (like in <code>b.column > ' something '</code>). This is because <code>NULL</code> go first in <code>ORDER BY</code> in <strong>MySQL</strong> and hence are incorrectly considered <q>less</q> than any other value by some people. This query can be simplified: SELECT * FROM b WHERE b.column < ' something' [/sourcecode] and will still never return a <code> NULL </code> in <code>b. column </code>. <h3 class= "cracked" >#7. Joining on NULL </h3> <img src= "https://explainextended.com/wp-content/uploads/2010/11/MG_3163-e1288839302867.jpg" alt= "" title= "Helicopter" width= "700" height= "467" class= "aligncenter size-full wp-image-5105 noborder" /> SELECT * FROM a JOIN b ON a. column = b. column |
When
column
is nullable in both tables, this query won't return a match of two NULL
s for the reasons described above: no NULL
s are equal.
Here's a query to do that:
1
2
3
4
5
| SELECT * FROM a JOIN b ON a. column = b. column OR (a. column IS NULL AND b. column IS NULL ) |
MySQL's optimizer treats this as an equijoin and provides a special join condition,
ref_or_null
.#6. NOT IN with NULL values
1
2
3
4
5
6
7
| SELECT a.* FROM a WHERE a. column NOT IN ( SELECT column FROM b ) |
This query will never return anything if there is but a single
NULL
in b.column
. As with other predicates, both IN
and NOT IN
against NULL
evaluate to NULL
.
This should be rewritten using a
NOT EXISTS
:
1
2
3
4
5
6
7
8
| SELECT a.* FROM a WHERE NOT EXISTS ( SELECT NULL FROM b WHERE b. column = a. column ) |
Unlike
IN
, EXISTS
always evaluates to either true
or false
.#5. Ordering random samples
1
2
3
4
5
| SELECT * FROM a ORDER BY RAND(), column LIMIT 10 |
This query attempts to select 10 random records ordered by
column
.ORDER BY
orders the output lexicographically: that is, the records are only ordered on the second expression when the values of the first expression are equal.
However, the results of
RAND()
are, well, random. It's infeasible that the values of RAND()
will match, so ordering on column
after RAND()
is quite useless.
To order the randomly sampled records, use this query:
1
2
3
4
5
6
7
8
9
10
| SELECT * FROM ( SELECT * FROM mytable ORDER BY RAND() LIMIT 10 ) q ORDER BY column |
#4. Sampling arbitrary record from a group
This query intends to select one
column
from each group (defined by grouper
)
1
2
| SELECT DISTINCT (grouper), a.* FROM a |
DISTINCT
is not a function, it's a part of SELECT
clause. It applies to all columns in the SELECT
list, and the parentheses here may just be omitted. This query may and will select the duplicates on grouper
(if the values in at least one of the other columns differ).
Sometimes, it's worked around using this query (which relies on MySQL's extensions to
GROUP BY
):
1
2
3
4
| SELECT a.* FROM a GROUP BY grouper |
Unaggregated columns returned within each group are arbitrarily taken.
At first, this appears to be a nice solution, but it has quite a serious drawback. It relies on the assumption that all values returned, though taken arbitrarily from the group, will still belong to one record.
Though with current implementation is seems to be so, it's not documented and can be changed in any moment (especially if MySQL will ever learn to apply
index_union
after GROUP BY
). So it's not safe to rely on this behavior.
This query would be easy to rewrite in a cleaner way if MySQL supported analytic functions. However, it's still possible to make do without them, if the table has a
PRIMARY KEY
defined:
1
2
3
4
5
6
7
8
9
10
11
12
13
| SELECT a.* FROM ( SELECT DISTINCT grouper FROM a ) ao JOIN a ON a.id = ( SELECT id FROM a ai WHERE ai.grouper = ao.grouper LIMIT 1 ) |
#3. Sampling first record from a group
This is a variation of the previous query:
1
2
3
4
5
6
| SELECT a.* FROM a GROUP BY grouper ORDER BY MIN (id) DESC |
Unlike the previous query, this one attempts to select the record holding the minimal
id
.
Again: it is not guaranteed that the unaggregated values returned by
a.*
will belong to a record holding MIN(id)
(or even to a single record at all).
Here's how to do it in a clean way:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
| SELECT a.* FROM ( SELECT DISTINCT grouper FROM a ) ao JOIN a ON a.id = ( SELECT id FROM a ai WHERE ai.grouper = ao.grouper ORDER BY ai.grouper, ai.id LIMIT 1 ) |
This query is just like the previous one but with
ORDER BY
added to ensure that the first record in id
order will be returned.#2. IN and comma-separated list of values
This query attempts to match the value of
column
against any of those provided in a comma-separated string:
1
2
3
| SELECT * FROM a WHERE column IN ( '1, 2, 3' ) |
This does not work because the string is not expanded in the
IN
list.
Instead, if column
column
is a VARCHAR
, it is compared (as a string) to the whole list (also as a string), and of course will never match. If column
is of a numeric type, the list is cast into the numeric type as well (and only the first item will match, at best).
The correct way to deal with this query would be rewriting it as a proper
IN
list
1
2
3
| SELECT * FROM a WHERE column IN (1, 2, 3) |
, or as an inline view:
1
2
3
4
5
6
7
8
9
10
| SELECT * FROM ( SELECT 1 AS id UNION ALL SELECT 2 AS id UNION ALL SELECT 3 AS id ) q JOIN a ON a. column = q.id |
, but this is not always possible.
To work around this without changing the query parameters, one can use
FIND_IN_SET
:
1
2
3
| SELECT * FROM a WHERE FIND_IN_SET( column , '1,2,3' ) |
This function, however, is not sargable and a full table scan will be performed on
a
.#1. LEFT JOIN with COUNT(*)
1
2
3
4
5
6
7
| SELECT a.id, COUNT (*) FROM a LEFT JOIN b ON b.a = a.id GROUP BY a.id |
This query intends to count number of matches in
b
for each record in a
.
The problem is that
COUNT(*)
will never return a 0 in such a query. If there is no match for a certain record in a
, the record will be still returned and counted.COUNT
should be made to count only the actual records in b
. Since COUNT(*)
, when called with an argument, ignores NULL
s, we can pass b.a
to it. As a join key, it can never be a null in an actual match, but will be if there were no match:
1
2
3
4
5
6
7
| SELECT a.id, COUNT (b.a) FROM a LEFT JOIN b ON b.a = a.id GROUP BY a.id |
P.S. In case you were wondering: no, the pictures don't have any special meaning. I just liked them.