It's mostly believed that
INNER JOINs over
WHERE IN subqueries, i.e.:
select count(*) from t1 where col in (select col from t2)
But rather this:
select count(*) from t1 inner join t2 on t1.col = t2.col
Often the results from a correlated sub-query can be replicated using an
INNER JOIN. Depending on what your requirements are, using an
INNER JOINmay be more efficient because it only makes one pass through the data whereas the correlated sub-query must execute for each row in the outer query. – DataCamp
Historically, explicit joins usually win, hence the established wisdom that joins are better, but optimisers are getting better all the time, and so I prefer to write queries first in a logically coherent way, and then restructure if performance constraints warrant this. – Marcel Cantos, Stack Overflow
LEFT JOINcan be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone. – MySQL docs
Generally speaking, using an
INNER JOINis often more efficient than using a subquery with an
INNER JOINallows the database engine to optimize the query execution by using indexes and other optimization techniques, whereas the subquery with IN might be executed as a separate step, leading to less efficient processing. – ChatGPT
But be aware that it really depends. Like, for Athena and big tables, you might easily run into pushing its resources a bit too much and then optimising against
JOINs, i.e. cartesian products:
- Try to reduce the resource required by intermediate results in the plan:
a. Reduce the number of columns projected.
b. Try to split the query into 2 or more queries and materialize the any the earlier parts in a permanent table.
c. Look hard to see if plan stalling operation like sorts on subqueries can be eliminated.
- Split the query into smaller data increments.
- Try different join orders. – AWS Athena, Query exhausted resources at scale factor