Use INNER JOIN over subqueries mostly

  • data-coding
  • idea
  • sql
  • spring

It's mostly believed that INNER JOINs over WHERE IN subqueries, i.e.:

Not this:

select count(*)
from t1
where col in (select col from t2)

But rather this:

select count(*)
from t1
inner join t2
    on t1.col = t2.col

Often the results from a correlated sub-query can be replicated using an INNER JOIN. Depending on what your requirements are, using an INNER JOIN may be more efficient because it only makes one pass through the data whereas the correlated sub-query must execute for each row in the outer query. – DataCamp

Historically, explicit joins usually win, hence the established wisdom that joins are better, but optimisers are getting better all the time, and so I prefer to write queries first in a logically coherent way, and then restructure if performance constraints warrant this. – Marcel Cantos, Stack Overflow

A LEFT JOIN can be faster than an equivalent subquery because the server might be able to optimize it better—a fact that is not specific to MySQL Server alone. – MySQL docs

Generally speaking, using an INNER JOIN is often more efficient than using a subquery with an IN clause. The INNER JOIN allows the database engine to optimize the query execution by using indexes and other optimization techniques, whereas the subquery with IN might be executed as a separate step, leading to less efficient processing. – ChatGPT

But be aware that it really depends. Like, for Athena and big tables, you might easily run into pushing its resources a bit too much and then optimising against JOINs, i.e. cartesian products:

  1. Try to reduce the resource required by intermediate results in the plan:
    a. Reduce the number of columns projected.
    b. Try to split the query into 2 or more queries and materialize the any the earlier parts in a permanent table.
    c. Look hard to see if plan stalling operation like sorts on subqueries can be eliminated.
  2. Split the query into smaller data increments.
  3. Try different join orders. – AWS Athena, Query exhausted resources at scale factor