SQL,

Bad SQL. SQL anti patterns in OUTER joins

Uli Bethke

Uli has been rocking the data world since 2001. As the Co-founder of Sonra, the data liberation company, he’s on a mission to set data free. Uli doesn’t just talk the talk—he writes the books, leads the communities, and takes the stage as a conference speaker.

Any questions or comments for Uli? Connect with him on LinkedIn.


Published on August 29, 2023
Updated on November 20, 2024

You often need to find both matched and unmatched rows between two or more tables. As we all know we use OUTER joins in SQL to achieve this.

However, there are many traps that inexperienced SQL developers fall into when working with OUTER joins. Some of these traps can be quite tricky and will lead to unintended, unexpected, or plain wrong results.

This blog post will show you how to bypass these dangerous mines and guide you safely through the SQL OUTER JOINs minefield.

Bad SQL! Don’t use INNER JOIN after OUTER JOIN.

The following query gets a list of unique customer names and unique stock items they have ordered.

Result

This gives us 13 rows.

In the query, all joins are INNER JOINs, which means only customers with orders and corresponding order details and stock items are listed. If you’ve noticed a couple of customers missing from the result, it’s likely because they haven’t placed any orders.

So how do we identify customers who did not place orders? Exactly! We use an OUTER join.

Some developers might try the following query and are surprised that they get the same result as for the previous query with INNER joins.

Result:

This is super bad SQL. Why? It returns results that are unintended. The intention was to return all customers including those without orders. However, we only got customers with orders. Our query returned the wrong results or at least not the result we expected.

The LEFT OUTER JOIN between Customers and Orders does initially preserve any customers without any orders. But, as we then inner join ORDERS with ORDERLINES the rows with NULL values are removed. The INNER JOIN between ORDERS and ORDERLINES overrides the OUTER JOIN and implicitly converts it to an INNER JOIN.

Let’s see what happens when we convert all of our joins to LEFT OUTER joins

Result:

This gives us 20 rows.

While we now get all of the customers without orders we also get all of the orders without order lines and stock items. In other words we also preserved orders with no order lines and stock items. This may also be unintended.

What if we only want customers without orders but don’t want to show orders without orderlines at the same time?

The following query will work for this scenario.

Result:

We now get 15 rows. All of the customers without orders but ignoring orders without orderlines.

Some people prefer LEFT JOINs over RIGHT JOINs. We can rewrite the query as a LEFT JOIN with a subquery.

Result:

We get 15 rows. The same result as the query with the RIGHT OUTER JOIN.

Using FlowHigh UI to detect the anti pattern

As a DBA it is important that you detect anti patterns in SQL that lead to incorrect results.. We have created a module in FlowHigh to detect these and dozens of other anti patterns.

In this section, we’ll guide you through the process of using the FlowHigh UI to detect the anti pattern we covered earlier on.

Gain access to FlowHigh today.

Follow these steps:

  • Access FlowHigh: Log in to FlowHigh’s web interface and select the “optimise sql” option from the left panel.
  • Submit the Query: Paste the SQL query into the editor and click “Optimise”.

FlowHigh’s analysis will identify instances of the anti pattern. If multiple anti patterns are detected in the same query, FlowHigh will highlight them individually.

We can get the details of each antipattern by clicking the highlighted number as shown in the figure above. Flowhigh will show the details of the anti pattern

Flowhigh will also offer a solution to this discovered antipattern.

Using FlowHigh SDK to detect the anti pattern

The SQL that inner joins a table after an OUTER join affects the correctness of the result set. Hence you want to closely monitor distinct in SQL query history of your database for this particular anti pattern.

The FlowHigh SDK is very useful to programmatically detect anti patterns in your SQL code for automation purposes. As an example we can provide the sql discussed in this post as an input to FH and detect the bad SQL. We will use the below query:

Follow these steps to use the FlowHigh SDK:

  • Import the SDK: Begin by importing the required modules from the FlowHigh SDK in your Python script.
  • Instantiate FlowHighSubmissionClass: Create an instance of the FlowHighSubmissionClass by passing your SQL query to the constructor:
  • Retrieve Anti Patterns: Utilize the instantiated object to extract all anti pattern details present in your SQL code. By using the get_all_antipatterns() method, you’ll obtain a list of anti pattern instances detected within the query.

Output:

AP_IDAP_NAMESEVERITYPERFORMANCE_FLAGREADABILITY_FLAGCORRECTNESS_FLAG
AP_13Avoid inner join after outer join in multi-join queryWarningNNY
AP_06Beware of SELECT DISTINCTWarningYNY

To fully automate the detection of SQL anti patterns you can use the FlowHigh SDK to scan the query history of your database.

Register to gain access to the FlowHigh

Get Access

Uli Bethke

About the author:

Uli Bethke

Co-founder of Sonra

Uli has been rocking the data world since 2001. As the Co-founder of Sonra, the data liberation company, he’s on a mission to set data free. Uli doesn’t just talk the talk—he writes the books, leads the communities, and takes the stage as a conference speaker.

Any questions or comments for Uli? Connect with him on LinkedIn.