Tricks with SQL: Beware of the Predicate in the Outer Join
Today we will have a look what happens when we place a filter condition into an outer join. We will also have a look at how this compares to placing the filter condition into the WHERE clause. Finally we’ll have a look where this could be useful.
Let’s first create some sample data.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
create table customer ( cust_id number, cust_desc varchar2(50) ); ALTER TABLE customer add CONSTRAINT pk_customer PRIMARY KEY (cust_id); create table customer_loc ( customer_loc_id NUMBER, customer_id NUMBER, customer_loc_desc VARCHAR2(50) ); ALTER TABLE customer_loc add CONSTRAINT pk_customer_loc PRIMARY KEY (customer_loc_id); insert into customer values (1,'Gold'); insert into customer values (2,'Gold'); insert into customer values (3,'Silver'); insert into customer_loc values (1,1,'Dublin'); insert into customer_loc values (2,2,'Paris'); insert into customer_loc values (3,4,'Berlin'); commit; analyze table customer compute statistics; analyze table customer_loc compute statistics; |
Let’s run our first outer join query where we put the filter condition into the WHERE clause. All is as expected. The query just returns customer_id 1.
Now let’s run the same query, but we put the filter condition into the join. This may be different from what you may have expected. What happens is that the query returns all of the rows in the customer table and those rows of table customer_loc where the join condition is met, i.e. those rows where customer_id = 1.
Let’s verify what we have just seen with another query. This time we will put customer_id = 4 into the Join condition. There are no rows in table customer_loc that match this. As expected the query returns all rows for table customer but now rows for table customer_loc
What could this be useful for? One use case would be some limited form of data densification whereby you need to return all of the tables in one table but only a subset of rows in another table. Typically this can only be done using some sort of subselect. An example:
The query below is run in the SH schema and returns all of the customers, but only those sales transactions that are larger than 1000 in sales_amount.
1 2 3 |
select a.cust_id,amount_sold from customers a left outer join sales b on a.cust_id = b.cust_id and amount_sold > 1000; |