Archives

Tagged ‘sql‘

Tricks with SQL: Beware of the Predicate in the Outer Join

Today we will have a look what happens when we place a filter condition into an outer join. We will also have a look at how this compares to placing the filter condition into the WHERE clause. Finally we’ll have a look where this could be useful.

Let’s first create some sample data.

create table customer (
cust_id number,
cust_desc varchar2(50)
);

ALTER TABLE customer
add CONSTRAINT pk_customer PRIMARY KEY (cust_id);

create table customer_loc (
customer_loc_id NUMBER,
customer_id NUMBER,
customer_loc_desc VARCHAR2(50)
);

ALTER TABLE customer_loc
add CONSTRAINT pk_customer_loc PRIMARY KEY (customer_loc_id);

insert into customer values (1,'Gold');
insert into customer values (2,'Gold');
insert into customer values (3,'Silver');

insert into customer_loc values (1,1,'Dublin');
insert into customer_loc values (2,2,'Paris');
insert into customer_loc values (3,4,'Berlin');

commit;

analyze table customer compute statistics;
analyze table customer_loc compute statistics;

Let’s run our first outer join query where we put the filter condition into the WHERE clause. All is as expected. The query just returns customer_id 1.

Now let’s run the same query, but we put the filter condition into the join. This may be different from what you may have expected. What happens is that the query returns all of the rows in the customer table and those rows of table customer_loc where the join condition is met, i.e. those rows where customer_id = 1.

Let’s verify what we have just seen with another query. This time we will put customer_id = 4 into the Join condition. There are no rows in table customer_loc that match this. As expected the query returns all rows for table customer but now rows for table customer_loc

What could this be useful for? One use case would be some limited form of data densification whereby you need to return all of the tables in one table but only a subset of rows in another table. Typically this can only be done using some sort of subselect. An example:

The query below is run in the SH schema and returns all of the customers, but only those sales transactions that are larger than 1000 in sales_amount.

select a.cust_id,amount_sold
from customers a 
left outer join sales b on a.cust_id = b.cust_id and amount_sold > 1000;

CSV to Rows

I recently needed to convert a comma separated list of values in a table column to a column where each value in the list gets its own row in a table.

So first of all let’s create such a table. I have limited the number of comma separated values that will be created to 12.

View Listing 1

To convert our csv list to rows we will create an in-memory helper table that for the maximum number of comma separated values has a corresponding number of rows. We can then use this helper table to join to our main table and then parse the values.

The SQL below will create our helper table:

View Listing 2

Results for the helper table will look like:

Now it is simply a matter of joining our main table to this helper table and parsing the results:

View Listing 3

In order to improve performance we could create a function based index for our join column.

There are other solutions that will do the same thing and requires less code, but will not perform as well. So if you have a small set of rows to convert you can use this solution: How do I split comma delimited data in a SQL Statement