Subselect, derived tables, and subqueries in ODI 11G
I think one of the best new features in ODI 11G is the subselect/subquery feature. In ODI 10 this could only be achieved through a workaround, as outlined in the previous post "Using subqueries in Oracle Data Integrator".
What is the advantage of a subquery?
In previous versions of ODI we had to physically set down the data for each individual query. For example, if we wanted to rank a dataset and then select the top ten rows from it, we had to create one interface for the ranking operation and another for the top ten operation. At each point we had to set down the data, thereby increasing I/O and decreasing overall performance. In ODI 11 this has changed. While we still need to create two temp interfaces for the operation as a whole, we can now tell ODI to treat the first temp interface as a derived table. ODI 11G will then use it to generate a subquery.
How does the subquery work in ODI 11?
The way this works is fairly simple. For each subquery/derived table in your query you create a temp interface. You embed your various subqueries by simply telling ODI that you want to use the temp interface as a subquery.
A step by step guide
We will look at an example from the SH schema. The task at hand is to load a table with the top ten customers based on sales amount.
To accomplish this task we need three interfaces: two temp interfaces and a final interface that loads the target table.
The first temp interface (INT_SALES_CUST) will aggregate the sales amount from the sales table by cust_id.
The second temp interface (INT_SALES_RANK) will take the result set from the interface in the previous step and dense rank the customers’ sales data.
The third interface (INT_TOP_TEN) will then select the top ten customers, join to the customers table, and physically set down the data.
So far so good. These are the same steps we took in ODI 10.
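To make the cost of the ODI 10 approach concrete, here is a minimal hand-written sketch of what staging each step physically amounts to. The staging table names stg_sales_cust and stg_sales_rank are hypothetical, and the statements ODI actually generates depend on the knowledge modules in use, so treat this as an illustration rather than ODI output.

```sql
-- Step 1 (INT_SALES_CUST): aggregate sales per customer and write the result to disk.
-- stg_sales_cust is a hypothetical staging table name.
CREATE TABLE sh.stg_sales_cust AS
SELECT sales.cust_id          AS cust_id,
       SUM(sales.amount_sold) AS sales_amt
FROM   sh.sales sales
GROUP  BY sales.cust_id;

-- Step 2 (INT_SALES_RANK): read the staged rows back, rank them, and write them again.
-- stg_sales_rank is a hypothetical staging table name.
CREATE TABLE sh.stg_sales_rank AS
SELECT sc.cust_id,
       sc.sales_amt,
       DENSE_RANK() OVER (ORDER BY sc.sales_amt DESC) AS sales_rank
FROM   sh.stg_sales_cust sc;

-- Step 3 (INT_TOP_TEN): read the ranked rows, join to customers, and load the target.
INSERT INTO sh.cust_top_ten
            (sales_rank, cust_first_name, cust_id, sales_amt, cust_last_name)
SELECT sr.sales_rank,
       c.cust_first_name,
       c.cust_id,
       sr.sales_amt,
       c.cust_last_name
FROM   sh.stg_sales_rank sr
       JOIN sh.customers c
         ON sr.cust_id = c.cust_id
WHERE  sr.sales_rank <= 10;
```

Every intermediate result is written once and read once before the final insert. This is the extra I/O that the derived table option avoids.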
Next, we need to enable the interfaces for subqueries. Open interface INT_SALES_RANK and click on the INT_SALES_CUST data store. In the Source Properties section you will find a checkbox labelled Use Temporary Interface as Derived Table. Select this checkbox.
We also need to perform the same step in our third interface INT_TOP_TEN, this time on the INT_SALES_RANK data store.
Once this has been done we can execute interface INT_TOP_TEN in Simulation mode.
This will generate the required query with the embedded subqueries from the temp interfaces. What a great feature.
INSERT INTO sh.cust_top_ten
            (sales_rank,
             cust_first_name,
             cust_id,
             sales_amt,
             cust_last_name)
SELECT sales_rank,
       cust_first_name,
       cust_id,
       sales_amt,
       cust_last_name
FROM   (SELECT sales_rank.sales_rank     sales_rank,
               customers.cust_first_name cust_first_name,
               customers.cust_id         cust_id,
               sales_rank.sales_amt      sales_amt,
               customers.cust_last_name  cust_last_name
        FROM   (SELECT sales_cust.cust_id   cust_id,
                       sales_cust.sales_amt sales_amt,
                       DENSE_RANK() OVER (ORDER BY sales_cust.sales_amt DESC) sales_rank
                FROM   (SELECT sales.cust_id           cust_id,
                               SUM (sales.amount_sold) sales_amt
                        FROM   sh.sales sales
                        WHERE  ( 1 = 1 )
                        GROUP  BY sales.cust_id) sales_cust
                WHERE  ( 1 = 1 )) sales_rank,
               sh.customers customers
        WHERE  ( 1 = 1 )
               AND ( sales_rank.cust_id = customers.cust_id )
               AND ( sales_rank.sales_rank <= 10 )) odi_get_from
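To check the generated statement by hand, it can help to restate the same logic with named subqueries. The query below is a hand-written equivalent of the SELECT part, not something ODI produces. It assumes the same SH tables and reuses the aliases sales_cust and sales_rank from the generated code, and the ORDER BY is added only to make the output easier to read.

```sql
-- Hand-written equivalent of the generated SELECT, using subquery factoring (WITH).
WITH sales_cust AS (
    -- corresponds to temp interface INT_SALES_CUST
    SELECT sales.cust_id          AS cust_id,
           SUM(sales.amount_sold) AS sales_amt
    FROM   sh.sales sales
    GROUP  BY sales.cust_id
),
sales_rank AS (
    -- corresponds to temp interface INT_SALES_RANK
    SELECT sales_cust.cust_id   AS cust_id,
           sales_cust.sales_amt AS sales_amt,
           DENSE_RANK() OVER (ORDER BY sales_cust.sales_amt DESC) AS sales_rank
    FROM   sales_cust
)
-- corresponds to interface INT_TOP_TEN, minus the INSERT
SELECT sales_rank.sales_rank,
       customers.cust_first_name,
       customers.cust_id,
       sales_rank.sales_amt,
       customers.cust_last_name
FROM   sales_rank
       JOIN sh.customers customers
         ON sales_rank.cust_id = customers.cust_id
WHERE  sales_rank.sales_rank <= 10
ORDER  BY sales_rank.sales_rank;
```

Because DENSE_RANK assigns the same rank to tied sales amounts, the filter sales_rank <= 10 can return more than ten rows when several customers share a rank.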