Oracle Data Integrator (ODI),

Home » Oracle Data Integrator (ODI) » Using subqueries in Oracle Data Integrator (ODI) interfaces for complex data integration requirements

Using subqueries in Oracle Data Integrator (ODI) interfaces for complex data integration requirements

by Uli Bethke

Uli has been rocking the data world since 2001. As the Co-founder of Sonra, the data liberation company, he’s on a mission to set data free. Uli doesn’t just talk the talk—he writes the books, leads the communities, and takes the stage as a conference speaker.

Any questions or comments for Uli? Connect with him on LinkedIn.

Published on January 7, 2010
Updated on December 18, 2024

Get our e-books Discover the Oracle Data Integrator 11g Repository Data Model and Oracle Data Integrator Snippets and Recipes
Oracle Data Integrator does not have any built in functionality for subqueries in interfaces. This is one of the reasons why you will find people claiming that ODI has shortcomings when dealing with complex transformations.

Your subscription could not be saved. Please try again.

You're In! Welcome to FastForward Congratulations on successfully subscribing to the FastForward Data Engineering Newsletter! You're now part of a growing community of 15,000+ data engineers who are staying ahead in the ever-evolving world of data.

FlowForward.

All Things Data Engineering
Straight to Your Inbox!

Shortcomings of existing workarounds

In the ODI community various workarounds have been suggested to address this. The following discussion thread on OTN summarises these efforts.
1) Use of WHERE EXISTS in the predicate
2) Use of a View that contains the complex subquery transformation logic as the source datastore.
3) Use of a series of temporary (yellow) interfaces.
While all of these are valid workarounds each of the above approaches has drawbacks of its own.
The use of WHERE EXISTS only addresses a small subset of requirements. Using views has one big disadvantage. You will lose data lineage. By using views you will not be able to trace mappings from source to target and as a result lose one of the advantages that an ETL tool offers over scripting. In theory performance should not suffer as Oracle uses predicate pushing and view merging. Tom Kyte has written an excellent article about this for Oracle Magazine a while back if you require more information (look for the section on Views and Merging/Pushing at the bottom of the article). The third option is to use a series of temporary interfaces. However, this has the disadvantage of physically setting down and materialising the resultset for each subquery, rather than processing everything in memory. Depending on the circumstances, you may experience degradation in performance.

Proposed workaround

I propose another workaround here that addresses all of the above shortcomings. We will use a combination of a temp (yellow) interface and a custom knowledge module to address the issues. Each subquery in our complex transformation requirement will be assigned to a temporary interface. I have written a custom integration knowledge module that will store the SQL of the subquery for each temporary interface in a table. We then stick all of our temp interfaces with the subqueries into a package and then let the knowledge module combine the individual subqueries into one complex query before loading the data into our target table. Take the following as an example. We have a requirement to rank the Top 10 purchasing customers and insert them into a target table. In SQL this can be done by using the rank() analytic function in a subquery and then filtering on this subquery. In our solution this will translate into two interfaces. One temp interface for the subquery with the analytic function. Our custom KM will store the SQL for this subquery in a table. We will then need a second (non-temp) interface that filters on the top 10 purchasing customers, combines the two queries, executes the resulting complex query, and inserts the resultset into the target table.

This is is just a prototype at the moment to demonstrate that it is possible to execute complex transformations in ODI. At the moment the solution is specific to Oracle. I intend to rewrite parts of the knowledge module using Jython arrays to store the subqueries over the next couple of weeks to make the solution technology agnostic.

The subquery knowledge module

Before I give a working example I want to give a brief overview on the custom knowledge module that is our work horse.
Step 1. Create table that stores the SQL for the subqueries.
Nothing spectacular here. We create a table that will hold the SQL for our subqueries. The data type for the sql query column is a CLOB (this will become relevant in a later step). We also have a session number column to store the ODI session number for each subquery. This will allow us to concurrently use the knowledge module.

CREATE TABLE SQL_QUERY

(

QUERY CLOB,

IDENTIFIER VARCHAR2(30 BYTE),

SESS_NO NUMBER,

CREATE_DATE TIMESTAMP(6)

)

Step 2: Store SQL pieces for subquery in Jython variables
In step two we store the SQL pieces for the subquery in Jython variables (note Jython as the technology). We retrieve the individual SQL components via the ODI substitution API.

v_collist = """<%=snpRef.getColList("", "t[EXPRESSION]", ",n", "", "")%>"""

v_query = """select <%=snpRef.getPop("DISTINCT_ROWS")%>

<%=snpRef.getColList("", "t[EXPRESSION] AS [CX_COL_NAME]", ",n", "", "")%>

from <%=odiRef.getSrcTablesList("", "([TABLE_NAME])", ", ", "")%>

where (1=1)

<%=snpRef.getJoin()%>

<%=snpRef.getFilter()%>

<%=snpRef.getJrnFilter() %>%s"""

if v_collist.find('OVER') == -1:

v_group_by = """<%=snpRef.getGrpBy()%><%=snpRef.getHaving()%>"""

else:

v_group_by =""

v_query = v_query % v_group_by

Make sure that you use the tab key for indentation. This seems to be a peculiarity of Jython.

Note also that I have built in a simple check for the presence of analytic functions into the code. If we find an analytic function in the source to target column mappings we do not generate the GROUP BY SQL piece as otherwise an error will be thrown. This simple check is missing from all of the out of the box knowledge modules. As a result analytic functions such as SUM() OVER, MAX() OVER etc. can not be used in interfaces. I consider this a bug in the getGrpBy() API substitution method in ODI.

There is a clever workaround for the above described in Metalink note 807527.1

Step 3: Debug subquery
In this step we build a debugging mechanism into the knowledge module. We use the Jython raise function to throw an exception and print out the value of the v_query variable to the Operator. This will allow us to quickly review the value of the generated SQL in the Operator module and if necessary allow us to debug it.

Note that you have to select the Ignore Errors checkbox. Otherwise execution of the interface will terminate at this step (after all we are forcefully raising an error).
Step 4: Store SQL for subquery in table
In this step we will use the Jython technology to store the SQL for the subquery together with the ODI session number in a table. We use a Jython stored procedure call to accomplish this.

import java.sql as sql

import java.lang as lang

import java.sql.Types as types

myCon = snpRef.getJDBCConnection("SRC")

try:

inVar1 = v_query

inVar2 = '<%=odiRef.getTargetTable("TABLE_NAME")%>'

inVar3 = <%=odiRef.getSession("SESS_NO")%>

myStmt = myCon.prepareCall('begin LOAD_SQL_QUERY (?,?,?); end;');

## Bind parameters (IN parameters)

myStmt.setString(1, inVar1);

myStmt.setString(2, inVar2);

myStmt.setInt(3, inVar3);

## Execute the callable statement

myStmt.execute();

finally:

inVar1 = 'myVar1';

Note that you have to use indentation using your tab key exactly as in the screenshot as otherwise an error will be thrown. Have a look at 424207.1 for more information on calling stored procedures from Jython.
Code for the stored procedure load_sql_query is as follows:

CREATE OR REPLACE PROCEDURE load_sql_query (p_query_value IN CLOB,p_identifier IN VARCHAR2,p_sess_no IN NUMBER)

BEGIN

INSERT INTO sql_query VALUES (REPLACE(p_query_value,CHR(10)||CHR(10),''),p_identifier,p_sess_no,SYSTIMESTAMP);

COMMIT;

END;

Step 5: Create complex query from individual subqueries
We only execute this step once we have run all of our subqueries. This step takes all of the subqueries and creates our final complex query. Condition for executing this step is that we are dealing with a non-temporary table that is a normal (non-temp) interface.
Command on Source

1	SELECT NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type FROM dual

The getTargetTable method returns ‘T’ for a non temp target table and NULL for a temp target table.
Command on Target

BEGIN

-- INSERT INTO target_table VALUES ('#table_type', <%=odiRef.getSession("SESS_NO")%>);

--COMMIT;

-- Only execute if target table is not of type temporary

IF '#table_type' = 'T' THEN

sql_substitute (<%=odiRef.getSession("SESS_NO")%>);

END IF;

END;

We only execute this step if the variable table_type is of type non-temp, that is equals T.
If we are dealing with a non-temp table and this is our final step in our complex transformation we generate the final complex query from the underlying subqueries by executing a stored procedure.

CREATE OR REPLACE PROCEDURE sql_substitute (p_sess_no IN NUMBER)

-- exec target_sql;

final_raw_query CLOB;

CURSOR c_iterator_outer IS

SELECT

query,

identifier,

create_date

FROM

sql_query

WHERE

sess_no = p_sess_no

ORDER BY

create_date DESC;

CURSOR c_iterator_inner IS

SELECT

query,

identifier,

create_date

FROM

sql_query

WHERE

sess_no = p_sess_no

ORDER BY

create_date;

BEGIN

SELECT query INTO final_raw_query from sql_query where create_date = ( select MAX(create_date) FROM sql_query);

FOR r_iterator_outer IN c_iterator_outer

LOOP

dbms_output.put_line ('outer query:' || r_iterator_outer.query || ' identifier:' || r_iterator_outer.identifier);

FOR r_iterator_inner IN c_iterator_inner

LOOP

final_raw_query := REPLACE(r_iterator_inner.query,'('||r_iterator_outer.identifier||')','('||r_iterator_outer.query||')' || ' ' || r_iterator_outer.identifier);

UPDATE sql_query SET query = final_raw_query WHERE create_date = r_iterator_inner.create_date;

dbms_output.put_line ('inner query' || final_raw_query);

COMMIT;

END LOOP;

END;

There may be a more elegant way of doing this but this seems to work for the moment. Originally I had thought of using the model clause for this but then noticed that this does not support CLOBs.
Step 6: Execute complex query and insert transformed data into target table
In this step we grab the transformed and substituted complex query from our table and execute it.
Command on Source

SELECT

NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type,

-- Metalink 423909.1 Edit odiparams or use jdbc driver 5 or 6

query AS query

FROM

sql_query

WHERE

sess_no = <%=odiRef.getSession("SESS_NO")%> AND

create_date = ( SELECT MAX(create_date) FROM sql_query WHERE sess_no = <%=odiRef.getSession("SESS_NO")%>)

In the Command on Source we retrieve the complex SQL query from our database table.
As we are dealing with a CLOB here we first need to make some changes to the odiparams.bat file. Alternatively you can use version 5 or 6 of the JDBC driver (Metalink 423909.1). I have outlined in a previous post how to achieve this.
If we don’t want to install a new JDBC driver we need to add the following line to the odiparams.bat
set ODI_ADDITIONAL_JAVA_OPTIONS= “-Doracledatabasemetadata.get_lob_precision=false”
You will need to restart the Designer module for this to take effect.
Command on Target

BEGIN

IF '#table_type' = 'T' THEN

INSERT INTO <%=odiRef.getTable("L","TARG_NAME","A")%>

(<%=odiRef.getColList("", "[COL_NAME]", ",nt", "", "((INS and !TRG) and REW)")%>

<%=odiRef.getColList(",", "[COL_NAME]", ",nt", "", "((INS and TRG) and REW)")%> )

#query;

COMMIT;

END IF;

END;

Note that we need to select the Ignore Errors checkbox. Otherwise the KM will fail at this step for the subqueries stage. A better approach for this may be to use Jython and pass the #query variable into a stored procedure that executes it as dynamic SQL.
Currently I use an INSERT statement for loading the target table. In a future version of this KM I also intend to add a MERGE statement as an option.
Step 7: Write complex query to Operator module
In a final step we write the complex query to the Operator. We can then easily debug any issues with it.
Command on Source

SELECT

NVL('<%=odiRef.getTargetTable("TABLE_TYPE")%>','Yellow') AS table_type,

-- Metalink 423909. Edit odiparams or use jdbc driver 5 or 6

TO_CHAR(query) AS query

FROM

sql_query

WHERE

sess_no = <%=odiRef.getSession("SESS_NO")%> AND

create_date = ( SELECT MAX(create_date) FROM sql_query WHERE sess_no = <%=odiRef.getSession("SESS_NO")%>)

Command on Target

1	raise("""INSERT INTO <%=odiRef.getTable("L","TARG_NAME","A")%>(<%=odiRef.getColList("", "[COL_NAME]", ",nt", "", "((INS and !TRG) and REW)")%> <%=odiRef.getColList(",", "[COL_NAME]", ",nt", "", "((INS and TRG) and REW)")%> ) """ + """#query""")

The subquery knowledge module in action – Two examples
Let’s now have a look at our new Knowledge Module in action. For this purpose we will use tables from the SH sample schema.

Example 1: Subquery with analytic function

In this example we want to retrieve the date on which the amount sold for a customer was the greatest. One way of achieving this is to use the ROW_NUMBER() analytic function in a subquery to rank the amount sold and then filter the Top 1 record from this subquery. So the query would look similar to the following

SELECT

CUST_ID AS CUST_ID,

TIME_ID AS TIME_ID,

AMOUNT_SOLD AS AMOUNT_SOLD,

row_number() OVER (PARTITION BY CUST_ID ORDER BY AMOUNT_SOLD DESC, TIME_ID DESC) AS RN

FROM

sales

) SQ1

WHERE SQ1.rn = 1

We will now generate this query with our new knowledge module.
First of all we will generate a temporary interface for the subquery with the analytic function in it.

Select the Staging Area Different From Target checkbox and from the dropdown select the schema you want to execute this in.
In the Target Datastore click on Untitled. Type SQ1 as the Name for the target datastore and select the Data schema radiobutton.

Drag the sales table from the SH model to the Sources area (you first need to reverse engineer this from the SH schema).

Drag and drop the columns CUST_ID, TIME_ID and AMOUNT_SOLD from the source datastore to the target datastore.

Right click inside the Target Datastore area and select Add a column

Name the new column rn and type in the value for the analytic function:
row_number() OVER (PARTITION BY SALES.CUST_ID ORDER BY SALES.AMOUNT_SOLD DESC, SALES.TIME_ID DESC)

Go to the Flow tab and select the custom subquery knowledge module.

That’s it, we have created the yellow interface for our subquery.

In a next step we now need to create the interface that takes the sales_amount yellow interface as its source and loads the target table.
Create an interface, name it sales_amount_filter, drag and drop the sales_amount interface to the Sources area, and create a filter as per screenshot below.

We need to create a table for our target datastore. Use the script below to create this table, reverse engineer it and drag and drop it to the target datastore area.

CREATE TABLE SALES_AMOUNT

(

CUST_ID NUMBER,

TIME_ID DATE,

AMOUNT_SOLD NUMBER

);

Next move on to the Flow tab and select our custom subquery knowledge module.
We now need to bring the two interfaces together in a package and then we are ready to load our target.

Execute the package and then switch to the Operator module. You should get something similar to the screenshot below.

Let’s have a look at the execution of the individual steps. We’re particularly interested in the subqueries and the final complex query. So let’s have a look at the debug steps where these will show up.
Below we see the query for the temp interface that contains the analytic function.

Next we’ll have a look at the query that is generated for the second (non-temp) interface.

As you can see from the figure above, the query selects from the SQ1 target table of the temporary interface.
We substitute SQ1 in step 12 of the package with the SQL query of the analytic function and then use this complex query to insert into the target table.

Example 2: Joining two subqueries

In the next example we take this a step further and join another subquery to the package from example 1.
We first create another subquery using a temp interface. As you can see from the figure below we aggregate the amount_sold by cust_id.

We then take this temp interface and the temp interface from example 1 and join these together on the cust_id in yet another interface.

The script for the target table sum_sales_amount is as follows

CREATE TABLE SUM_SALES_AMOUNT

(

CUST_ID NUMBER,

TIME_ID DATE,

AMOUNT_SOLD NUMBER,

SUM_AMOUNT_SOLD NUMBER

)

Next we stick the three interfaces into a package.

… and execute it.
From the Operator module we can see the generated SQL.

Conclusion

In this article I have shown you how you can execute complex queries and subqueries in ODI without loss of data lineage and without suffering performance degradation. So far I have not used the knowledge module in a production environment. So I am sure there is still room for improvement. However, from now on I will use and further develop this approach to meet complex transformation requirements in ODI.
Any input highly welcome. Also let me know of bugs that you come across or if you find that any of the above is unclear.
In order to master scripting in ODI I recommend the following books.
Java BeanShell
Scripting in Java: Languages, Frameworks, and Patterns
Jython
The Definitive Guide to Jython: Python for the Java Platform.
Jython Essentials (O’Reilly Scripting)

About the author:

Uli Bethke

Co-founder of Sonra

Any questions or comments for Uli? Connect with him on LinkedIn.

Follow Uli Bethke:

Back to Blog

Cookie	Duration	Description
__cfruid	session	Cloudflare sets this cookie to identify trusted web traffic.
cookielawinfo-checkbox-marketing	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Marketing".
cookielawinfo-checkbox-necessary	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-preferences	1 month	This cookie is set by the GDPR Cookie Consent plugin to check if the user has given consent to use cookies under the "Preferences" category.
cookielawinfo-checkbox-statistics	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Statistics".
cookielawinfo-checkbox-unclassified	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Unclassified".
CookieLawInfoConsent	1 month	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
csrftoken	1 year	This cookie is associated with Django web development platform for python. Used to help protect the website against Cross-Site Request Forgery attacks

Cookie	Duration	Description
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
li_gc	2 years	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
mgref	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
mgrefby	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
G	1 year	Cookie used to facilitate the translation into the preferred language of the visitor.
SERVERID	session	This cookie is set by Slideshare's HAProxy load balancer to assign the visitor to a specific server.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_7H38LVR4Z5	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_44804396_1	1 minute	Set by Google to distinguish users.
_gat_UA-44804396-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
SIDCC	6 Months	The "SIDCC" cookie is used as security measure to protect users data from unauthorised access
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
AN	1 month
AS	session
ebEventToTrack	1 month
eblang	1 year
SNID	2 years	This cookie is set by the Google. This cookie is used by the map which helps visitors to identify and reach the facility.
SP	session
SS	session

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

Using subqueries in Oracle Data Integrator (ODI) interfaces for complex data integration requirements

FlowForward.

Shortcomings of existing workarounds

Proposed workaround

The subquery knowledge module

Example 1: Subquery with analytic function

Example 2: Joining two subqueries

Conclusion

Uli Bethke

Using subqueries in Oracle Data Integrator (ODI) interfaces for complex data integration requirements

FlowForward.

Shortcomings of existing workarounds

Proposed workaround

The subquery knowledge module

Example 1: Subquery with analytic function

Example 2: Joining two subqueries

Conclusion

Related Articles

Cookies consent