Archives

Oracle Data Integrator (ODI)

REAL TIME BI PODCAST ON ORACLE DATA INTEGRATOR 12C. Part II.

In the second part of the series we cover:

More discussion on ODI vs Informatica
More on migrating from OWB to ODI
Using ODI outside the data warehouse (BI Apps)
ODI in the cloud
ODI and Big Data

)

Real Time BI Podcast on Oracle Data Integrator 12c. Part I.

I recently did a podcast with Stewart Bryson (Chief Innovation Officer RittmanMead), Kevin McGinley, and Alex Shlepakov (both Oracle Analytics at Accenture).

In the first part of this two part series we cover the following areas:

ODI 12c. What are the advantages? When should you upgrade?
Migration from OWB to ODI 12c. Should you migrate? How and when?
Comparison of ODI to Informatica and other ETL tools.
ETL style vs. ELT style data integration tools.
ODI, ETL, and data integration in the cloud.

ODI 11g Repository Book Out Now

The book is for free. All you need to do is contact us. We will then send you a PDF version of the book. All we ask you to do is to permanently link from your blog, website, Facebook etc. to our site.

Alternatively, you can buy the book on Amazon.com. You can get the Kindle edition for $9.99. Send us your Amazon receipt for the the PDF version, which scales the submodels better.

You may be asking yourself why we have written a reference book for the ODI repository? Why not use the excellent ODI SDK to query that information? While the SDK is great at automating certain repetitive tasks it is not so good at retrieving information (in bulk) from the repository. Most ODI developers are familiar with using SQL. Not too many of us have a background in Java programming.

Some of the use cases for this book

- Extract information from the Operator log for reporting
- Impact analysis, e.g. find all interfaces a datastore is used in.
- Identify objects for deployment
- Identify objects that have recently changed
- Error message extraction
- Automated code checklist

We have split the book into two sections. The first section covers the most important subject areas of the master repository, the second section covers the most important subject areas of the work repository. Each subject area comes with a submodel, a list of relevant tables and corresponding ODI screens, and most importantly one or more useful pieces of SQL code to query the subject area in question. We have also made the models of the subject areas available online on our website http://sonra.io/odi11rep.

The book is for free. All you need to do is contact us. All we ask you to do is to permanently link from your blog, website, Facebook etc. page.

Alternatively, you can buy the book on Amazon.com. You can get the Kindle edition for $9.99. Send us your Amazon receipt for the the PDF version, which scales the submodels better.

In the book we cover the following subject areas

Physical & Logical Architecture
Repository
Link Master and Work Repository
Solution
Version
Internal Tables
Data Store
Interface Mapping
Interface Flow
Interface Knowledge Module Options
Interface Model
Interface Clause (Where & Join)
Session (Operator Log)
Package
Model Hierarchy
Project Hierarchy

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite-Offsite. Online. Get all the details from our ODI training site

Need ODI Training? Learn Oracle Data Integrator from the experts.

I am proud and delighted to announce that we are now offering ODI classes.

Learn from a combined 18+ years hands on ODI experience and get all these benefits.

Top Quality

Choose world class ODI training from THE ODI gurus. Don’t settle for less.

Value-Priced

Unrivaled in quality AND affordable at the same time.

Training world-wide

On-site or via our state of the art virtual classrooms.

FREE consulting

Get free consulting with our aftercare consulting package.

Tailor-made Training

Tell us your requirements and we will put together a custom course for you.

Mix and Match Modules

Mix and match standard modules. Combine them with custom modules.

For more information on course content and how to book your ODI course visit our ODI training page.

The Oracle Data Integrator 12c Masterstroke

Visual data flows and OWB to ODI 12c migration path

In their latest release of Oracle Data Integrator (ODI 12c), Oracle have addressed two main concerns of the analysts and some of their customers and partners. The first one is the unclear migration path from Oracle Warehouse Builder to Oracle Data Integrator. Another frequent concern was that the declarative design approach based on reusable code templates (knowledge modules) was not visual enough and scored badly in this category against other ETL tools.

People were rightly raising concerns that complex queries could not be implemented through the ODI user interface. I had always been wondering how Oracle would address those two issues. In what has to be labeled as a masterstroke of the ODI product and development teams they were able to kill two birds with one stone. And two big birds those are.

Note: As a side note I have always been wondering why ODI’s declarative design approach has not really made it on the analyst’s list of strengths for ODI. To me this metadata driven design approach that makes extreme reusability possible and significantly reduces development time is the core innovation and ultimate strength of Oracle Data Integrator.

Declarative Flow-Based User Interface

In ODI 12c, Oracle have introduced a feature that they call Declarative Flow-Based User Interface. Don’t be distracted by the name. What it means is that we can now create data flows (Interfaces were re-named to Mappings) that combine the two approaches: the visual approach of Mappings and Operators that we already know from Oracle Warehouse Builder and the powerful declarative design approach we love from Oracle Data Integrator.

With this new paradigm you essentially get the best of both worlds: ease of visual data flow development and the power of the declarative design approach. No other vendor can offer anything comparable. When would you combine the two approaches? For simple requirements I would recommend to stick to the traditional ODI approach. However, when you have complex requirements, e.g. you need to pre-aggregate data or filter a recordset based on the application of an analytic function then the new feature comes in handy.

At the moment we don’t have all of the Operators available that we know from OWB, e.g. I am missing the Pivot and Unpivot Operators, which means that complex queries requiring these can still only be generated using workarounds such as Views or similar. What would be handy is an Operator SDK that would allow to create GUI representations of database specific SQL dialects, e.g. I love subquery factoring and it would be handy to have an Operator for this or one for recursive queries etc.

ODI12c

ODI 12c Mapping: Visual Design with new Operators

OWB to ODI 12c migration

The introduction of OWB style Mappings should also faciliate the migration from Warehouse Builder to ODI 12c. The migration tool is still in Beta. While it still remains to be seen how well the migration toll will work, ODI 12c for the time being ships with the OdiStartOwbJob tool, which allows to kick off OWB jobs through ODI and store execution results in the ODI Operator log. If you have an immediate requirement to migrate from OWB to ODI contact us for our advice.

Other new features

Another interesting feature in ODI 12c is the ability to run Mappings in parallel. In practice that means that each of the temporary tables gets its own unique name avoiding clashes. In the past you had to apply workarounds.

There have also been enhancement to the Knowledge Module editor. For those of you writing a lot of their own Knowledge Modules this is welcome news.

ODI 12c now also has much tighter integration with Oracle Golden Gate. A welcome introduction for those of you running near real time data integration projetcs (another concern of analysts and some customers).

ODI 12c Gaps

While ODI 11g has brought us the SDK to automate and script common tasks, ODI 12c now brings us visual Mappings giving us new options to create data flows. Oracle Data Integrator remains the leading ETL tool on the market. With ODI 12c it has extended this lead even further. How Gartner don’t rate it above DataStage or Powercenter in their Magic Quadrant is a mystery to me.

One or two weaknesses remain. ODI 12c does not yet integrate with  source control systems out of the box. There are also no automated deployment options. While this functionality can be scripted it takes significant effort to do so. I am currently in the process of testing such an in-house developed solution. If you would like our advice how source control integration and automated deployments can be achieved get in touch.

While ODI 11g brought us improvements in the way we schedule and orchestrate data flows I am a strong advocate of dependency driven data flow execution. It is so much simpler and more efficient to hard coding of data orchestration routines. An enterprise solution with several thousand data flows will not work smoothly without a dependency driven scheduler. If you would like to find out more about our solution get in touch. You can also download our presentation ODI Scheduler- Source Control – performance on both source control and dependency driven scheduling.

Resources

The ODI 12c Developer Guide: http://docs.oracle.com/middleware/1212/odi/ODIDG/index.html

New Features of ODI 12c: http://docs.oracle.com/middleware/1212/odi/ODIDG/whatsnew.htm#sthref3

Download Oracle Data Integrator 12c: http://www.oracle.com/technetwork/middleware/data-integrator/downloads/index.htm

Install tip ODI Studio: https://blogs.oracle.com/dataintegration/entry/odi_12c_installing_odi_studio

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite-Offsite. Online. Get all the details from our ODI training site

ODI 11g Cookbook – The leading ETL tool now also has the best ETL book

I don’t know too many books or movies where the sequel is better than the original. The ODI 11g cookbook is such a rare case. It is stuffed with 60 valuable recipes that every ODI developer should know. Apart from the value of these recipes in their own right they also showcase the flexibility of ODI and can be transfered to other problems.

My favourite chapters are; Knowledge Module Internals, Advanced Coding Techniques, Advanced Topology, and Using Variables.

A big thank you to the ODI product management team for sharing their insights.

Using ODI user functions to dynamically inject SQL into Interfaces

I’d like to share with you a recipe that demonstrates the power which the combination of ODI functions and Java BeanShell scripting techniques can provide.
Before I do so I will briefly describe the issue we recently had as a background for the use case.

Doesn’t matter which programming language or tool you use, it’s a widely known good practice not to hardcode static values into your code.
I’m talking here about various ‘configuration’ like values used within the code itself: thresholds, whitelists, blacklists, capex values etc..
During the code development phase those may very well seem static but later when product has gone live it is often required to adjust them which may even result in a hotfix case in the worst scenario.
To prevent it the values can be provided in many alternative ways including registry, ini and XML files or just anything sourced from network connection.
The ODI is no exception here and the generic purpose parameter table seem like the natural way to handle such situations.

The problem may occur when such value is to be used in an ODI interface directly.
In the case we had the large table had to be filtered using such metadata stored parameter.
One way to do this would be to pull in that table directly into the interface.
It could be then used in the join:


Unfortunately, as would really expect, the join condition:

DS.DAY_DATE>to_date(P.VALUE,‘YYYY-MM-DD’) AND KEY=‘LAST_PROGRESS_DATE’

doesn’t make a great query execution plan:


What we can alternatively do is to pass it as a regular filter in the SUBQUERY form:


DS.DAY_DATE > (SELECT to_date(value,‘YYYY-MM-DD’) FROM EDW.C_PARAMETER m WHERE KEY=‘LAST_PROGRESS_DATE’)

It doesn’t improve it a bit. Still the INDEX RANGE SCAN is performed.
Both queries take between 40minutes to 1hour.

I can almost hear you shouting “Why don’t you use an ODI variable for it?” and you’re damn right, this would solve the problem.
When static value is used the query filter

DS.DAY_DATE > #V_LAST_PROGRESS_DATE

changes the plan to simply do the FULL TABLE scan which it should do in the first place;


Using an ODI variable however has also some downsides.
Each parameter introduces an overhead of dedicated ODI variable. This in practice translates to wrapping interface in the Package with separate declaration and separate refresh step.
Very often such parameters would use the same source for it but it still requires separate refresh definition. What I have in mind here is the elegant solution seen in OBIEE where variables are just different columns which can then share the same execution block.
Another thing is that with the complex logic the number of parameters usually grows fast which simply makes it difficult to manage.
Last but not least is the fact that it makes debugging difficult. Variables by default are not seen in the operator.

This can make it even harder to find out what’s happened on Production at some point in time when log is the only thing you left with.

I said ‘by default’ as there are ways to achive it. Prior to ODI version 11.1.1.6 the variable could be only printed using tricks like throwing and ignoring exceptions or switching history for variable, neither one elegant. From that version onwards there’s variable tracking feature but again as this requires running code with log level higher than 6 something not enabled by default, especially in production environment.

It turns out that with a bit of Java BeanShell scripting such parameter values can be sourced in a run-time from the database and substituted in the right place just before the query runs.

To make the code reusable we will wrap the Java BeanShell code into the function with the syntax defined as

DS.DAY_DATE > #V_LAST_PROGRESS_DATE

The first parameter $(SQL) would just specify a full text of the query to return the parameter value. This is the equivalent of ODI Variable refresh SQL text.
The second one, $(AT) would specify which connection that refresh statement should be executed at. It should only take the values either “DEST”, “SRC” or “WORKREP”. Yes, it’s possible to get runtime repository values too!
The “SRC” might be very useful when “Command on Source/Command on Target” is used or simply when some different than Target schema is to be used for getting parameter value.

The implementation goes as following:

<?
java.sql.Connection targConnection = odiRef.getJDBCConnection(“$(AT)”);
java.sql.Statement s = targConnection.createStatement();

String
 query=“$(SQL)”;

java.sql.ResultSet rs = s.executeQuery(query);
if (rs.next())
     out.println(rs.getString(1));
else
     throw new Exception(“Query “+query+” in UDF failed.”);
s.close();
?>

Now the name for our function. As you can see from the screenshot above, it is different from the one in the syntax.
I didn’t mention it before but this is where another trick comes in.
We can actually make the second function parameter optional.

While this is directly possible in many programming languages it’s not in ODI and what we need to do here instead is to leverage another paradigm taken from programming world called function overloading.
This will be achieved by creating another function that takes one parameter and executes the original one passing it and hardcoding the second one.
The name used in the syntax for both will be the same but the syntax itself will be obviously different:

 

ODI function name

ODI function syntax

implementation

SUBST_QUERY_RESULT_VAL_AT SUBST_QUERY_RESULT_VAL($(SQL),$(AT)) does the heavy lifting
SUBST_QUERY_RESULT_VAL SUBST_QUERY_RESULT_VAL($(SQL)) executes the function above:SUBST_QUERY_RESULT_VAL($(SQL),DEST)
 

The second’s function implementation simply calls the first one transparently passing the same SQL and defaulting the $(AT) parameter with DEST value:

SUBST_QUERY_RESULT_VAL($(SQL),DEST)

Thanks to it, the user can specify the second parameter or skip it whereby the ‘DEST’ would be used instead.
ODI will analyze the function call format used in the Interface and based on that will substitute relevant function.

I should also mention here there’s an ODI bug that may prevent from linking to the  right function.
I discovered the order in which functions are created plays a role here (values of internal ODI Object IDs?).

This means if we create the the ‘wrapper’ one first, ODI will assume there’s a recurrence call even within the function even when the call doesn’t match the function syntax. It can be easily diagnosed by expanding and examining ‘uses’ contents under the function itself. The following indicates wrong recurrence:

If you created the functions in the order as described in this post you should see the calls are resolved properly:

Coming back to our use case, those functions then allow using the SQL queries directly in the interface. The SQL filter below:

DS.DAY_DATE > SUBST_QUERY_RESULT_VAL(SELECT ‘date”’||value||”” FROM C_PARAMETER WHERE KEY=‘LAST_PROGRESS_DATE’)

will be substituted nicely with

DS.DAY_DATE > date’2013-01-13′

which in brings down the query time to less than 3 minutes.

It’s not everything, the source schema for parameter table doesn’t need to be hardcoded either!
A function with nested substitution API call would work in the same way:

DS.DAY_DATE > SUBST_QUERY_RESULT_VAL(SELECT ‘date”’||value||”” FROM ”+odiRef.getObjectName(“C_PARAMETER”)+” WHERE KEY=‘LAST_PROGRESS_DATE’)

Please note that this ODI function call has one limitation though – The SQL parameter has to be all expressed within one line, doesn’t matter how long.
This however, with yet another trick can be lifted too.
You can learn about it and plenty of other useful ODI scripting techniques at our ODI training classes.

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite-Offsite. Online. Get all the details from our ODI training site

Extreme re-usability in ODI 11g (I can’t believe I am giving away this trick).

Another Christmas and a second baby under my belt it’s time to get back to blogging.

There were recently some good posts by David Allan and Gurcan Orhan on the power of ODI functions. David mentions correctly that ODI functions are the most underrated feature in ODI. Whenever you think of re-usability in ODI think of user functions. Functions can be used anywhere in ODI where you need to write something once and apply it many times. This is not limited to using functions inside interfaces to load columns or the use of wrapping and parameterizing common functionality. You could just write a snippet of SQL, e.g. a commonly used filter in a WHERE clause as a function and reuse it many times. The nice thing about functions is that you can use substitution method API calls in a function, use Java or the ODI tools.

Use case for re-use of user functions

One of the things that have puzzled me about ODI and Knowledge Modules is that some of the steps in the KMs are repeated and re-used over and over again without some central placeholder. Why not write the step once and re-use it many times across the Knowledge Modules? This is exactly what I will show you. We will take table stats gathering as an example. One of the steps in Knowledge Modules is to gather table stats once the target table has been loaded. Why not create a user function that gathers table stats on a target table and takes the estimate percent size as a parameter?

In a first step let’s create the function. As a recommendation I would suggest to prefix your functions with a convention. In a first implementation I had GATHER_TABLE_STATS for the syntax, which wreaked havoc with any KMs that are using the Oracle GATHER_TABLE_STATS procedure in package dbms_stats.

And the implementation

BEGIN

dbms_stats.gather_table_stats ( ownname => '<%=odiRef.getInfo( "DEST_SCHEMA" )%>', tabname => '<%=odiRef.getTargetTable("RES_NAME")%>', degree => DBMS_STATS.AUTO_DEGREE, estimate_percent => $(sample_size), cascade => TRUE  ) ;

END;

Next we will use the function as a step in a (global) Knowledge Module and pass in 10 as a parameter for the estimate_percent part of the stats gathering piece. In this particular case I have modified the IKM SQL Control Append.

Then we create and execute an interface that uses this IKM

As you can see the function was substituted at runtime and stats were gathered on the target table.

This is just one example where you can benefit from user function re-usability. Be creative and think out of the box and you will see user function written over everything.

How you can launch an ODI scenario through a web service call?

ODI and Jetty

The ODI 11g standalone agent now ships with its own lightweight application server (Jetty). The main reason this was included is to make it easier to execute scenarios via web service calls. In the past this was quite painful as you needed a separate application server (OC4J). The other limitation was that it was difficult (read work around) to implement asynchronous web service calls. Luckily, this has all changed with ODI 11g and Jetty. However, I don’t believe that you can run the ODI console in Jetty, which is unfortunate.

Asynchronous web service calls

The WSDL can be found at http://<standaloneagentname>:<port>/oraclediagent/OdiInvoke?wsdl

 

We can call the web service methods from any SOAP Client, e.g. the OdiInvokeWebService tool built into ODI.

We need to provide our login, the work repository for execution, the name of the scenario, its version, and the context. You can also specify if you want the scenario to be executed synchronously or asynchronously. If you set Synchronous to false then control will be handed back to the client immediately. If you set it to true it will execute and then return control. In asynchronous scenarios you can use the session ID from the response file and the getSessionStatus method to return the status of your scenario execution.

The other thing you have to be aware of is that when you launch a scenario through a web service call it will fork out a new agent based on your odiparams.bat. You need to plan for the extra memory used up.

You can now easily orchestrate your process flows in BEPL or launch them from OBIEE.

Need ODI Training? Learn ODI from the experts!

I am proud to announce that Sonra are now offering ODI classes.

Value-priced. Customised. Onsite-Offsite. Online.

Five day standard training course.
Online.
Starts 24 February 2014.
Book before 15 January 2014. Pay $949.

More information on our ODI courses.

Making use of ODI Flexfields to meet requirements

What are ODI Flexfields?

In ODI you can create user-defined fields on certain objects. You can think of these fields as additional attributes for certain objects. At design time you populate these attributes with values that are then used at runtime, e.g. by a Knowledge Module. There are various Flexfields defined out of the box for very specific requirements, e.g. there are Flexfields defined on the Datastore object for SAP and HIVE data integration tasks.

Where do you create them?

You create Flexfields in the Security module under the Objects accordion. You can’t create Flexfields for all of the objects. While you can create a Flexfield for an Interface you can’t create a Flexfield for an Interface Target Table.
Once you have created the Flexfield you can then populate it with values in Designer.
An example
You have a requirement to log errors to an error table using a CKM. Based on the severity of the error you want to allow records through to your target table. DQ checks that result in minor errors are logged in the error table and go through to the target. Records with more sever issues only go to the error table and do not end up in the target table.
In a first step we need to define a Numeric type Flexfield on the Condition object. We name this field Passthrough

When you create your DQ Condition in Designer you can then set the value for the Flexfield. The value 1 will allow the record to go through to the target table even if there is a violation of the DQ check.

In a next step we need to write some code in our CKM to make use of the Flexfield and implement the above logic.

We store the value of the Flexfield in a variable passthrough and flag all of the records with the passthrough value.

<? passthrough=”<%=odiRef.getFlexFieldValue(“” + odiRef.getCK(“ID”) + “”,”2500″,”PASSTHROUGH_COND”)%>”.replaceAll(“‘”,””””); ?>

As you can see from the figure below, 2500 is the internal ID of object Condition in the ODI repository.