In the second part of the series we cover:
More discussion on ODI vs Informatica
More on migrating from OWB to ODI
Using ODI outside the data warehouse (BI Apps)
ODI in the cloud
ODI and Big Data
I really like this tip, especially for its simplicity. It has saved me hours and hours of mind-numbing work in the past.
Imagine the following situation. You are tasked to complete a source to target mapping. With the help of various source system SMEs, legacy data models, and the data dictionary of the source database you have been able to complete 99% of the mappings. However, there is that one field that is nowhere to be found in the source system (there's always one). The business users have given you some sample values for this field. Wouldn't it be nice to run a query such as SELECT column_name FROM schema/database WHERE value = 'Sample Value'?
I will show you how you can achieve something similar with ODI.
The trick is to dump the content of your schema or database into a folder and then use grep, Windows Search, or a similar tool to search for your sample value.
In ODI this is extremely easy. Using the Command on Source/Target functionality it can be completed with two lines of code.
Our objective is to find the table where ODI stores Option values for interfaces.
In a first step we create the procedure that dumps the content of the ODI schema to a folder
Command on Source
We query the data dictionary for those tables that are owned by the ODI work repository user. If you want to dump out the whole database just get rid of the WHERE clause.
SELECT table_name, owner FROM all_tables WHERE owner = '<name of your work schema>'
Command on Target
We use the tool OdiSqlUnload to unload those tables from the Command on Source recordset. For each table we create a separate text file.
OdiSqlUnload "-FILE=C:\unload\#table_name.txt" "-DRIVER=oracle.jdbc.driver.OracleDriver" "-URL=<your connection>" "-USER=#owner" "-PASS=<your password>" "-FILE_FORMAT=VARIABLE" "-FIELD_SEP=;" "-ROW_SEP=\r\n" "-DATE_FORMAT=yyyy/MM/dd HH:mm:ss" "-CHARSET_ENCODING=ISO8859_1" "-XML_CHARSET_ENCODING=ISO-8859-1"
SELECT * FROM #table_name
In a second step we enter a dummy value for one of our interfaces. This will be the value that we will search for in the dump.
Our dummy value is XXXXXX.
Next we run the procedure. This will dump the table contents into files.
Finally, we use Windows Search, grep or similar to find the file where our sample value is stored.
Windows search comes back with one result. The last column of table SNP_UE_USED, which is I_TXT_VALUE.
I am sure you can think of many more use cases for this. Of course, you can make all of this more sophisticated and wrap it up in a package with options to run against different databases such as MS SQL Server etc.
For very big databases you will need to watch performance. You may want to only extract a subset of tables and search for your sample value there before moving on to the next set and so on.
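The dump-and-search idea can be sketched in plain Python. This is a minimal, hypothetical illustration using an in-memory SQLite database as a stand-in for the source schema; the table and column names are invented, and in ODI the dumps would of course be one file per table on disk rather than in-memory strings.

```python
import io
import sqlite3

# Stand-in for the source schema: two tables, one of which holds our sample value
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE snp_demo (i_txt_value TEXT)")
con.execute("INSERT INTO snp_demo VALUES ('XXXXXX')")
con.execute("CREATE TABLE other_table (col TEXT)")
con.execute("INSERT INTO other_table VALUES ('abc')")

# Step 1: dump every table (in ODI: Command on Source lists the tables,
# Command on Target runs OdiSqlUnload once per table)
dumps = {}
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
for table in tables:
    buf = io.StringIO()
    for row in con.execute(f"SELECT * FROM {table}"):
        buf.write(";".join(str(c) for c in row) + "\n")
    dumps[table] = buf.getvalue()

# Step 2: the "grep" - find which table holds the sample value
hits = [t for t, content in dumps.items() if "XXXXXX" in content]
print(hits)
```

The same two-step shape (list the tables, dump each one, then search the dumps) is exactly what the ODI procedure above automates.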
One of the really annoying things about ODI is that it is not easily possible to move or copy objects between projects. Below is a step by step guide you can follow to copy or move objects between projects. We will take an object of type package as an example, as this is the most complicated case and covers all of the others as well. The method shown uses Duplication mode import. While this method is not recommended by Oracle Support, I have never had any issues with it.
So let’s get started:
1. Migrate any of the variables used inside the package using Export and Import (Duplication Mode). A variable with the same name may already exist in the target project. This means you have to rename the variable and fix any references to it in interfaces, procedures, packages, and the Topology (the details on this follow below).
2. Migrate any of the project functions and project Knowledge Modules used using Export and Import (Duplication Mode)
For the KMs consider using global KMs instead
3. Export the root folder (with child components) of where the package is located. The reason for this is how Duplication mode import in ODI works. Basically it does not recreate internal IDs for objects that are part of the export XML. This strategy is used to minimise the number of missing references and as a result the amount of work we need to do.
4. Next we import the exported folder into our new project.
Note the import needs to be made in Duplication Mode.
5. Next we move (drag and drop) the object(s) we need from our imported folder to our target folder in the new project.
6. Next we can delete the imported folder
7. Any missing references to Knowledge Modules need to be recreated for all of the interfaces used in the package
Note: This also applies to all of the options for each KM.
If variables are referenced using the old project code in any of the KMs or you had to rename a variable during import then these need to be updated as well
8. Update references to any variables or project functions used in procedures
If variables are referenced using the old project code in any of the Procedures or you had to rename a variable during import then these need to be updated as well
9. Update references to any variables or project functions used in other variables
If variables are referenced using the old project code in any of the Variables or you had to rename a variable during import then these need to be updated as well
10. Update references to any variables used in the Topology, e.g. as part of physical data servers
11. Update references to any variables or project functions used in other project functions
12. Next we need to replace any missing references inside the package
These are typically only references to variables unless your package also references procedures etc. from outside the root folder that we exported/imported
We need to recreate those links by deleting the variables in the package and adding them again. Make sure that you select the correct Variable Type when recreating the variable in the diagram (Refresh Variable, Declare Variable etc.).
Open the legacy package for a side by side comparison to minimise mistakes.
13. If you reference scenarios in your package and you pass variables to these scenarios you will also need to repoint the variables in the Additional Variables tab for this scenario
14. Replace and fix references to new markers
15. Delete any scenarios for the legacy object(s) and then for the migrated object(s) if applicable.
16. Generate scenario for the migrated object(s)
17. Export legacy objects for backup (optional, if you are the cautious type)
18. And finally: Delete any of the legacy objects you migrated.
I am proud to announce that Sonra are now offering ODI classes.
Value-priced. Customised. Onsite-Offsite. Online.
Five day standard training course.
Starts 24 February 2014.
Book before 15 January 2014. Pay $949.
More information on our ODI courses.
The features I am most looking forward to are:
- Smart Import/Export feature. Hopefully this will allow import/export between projects in the same repository.
- Global Knowledge Modules
- Versioning of Knowledge Modules
- Tracking variables and sequences
For a full list of new features go to the Oracle website.
We recently evaluated the ODI-OBIEE data lineage feature that was added to the most recent release of ODI 11g.
As part of this evaluation we came across various limitations of this out of the box data lineage functionality.
First of all we came across what I believe to be two bugs (one of them serious):
1. The data lineage feature does not take temp interfaces into account, i.e. if you are using a temp interface in any of your data flows from source to final target then the data lineage is broken. I logged this issue on 17 October 2011 with Oracle Support. However, up until today they have not been able to set up a joint OBIEE 11g/ODI 11g environment to reproduce the issue. Quote: "The OBIEE team is having some issues with the OBI instance". I never bothered escalating the issue as we decided against ODI-OBIEE data lineage anyway.
2. Another smaller issue has to do with the fact that for report columns that contain a formula, data lineage is broken.
In the report below we are using a formula.
However, in the data lineage report this is missing. I would have expected to get data lineage for any of the columns that are used. In our case for both amount sold and quantity sold.
The above are just bugs that can be easily fixed. However, the other issue we came across is more serious and cannot be addressed easily. Basically data lineage will be broken/lost if you are using views in your interfaces or procedures to populate your target tables.
The reason for this is that the view will mask the columns of the underlying table and may include additional transformation logic etc. As far as I can see, there is no easy solution to this. One such option would be to write an SQL parser that parses the columns of the underlying table(s) and maps them to the target table.
My estimate is that at least 90% of implementations use views to extract data. For example, it is best practice to use views to extract data from your data warehouse to your data marts, to insulate the data marts from changes to the data warehouse.
I would be interested to hear from people who are using the out of the box data lineage feature: what has your experience been, and were you able to overcome the above limitations?
In part one of this series (Oracle Data Integrator in a Services Oriented Architecture) I had laid the groundwork and explained the basics of a SOA environment and how ODI fits into it. In part two we then looked at creating and deploying data services with ODI 11g in Weblogic. In part three we will now look at SOA from the other angle. I will show you how we can consume data services using ODI 11g.
We can use ODI to consume public web services for data augmentation purposes, e.g. we can query the OpenStreetMap Geocode web service to augment customer data with longitude and latitude information.
OpenStreetMap provides geographic information such as maps, geocoding etc. free of charge. You may wonder why we are not using Google Maps? Well, first of all it's not open source, and Google's license does not allow us to store the resultsets of our web service queries. I recently read the excellent book The Filter Bubble and since then I try to avoid Google et al. even more. That's why I have replaced Google search with Scroogle search.
Anyway, let’s get back to the topic at hand. As per the OpenStreetMap license anyone can use and store their data. They only ask you to mention OSM as the source of the data.
For details on the OSM license have a look at: http://wiki.openstreetmap.org/wiki/OpenStreetMap_License
There is also a page for the OSM usage policy
The OSM geocoding documentation can be found here
In addition there is an FAQ for the geocoding API.
When we put the above URL in our browser we will get the following XML in return.
We will now look at how we can extract and store this information using ODI. I will show you two ways of extracting longitude and latitude information from the XML. In the first example we use built in functionality of the Oracle database to extract the information. In the second example we will exclusively use ODI functionality to extract the data.
Example 1: Consuming a RESTful web service – ODI piggybacks on the Oracle database
If we know that the target for our RESTful web service request is an Oracle database we can use the utl_http functionality built into the Oracle database to query the OpenStreetMap geocoding web service. This is convenient as we can directly extract the attribute values we are interested in from the XML that the web service returns. We don’t have to first set down the data in an XML file on the file system and we don’t have to use ODI to load the XML file into our target database. This saves us two steps and a bit of development time.
In a first step we need to create an Access Control List (ACL) for user OE. The ACL will give user OE access to the nominatim.openstreetmap.org website, but prevents access to any other websites. ACLs are new in Oracle 11. We need to grant these privileges as user SYS.
We first create the ACL
begin
  dbms_network_acl_admin.create_acl (
    acl         => 'utl_http.xml',
    description => 'Normal Access',
    principal   => 'OE',
    is_grant    => TRUE,
    privilege   => 'connect',
    start_date  => null,
    end_date    => null );
end;
Next we assign the domains we want to access to the ACL
begin
  dbms_network_acl_admin.assign_acl (
    acl        => 'utl_http.xml',
    host       => 'nominatim.openstreetmap.org',
    lower_port => 1,
    upper_port => 10000 );
end;
Finally we grant access to user OE to the utl_http package.
grant execute on utl_http to OE;
Once we have granted the correct privileges to the OE user we can directly access the OpenStreetMap API via the Oracle database. We can piggyback on this functionality from within an ODI procedure.
We create the ODI procedure PRC_GEOCODE_CUSTOMERS that loads the longitude and latitude information into the OE schema.
In a first step in our procedure we create a table at runtime, which stores the XML that we retrieve from the OpenStreetMap geocode API
CREATE TABLE address_geocode ( address_geocode XMLTYPE, customer_id NUMBER )
In the next step we query the OpenStreetMap API. We submit customers with customer_ID 240 and 235 from the OE schema to the OSM geocode API. As we don’t want to swamp the OpenStreetMap servers with too many requests we just retrieve data for two customers.
The Oracle database allows us to make direct http requests through the little known HTTPURITYPE function, which is built on top of the utl_http package. Have a look at the Oracle Books site where more information and examples are given.
INSERT INTO address_geocode
SELECT XMLTYPE(HTTPURITYPE(
         'http://nominatim.openstreetmap.org/search?q='
         ||street_address||','||postal_code||','||address_city||','||state_province||','||country_id
         ||'&'||'format=xml'||'&'||'polygon=0'||'&'||'addressdetails=1').getclob()),
       customer_id
FROM (
  SELECT REPLACE(c.cust_address.street_address,' ','+') street_address,
         REPLACE(c.cust_address.postal_code,' ','+')    postal_code,
         REPLACE(c.cust_address.city,' ','+')           address_city,
         REPLACE(c.cust_address.state_province,' ','+') state_province,
         REPLACE(c.cust_address.country_id,' ','+')     country_id,
         customer_id
  FROM customers c
  WHERE customer_id IN (240, 235)
)
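The URL construction that the SQL performs (replace spaces with '+', concatenate the address parts, append the format parameters) can be sketched in a few lines of plain Python. The helper function name and the sample address are illustrative only:

```python
BASE = "http://nominatim.openstreetmap.org/search"

def geocode_url(street, postal, city, state, country):
    # Spaces in address parts are replaced with '+', mirroring the REPLACE calls
    parts = [p.replace(" ", "+") for p in (street, postal, city, state, country)]
    return f"{BASE}?q={','.join(parts)}&format=xml&polygon=0&addressdetails=1"

url = geocode_url("3314 Eastern Ave", "21224", "Baltimore", "MD", "US")
print(url)
```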
If we execute the procedure we will find two XML records in our address_geocode table.
We are now ready to extract longitude and latitude information from the XML and store in table customer_geocodes together with the customer_id.
We add another step to our procedure in which we create table customer_geocodes at runtime.
CREATE TABLE customer_geocodes (customer_id NUMBER, longitude NUMBER, latitude NUMBER)
We then extract longitude and latitude information from the XML and store it in table customer_geocodes. We use the Oracle XMLTABLE functionality to extract these attributes from the XML. This function allows us to convert an XML file to a relational table. More details on XMLTABLE can be found in the documentation.
INSERT INTO customer_geocodes
SELECT customer_id,
       xt.longitude,
       xt.latitude
FROM address_geocode yt,
     XMLTable('searchresults/place'
       PASSING yt.address_geocode
       COLUMNS latitude  NUMBER PATH '@lat',
               longitude NUMBER PATH '@lon') xt
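If you want to see what XMLTABLE is doing conceptually, the same extraction can be mimicked in plain Python with the standard library XML parser: pull the lat/lon attributes from each place element. The sample XML below is a trimmed, hypothetical version of an OSM response, not an actual API result:

```python
import xml.etree.ElementTree as ET

sample = """
<searchresults>
  <place lat="39.2856" lon="-76.5686" display_name="3314 Eastern Ave"/>
</searchresults>
"""

root = ET.fromstring(sample.strip())
# One (lat, lon) tuple per <place> element, like one row per node in XMLTABLE
geocodes = [(float(p.get("lat")), float(p.get("lon")))
            for p in root.findall("place")]
print(geocodes)
```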
We then execute the procedure and verify that the extracted information was loaded into our customer_geocodes table:
Example 2: Consuming a RESTful web service using native ODI functionality
If we don’t have the luxury of an Oracle database we can still query the OpenStreetMap API using ODI native functionality. We can make use of the Jython scripting language that ships with ODI to access the OSM API.
We create another procedure PRC_GEOCODE_CUSTOMERS_JYTHON. We add one step to the procedure and set the Technology to Jython. In the code we connect to the OSM API, retrieve the geocode XML, and store this on the C drive on the file system.
import java.net.URL as URL
import java.io.BufferedReader as BR
import java.io.InputStreamReader as SR
import java.net.HttpURLConnection as con
import base64
import string

URLFormat = "http://nominatim.openstreetmap.org/search?q=3314+Eastern+Ave,21224,Baltimore,MD,US&format=xml&polygon=0&addressdetails=1"
XMLFile = "C:\OSM\Geo_240.xml"
OSMURL = URLFormat.replace(' ', '%20')

url = URL(OSMURL)
con = url.openConnection()
con.setRequestMethod("GET")
con.connect()
responseCode = con.getResponseCode()
responseMessage = con.getResponseMessage()

if responseCode == 200 and responseMessage == "OK":
    br = BR(SR(con.getInputStream()))
    outRESTXML = ""
    outline = br.readLine()
    while (outline != None):
        outRESTXML += outline + "\n"
        outline = br.readLine()
    XMLfilehandle = open(XMLFile, 'w')
    XMLfilehandle.write(outRESTXML)
    XMLfilehandle.close()
else:
    con.disconnect()
con.disconnect()
For the purpose of this walk through I have hard coded the address of the customer with CUSTOMER_ID 240.
Before we can extract the data contained in the XML file into the customer_geocodes table we first need to define the connection to the XML file in ODI Studio Topology.
In a first step we create the data server
Then we add the physical schema
Next we add the logical schema
Then we reverse engineer the model for the XML file
Once we have reverse engineered the model we should see the two new data stores Place and Searchresults in the XML_OSM_SRC model.
Before we create the interface to augment our customer data with longitude and latitude information we need to import two Knowledge Modules: LKM SQL to Oracle and IKM SQL Control Append.
The longitude and latitude information is contained in data store PLACE. We will use this data store as the source in our new temporary interface INT_OSM_XML_LOAD. Once we have added the PLACE data store to the interface, drag and drop columns LAT and LON to the Target Datastore and manually add column CUSTOMER_ID to target datastore CUST_GEOCODES (right click in the target datastore and select Add Column).
As data type for column CUSTOMER_ID we select NUMBER. Enter 240 for the CUSTOMER_ID.
Note: In a real world scenario you would of course extract the customer_id from the filename of the XML file.
Next we change to the Flow tab and select LKM SQL to Oracle as the Load Knowledge Module.
We also select IKM SQL Control Append as the IKM and modify the following options:
Next we execute the Interface. This should create a new table CUST_GEOCODES in the OE schema and insert longitude and latitude for customer_id 240.
This example has shown that we don’t necessarily need the advanced features of the Oracle database to query a web service using ODI.
I think one of the best new features in ODI 11G is the subselect/subquery feature. In ODI 10 this could only be achieved by a workaround as outlined in a previous post Using subqueries in Oracle Data Integrator.
What is the advantage of a subquery?
In previous versions of ODI we had to physically set down the data for each individual query, e.g. if we wanted to rank a dataset and then select the top ten out of that dataset we had to create an interface for the ranking operation and an interface for the top ten operation. At each point we had to set down the data, thereby increasing I/O and decreasing overall performance. In ODI 11 this has changed. While we still need to create two temp interfaces for the operation as a whole, we can now tell ODI to treat the first temp interface as a derived table. ODI 11G will then use this to generate a subquery.
How does the subquery work in ODI 11?
The way this works is fairly simple. For each subquery/derived table in your query you create a temp interface. You embed your various subqueries by simply telling ODI that you want to use the temp interface as a subquery.
A step by step guide
We will look at an example from the SH schema. The task at hand is to load a table with the top ten customers based on sales amount.
To accomplish this task we need three temp interfaces.
The first temp interface (INT_SALES_CUST) will aggregate the sales amount from the sales table by cust_id
The second temp interface (INT_SALES_RANK) will take the resultset from the interface in the previous step and dense rank the customers’ sales data.
The third interface (INT_TOP_TEN) will then select the top ten customers, join to the customers table, and physically set down the data.
So far so good. These are the same steps we took in ODI 10.
In a next step we need to subquery enable the interfaces. We open interface INT_SALES_RANK and click on the INT_SALES_CUST data store. In the Source Properties section you will find a checkbox Use Temporary Interface as Derived Table. Select this checkbox.
We also need to perform the same step for our third interface INT_TOP_TEN.
Once this has been done we can execute interface INT_TOP_TEN in Simulation mode.
This will generate the required query with the embedded subqueries from the temp interfaces. What a great feature.
INSERT INTO sh.cust_top_ten
  (sales_rank, cust_first_name, cust_id, sales_amt, cust_last_name)
SELECT sales_rank, cust_first_name, cust_id, sales_amt, cust_last_name
FROM (
  SELECT sales_rank.sales_rank      sales_rank,
         customers.cust_first_name  cust_first_name,
         customers.cust_id          cust_id,
         sales_rank.sales_amt       sales_amt,
         customers.cust_last_name   cust_last_name
  FROM (
    SELECT sales_cust.cust_id   cust_id,
           sales_cust.sales_amt sales_amt,
           DENSE_RANK() OVER (ORDER BY sales_cust.sales_amt DESC) sales_rank
    FROM (
      SELECT sales.cust_id          cust_id,
             SUM(sales.amount_sold) sales_amt
      FROM sh.sales sales
      WHERE (1 = 1)
      GROUP BY sales.cust_id
    ) sales_cust
    WHERE (1 = 1)
  ) sales_rank,
  sh.customers customers
  WHERE (1 = 1)
    AND (sales_rank.cust_id = customers.cust_id)
    AND (sales_rank.sales_rank <= 10)
) odi_get_from
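The nested shape ODI generates (aggregate, then rank, then filter, each as a derived table) can be reproduced on a small scale with SQLite, which supports window functions from version 3.25 onwards. The table names mirror the SH example but the data is invented, and the top-N cutoff is 2 here instead of 10 to keep the sample small:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (cust_id INTEGER, amount_sold REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [(1, 100), (1, 50), (2, 300), (3, 20)])

# Innermost derived table aggregates, the next one ranks,
# and the outer query filters on the rank - no temp tables needed
top = con.execute("""
    SELECT cust_id, sales_amt, sales_rank
    FROM (SELECT cust_id, sales_amt,
                 DENSE_RANK() OVER (ORDER BY sales_amt DESC) AS sales_rank
          FROM (SELECT cust_id, SUM(amount_sold) AS sales_amt
                FROM sales
                GROUP BY cust_id) AS sales_cust) AS ranked
    WHERE sales_rank <= 2
    ORDER BY sales_rank
""").fetchall()
print(top)
```

The point is the same as in ODI 11G: the intermediate results never hit disk, they only exist as derived tables inside one statement.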
You may also be interested in our ODI training courses. Value-priced. Customised. Onsite-Offsite. Online. Get all the details from our ODI training site
ODI has a built in SOAP client, the ODIInvokeWebService tool. However, there is one caveat when you try using this with Java 6/JDK 6. It simply doesn’t work and you will get the following error:
This is documented in note 1085594.1 and the suggested workaround is to fall back to an earlier version of Java. Another option of course is to install another agent that uses Java 5 alongside your Java 6 agent.
The first step we need to take is to install JDK 5. You can download JDK 5 from the Oracle website.
If repositories are running on Oracle we also need to download the compatible Oracle JDBC 5 driver. If you are running your repositories on another RDBMS get the equivalent JDBC driver.
Next, inside the oracledi folder we create a new folder and name it bin_jdk5.
We then copy the content of the bin folder into bin_jdk5.
The next step will be to create a new environment variable and name it ODI_JAVA_HOME2. The variable needs to be pointed to the JDK 5.
As a next step we need to edit the odiparams.bat file in the bin_jdk5 folder. Open the file in Notepad and replace ODI_JAVA_HOME with ODI_JAVA_HOME2
In the same file we also need to replace any occurrence of the word drivers with drivers2
Next we create a new folder drivers2 inside the oracledi folder and copy the downloaded Oracle JDBC 5 driver into it. Also copy into drivers2 any other drivers from the drivers folder that you may want to use with this agent.
Finally we create a new agent that uses the JDK 5 as a Windows service.
First edit file agentservice.bat and replace bin\odiparams.bat with bin_jdk5\odiparams.bat as per figure below
Then we create a copy of file snpsagent.conf in folder oracledi\tools\wrapper\conf. Replace any occurrence of drivers with drivers2.
We also replace wrapper.working.dir=../../../bin/ with wrapper.working.dir=../../../bin_jdk5/
Next we create the physical and logical agent in Topology Manager.
Finally we install the agent as a listener or scheduler agent.
In one of the next posts I will show how we can make good use of ODIInvokeWebService to query the OBIEE web services API.
My friend and colleague Maciej Kocon has come up with the following trick. Rather than use a user defined function to implement an analytic function such as MAX() OVER or SUM() OVER as suggested by note 807527.1 there is another way to work around the getGrpBy bug.
We will take advantage of the ODI multi pass parser and hide the analytic function from getGrpBy in our mappings:
<?out.print("SUM");?>(SALES.AMOUNT_SOLD) OVER (PARTITION BY PRODUCTS.PROD_CATEGORY)
The SUM above only gets generated after getGrpBy has been parsed, and as a result no GROUP BY clause is generated, which is the correct behaviour.
A brilliant yet simple trick.
In order to master scripting in ODI I recommend the following books.
Setting up integrated security with the SQL Server JDBC driver can be tricky. You often end up with the message “This driver is not configured for integrated authentication”. Below are the steps that you need to perform to get this to work.
Note: This will only work where the ODI agent has been installed on a Windows OS.
1. Download the Microsoft SQL Server JDBC Driver 2.0 from the Microsoft website. This is compatible with versions 2000, 2005, and 2008 of MS SQL. You will need to be up and running on JDK 5.0 or later with your ODI installation. If you haven’t done so already now is the time to install JDK 5.0 or even better JDK 6.0.
2. Unzip the content of the download.
3. The driver comes with a separate dll (sqljdbc_auth.dll) that will allow you to connect to SQL Server via your Windows login. Copy the sqljdbc_auth.dll to the \oracledi\drivers folder.
4. Copy the sqljdbc4.jar file to the \oracledi\drivers folder
5. Open odiparams.bat and add the following entry: set ODI_ADDITIONAL_JAVA_OPTIONS="-Djava.library.path=\oracledi\drivers", e.g. set ODI_ADDITIONAL_JAVA_OPTIONS="-Djava.library.path=c:\oracle\oracledi\drivers"
6. If you have set up the ODI agent as a Windows service you need to re-install the agent. Stop the agent and uninstall the agent with the "agentservice -r" command. Next you need to edit the "snpsagent.conf" file in the \tools\wrapper\conf directory.
7. Re-install the ODI agent with the “agentservice -i” command.
8. Open Topology Manager module and create Physical Server for MS SQL technology. Leave username and password blank
Go to JDBC tab and enter
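The screenshot with the JDBC values did not survive here. For reference, with the Microsoft JDBC driver the driver class and URL are typically along these lines (server and database names are placeholders you replace with your own):

```
JDBC Driver: com.microsoft.sqlserver.jdbc.SQLServerDriver
JDBC URL:    jdbc:sqlserver://<server>:1433;databaseName=<database>;integratedSecurity=true
```

The integratedSecurity=true property is what tells the driver to authenticate via the Windows account the agent runs under, which is why the username and password are left blank.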
9. One last thing. Of course, your ODI agent must not run as the Local System account but use a network account that has the appropriate MS SQL privileges to log into the SQL Server.
If you want to master scripting in ODI get the following books.