Archives

Written by Uli Bethke

Real Time BI Podcast on Oracle Data Integrator 12c. Part II.

In the second part of the series we cover:

More discussion on ODI vs Informatica
More on migrating from OWB to ODI
Using ODI outside the data warehouse (BI Apps)
ODI in the cloud
ODI and Big Data

Big Data Presentation

The Big Data presentation I gave yesterday is now available for download. In this presentation I define some common features of Big Data use cases, explain what the big deal about Big Data actually is, and explore the impact of Big Data on the traditional data warehouse framework.

Real Time BI Podcast on Oracle Data Integrator 12c. Part I.

I recently did a podcast with Stewart Bryson (Chief Innovation Officer, Rittman Mead), Kevin McGinley, and Alex Shlepakov (both Oracle Analytics at Accenture).

In the first part of this two part series we cover the following areas:

ODI 12c. What are the advantages? When should you upgrade?
Migration from OWB to ODI 12c. Should you migrate? How and when?
Comparison of ODI to Informatica and other ETL tools.
ETL style vs. ELT style data integration tools.
ODI, ETL, and data integration in the cloud.

What’s the Big Deal about Big Data? Hear me speak at OUG Ireland. 11 March 2014. Convention Centre Dublin.

So what’s the Big Deal about Big Data? Oil has fueled the Industrial Revolution. Data will fuel the Information Revolution.

Not convinced? Did you know that Amazon has recently patented a technology, based on a Big Data algorithm, that will start the shipping process before you have even completed your order? That’s right. Amazon knows that you will buy some stuff from their website before you know it yourself. How amazing or scary is that?

In my upcoming presentation on 11 March in the Convention Centre in Dublin I will explore this topic further and talk about:

- What is Big Data (and what is it not)?
- Some interesting use cases that show what is possible.
- Why the traditional data warehouse framework and technology don’t work for Big Data.
- A Big Data architecture for the Information Revolution.
- The Oracle Big Data armoury.

Registration for the event is now open.

Hope to see you there and talk about your Big Data project.

ODI 11g Repository Book Out Now

The book is free. All you need to do is contact us and we will send you a PDF version of the book. All we ask in return is that you permanently link to our site from your blog, website, Facebook page etc.

Alternatively, you can buy the book on Amazon.com. You can get the Kindle edition for $9.99. Send us your Amazon receipt and we will send you the PDF version, which scales the submodels better.

You may be asking yourself why we have written a reference book for the ODI repository. Why not use the excellent ODI SDK to query that information? While the SDK is great at automating certain repetitive tasks, it is not so good at retrieving information (in bulk) from the repository. Most ODI developers are familiar with SQL; not too many of us have a background in Java programming.

Some of the use cases for this book:

- Extract information from the Operator log for reporting (see the example query below this list)
- Impact analysis, e.g. find all interfaces a datastore is used in
- Identify objects for deployment
- Identify objects that have recently changed
- Extract error messages
- Automate code checklists
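
To give you a flavour of the Operator log use case, here is a minimal sketch of the kind of query the book helps you write. It assumes the SNP_SESSION table of the work repository; the exact column names can differ slightly between ODI versions, so check the Session (Operator Log) submodel for your release.

    -- Sessions that failed over the last 7 days (Operator log)
    SELECT sess_no,
           sess_name,
           sess_beg,
           sess_end,
           sess_dur,
           sess_status
    FROM   snp_session
    WHERE  sess_status = 'E'        -- 'E' = error, 'D' = done
    AND    sess_beg   >= SYSDATE - 7
    ORDER  BY sess_beg DESC;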

We have split the book into two sections. The first section covers the most important subject areas of the master repository; the second section covers the most important subject areas of the work repository. Each subject area comes with a submodel, a list of relevant tables and corresponding ODI screens, and, most importantly, one or more useful pieces of SQL code to query the subject area in question. We have also made the models of the subject areas available online on our website http://sonra.io/odi11rep.

In the book we cover the following subject areas:

Physical & Logical Architecture
Repository
Link Master and Work Repository
Solution
Version
Internal Tables
Data Store
Interface Mapping
Interface Flow
Interface Knowledge Module Options
Interface Model
Interface Clause (Where & Join)
Session (Operator Log)
Package
Model Hierarchy
Project Hierarchy

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite or offsite. Online. Get all the details from our ODI training site.

Oracle Endeca Demo Environments

Just a quick note to let everyone know that the latest version of Endeca is available as a demo environment.

The environment is hosted by Oracle and can be accessed with the credentials below.

URL: http://slc02oku.oracle.com:7101/eid/web
Userid: publicuser@oracle.com
Password: Admin123

Note: As the demo environment runs on port 7101 you may have problems accessing it from inside your corporate environment as your firewall will likely block this port.

Need ODI Training? Learn Oracle Data Integrator from the experts.

I am proud and delighted to announce that we are now offering ODI classes.

Learn from a combined 18+ years of hands-on ODI experience and get all of these benefits.

Top Quality

Choose world class ODI training from THE ODI gurus. Don’t settle for less.

Value-Priced

Unrivaled in quality AND affordable at the same time.

Training world-wide

On-site or via our state of the art virtual classrooms.

FREE consulting

Get free consulting with our aftercare consulting package.

Tailor-made Training

Tell us your requirements and we will put together a custom course for you.

Mix and Match Modules

Mix and match standard modules. Combine them with custom modules.

For more information on course content and how to book your ODI course visit our ODI training page.

The Oracle Data Integrator 12c Masterstroke

Visual data flows and OWB to ODI 12c migration path

In their latest release of Oracle Data Integrator (ODI 12c), Oracle have addressed two main concerns of the analysts and of some of their customers and partners. The first is the unclear migration path from Oracle Warehouse Builder to Oracle Data Integrator. The second is that the declarative design approach based on reusable code templates (knowledge modules) was not visual enough and scored badly in this category against other ETL tools.

People were rightly raising concerns that complex queries could not be implemented through the ODI user interface. I had always wondered how Oracle would address those two issues. In what has to be labelled a masterstroke of the ODI product and development teams, they were able to kill two birds with one stone. And two big birds those are.

As a side note, I have always wondered why ODI’s declarative design approach has never really made it onto the analysts’ list of strengths for ODI. To me, this metadata driven design approach, which makes extreme reusability possible and significantly reduces development time, is the core innovation and ultimate strength of Oracle Data Integrator.

Declarative Flow-Based User Interface

In ODI 12c, Oracle have introduced a feature they call the Declarative Flow-Based User Interface. Don’t be distracted by the name. What it means is that we can now create data flows (Interfaces have been renamed to Mappings) that combine two approaches: the visual approach of Mappings and Operators that we already know from Oracle Warehouse Builder and the powerful declarative design approach we love from Oracle Data Integrator.

With this new paradigm you essentially get the best of both worlds: the ease of visual data flow development and the power of the declarative design approach. No other vendor can offer anything comparable. When would you combine the two approaches? For simple requirements I would recommend sticking to the traditional ODI approach. However, when you have complex requirements, e.g. you need to pre-aggregate data or filter a recordset based on the result of an analytic function, the new feature comes in handy.
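
To illustrate what I mean by a complex requirement, consider filtering a recordset on an analytic function. The sketch below is not code generated by ODI, and the table and column names are made up; it just shows the shape of SQL such a Mapping needs to produce, with the analytic function nested in an inline view because it cannot be referenced directly in a WHERE clause.

    -- Keep only the latest order per customer (illustrative names)
    SELECT customer_id,
           order_id,
           order_date,
           order_amount
    FROM  (SELECT o.*,
                  ROW_NUMBER() OVER (PARTITION BY customer_id
                                     ORDER BY order_date DESC) AS rn
           FROM   orders o)
    WHERE  rn = 1;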

At the moment we don’t have all of the Operators available that we know from OWB. I am missing the Pivot and Unpivot Operators, for example, which means that complex queries requiring them can still only be generated using workarounds such as Views. What would be handy is an Operator SDK that would allow us to create GUI representations of database-specific SQL dialects. I love subquery factoring, for instance, and it would be handy to have an Operator for it, or one for recursive queries etc.
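
For readers not familiar with the term, subquery factoring is the SQL WITH clause. Below is a sketch (again with made-up table names) of the kind of statement I would love to express through such an Operator: pre-aggregate in a factored subquery, then join the result.

    -- Pre-aggregate in a factored subquery, then join (illustrative names)
    WITH daily_sales AS (
      SELECT product_id,
             TRUNC(sale_date) AS sale_day,
             SUM(amount)      AS total_amount
      FROM   sales
      GROUP  BY product_id, TRUNC(sale_date)
    )
    SELECT p.product_name,
           d.sale_day,
           d.total_amount
    FROM   daily_sales d
    JOIN   products p ON p.product_id = d.product_id;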

ODI 12c Mapping: Visual Design with new Operators

OWB to ODI 12c migration

The introduction of OWB style Mappings should also facilitate the migration from Warehouse Builder to ODI 12c. The migration tool is still in beta. While it remains to be seen how well the migration tool will work, ODI 12c for the time being ships with the OdiStartOwbJob tool, which allows you to kick off OWB jobs through ODI and store the execution results in the ODI Operator log. If you have an immediate requirement to migrate from OWB to ODI, contact us for advice.

Other new features

Another interesting feature in ODI 12c is the ability to run Mappings in parallel. In practice that means that each of the temporary tables gets its own unique name, avoiding clashes. In the past you had to apply workarounds to achieve this.

There have also been enhancements to the Knowledge Module editor. For those of you writing a lot of your own Knowledge Modules this is welcome news.

ODI 12c now also has much tighter integration with Oracle GoldenGate. A welcome introduction for those of you running near real time data integration projects (another concern of analysts and some customers).

ODI 12c Gaps

While ODI 11g brought us the SDK to automate and script common tasks, ODI 12c now brings us visual Mappings, giving us new options to create data flows. Oracle Data Integrator remains the leading ETL tool on the market. With ODI 12c it has extended this lead even further. How Gartner don’t rate it above DataStage or PowerCenter in their Magic Quadrant is a mystery to me.

One or two weaknesses remain. ODI 12c does not yet integrate with source control systems out of the box. There are also no automated deployment options. While this functionality can be scripted, it takes significant effort to do so. I am currently in the process of testing such an in-house developed solution. If you would like our advice on how source control integration and automated deployments can be achieved, get in touch.

While ODI 11g brought us improvements in the way we schedule and orchestrate data flows, I am a strong advocate of dependency driven data flow execution. It is so much simpler and more efficient than hard coding data orchestration routines. An enterprise solution with several thousand data flows will not work smoothly without a dependency driven scheduler. If you would like to find out more about our solution, get in touch. You can also download our presentation ODI Scheduler – Source Control – Performance, which covers both source control and dependency driven scheduling.

Resources

The ODI 12c Developer Guide: http://docs.oracle.com/middleware/1212/odi/ODIDG/index.html

New Features of ODI 12c: http://docs.oracle.com/middleware/1212/odi/ODIDG/whatsnew.htm#sthref3

Download Oracle Data Integrator 12c: http://www.oracle.com/technetwork/middleware/data-integrator/downloads/index.htm

Install tip ODI Studio: https://blogs.oracle.com/dataintegration/entry/odi_12c_installing_odi_studio

ODI Training. Learn ODI from the experts.

You may also be interested in our ODI training courses. Value-priced. Customised. Onsite or offsite. Online. Get all the details from our ODI training site.

ODI 11g Cookbook – The leading ETL tool now also has the best ETL book

I don’t know too many books or movies where the sequel is better than the original. The ODI 11g Cookbook is such a rare case. It is stuffed with 60 valuable recipes that every ODI developer should know. Apart from the value of these recipes in their own right, they also showcase the flexibility of ODI and can be transferred to other problems.

My favourite chapters are: Knowledge Module Internals, Advanced Coding Techniques, Advanced Topology, and Using Variables.

A big thank you to the ODI product management team for sharing their insights.

Endeca text enrichment. Entity extraction, sentiment analysis, and text tagging with Lexalytics customer defined lists. I love this one!

One of the interesting options of Endeca is its integration with the text mining software Lexalytics (licensed separately). Lexalytics offers many text analysis functions such as sentiment analysis, document collection analysis, named entity extraction, theme and context extraction, summarisation, document classification etc. Endeca exposes some of this functionality via its text enrichment component in CloverETL. It is worth noting that not all of the text analytics functionality is exposed via text enrichment, and of the features that are exposed, only a limited number of the Lexalytics API methods are available (more on that later). A great way of learning more about Lexalytics is to visit their website, wiki, and blog. I will post some more on text analytics, and Lexalytics in particular, in one of my next posts.

Text tagging with Lexalytics

In my last post on Endeca we used the text tagger component to tag a list of 16K Irish IT job records with 30K skills extracted from LinkedIn. If you remember, we saw some dreadful performance and also realised that the text tagger component is not multi-threaded and maxed out at 25% on a quad-core CPU. The text enrichment component also offers text tagging and, based on the documentation, is multi-threaded, so my expectation is that the tagging process is a lot quicker. Apart from the skills tagging exercise we will also perform some entity extraction on the jobs data and focus on People, Companies, and Products.

In a first step we download Lexalytics from Oracle eDelivery.

We then add the license file that comes with the download to the Lexalytics root directory, in my case E:\Program Files (x86)\Lexalytics.

We then add the salience.properties file to our project. This configuration file sets the various properties that we can make use of, e.g. which types of entities we want to extract. As you can see from the screenshot below, we will extract entities of type Person, Company, Product, and List. The interesting one is the List entity: through it we can supply our own custom lists to the Lexalytics Salience engine.

Custom lists are made available to Lexalytics in the E:\Program Files (x86)\Lexalytics\data\user\salience\entities\lists folder as a tab separated Customer Defined List (CDL) file, which contains values and a label separated by a tab.

One big shortcoming of the Endeca text enrichment component is that it does not extract the label that can be defined in the customer defined list, even though the label is exposed by the Lexalytics API. The following entry, for example, carries the tab separated label Business Intelligence. Because the label is lost, you can effectively only define one customer defined list per text enrichment batch, as all of your CDLs will be dumped into the List field.

OBIEE, Cognos, Micro Strategy   Business Intelligence

Anyway, our custom skills list is just such a tab separated file.
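
To give you an idea of the format, here are a few illustrative entries (these are not the actual LinkedIn skills list); the gap between each value and its label is a single tab character.

    OBIEE	Business Intelligence
    Oracle Data Integrator	Data Integration
    Hadoop	Big Data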

Next we add the location of the Lexalytics license file and the data folder as variables to our project workspace parameter file.

As in the previous post we read the scraped Irish IT jobs data from a MySQL database. The number of records now stands at 21K. The data flow is a lot simpler than what we had to do with the text tagger.

We are now ready to configure our text enrichment component. It’s all pretty straightforward. We supply:

Configuration file: This is the path to the salience properties file

Input file: This is the field in our recordset that we want to extract entities from and use for tagging.

Salience license file: Path to Lexalytics license file

Salience data path: Path to Lexalytics data folder

Number of threads: Hoooraaaay! This component is multi-threaded. During my test runs it used 1 core and maxed out at 25% CPU with 1 thread, 2 cores and 50% CPU with 2 threads, and so on.

In a last step we have to define the metadata and add it to the edge of the graph.

We are now ready to run the graph. If you read my last post on text tagging you will remember that tagging with the text tagger component took a whopping 12 hours to tag 16K records against 30K skills. With the text enrichment component it took 20 minutes to tag 21K records and on top of that we also extracted Person, Product, and Company entities and did a bit of sentiment analysis as well.

Here are some of the results

Conclusion

  • If you have Lexalytics licensed as part of your Endeca install, use it for text tagging rather than the Text Tagger component, which is pretty…, well, lame.
  • Unlike the Text Tagger component, the Text Enrichment component (thanks to the underlying Lexalytics Salience engine) is a fine piece of software engineering. It is multi-threaded, and increasing the number of threads to 4 increased my CPU usage to 100%. It processes the incoming records in batches, and it was a joy to watch how it cleared out the memory properly after each batch.
  • The text enrichment component only exposes a subset of the Lexalytics functionality. If you want to make use of the full potential of the Salience engine you need to write your own custom code.
  • There is still room for improvement: (1) the component should expose the full feature set of Lexalytics, and (2) the implementation of CDLs should include label extraction. Writing the necessary custom code yourself looks pretty straightforward to me.

In the next posts we will load the data into the MDEX engine and start our discovery process. I wonder what we will find out?