Oracle Data Integrator and Hadoop. Is ODI the only ETL tool for Big Data that works?

Uli Bethke Big Data, Hadoop, Oracle Data Integrator (ODI)

Both ODI and the Hadoop ecosystem share a common design philosophy: bring the processing to the data rather than the other way around. Sounds logical, doesn't it? Why move terabytes of data around your network if you can process it all in one place? Why invest millions in additional servers and hardware just to transform and process your data?


In the ODI world this approach is known as ELT (Extract, Load, Transform). ELT is a marketing term for the fact that data transformations are performed in the same processing engine where the data resides, rather than moving the data to a separate engine for transformation. It has underpinned the product since its inception.
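To make the difference concrete, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the engine where the data resides; the table names, the sample rows, and both helper functions are hypothetical and have nothing to do with ODI's own code. The first function pulls every row out to the client and aggregates there, the second sends one statement and lets the engine do the work.

```python
import sqlite3


def build_source(conn):
    # Hypothetical source and target tables, for illustration only.
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)],
    )
    conn.execute("CREATE TABLE customer_totals (customer TEXT, total REAL)")


def etl_style(conn):
    """Pull every row out of the engine, aggregate in the client, write back."""
    totals = {}
    for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
        totals[customer] = totals.get(customer, 0.0) + amount
    conn.executemany(
        "INSERT INTO customer_totals VALUES (?, ?)", sorted(totals.items())
    )


def elt_style(conn):
    """Send one statement; the engine aggregates where the data already lives."""
    conn.execute(
        "INSERT INTO customer_totals "
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )


def run(strategy):
    # Each strategy gets a fresh database so the results are comparable.
    conn = sqlite3.connect(":memory:")
    build_source(conn)
    strategy(conn)
    rows = conn.execute(
        "SELECT customer, total FROM customer_totals ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


etl_rows = run(etl_style)
elt_rows = run(elt_style)
print(etl_rows == elt_rows)  # both strategies load the same totals
```

Both functions produce the same target table. The difference is where the rows travel: with three rows it does not matter, with terabytes on a cluster it does.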

While other ETL tools such as Informatica now also offer some pushdown functionality (e.g. Hive pushdown), it is not in the DNA of these tools or companies to do so. Traditionally, these tools took a completely different approach, and its problems are now showing more than ever before. It is hard for these vendors to work around their original design philosophy. Let me compare this to Microsoft and Google. While the latter has the Internet and Big Data in its DNA as a company, the former doesn't, and Microsoft is throwing huge resources at this problem without being overly successful at closing the gap. Let me ask you another way: why settle for the copy if you can get the real thing?

The advantage of ODI over traditional ETL tools doesn't stop there. ODI has a concept of reusable code templates, known as Knowledge Modules. This metadata-driven design approach encapsulates common data integration strategies such as timestamp-based extracts, data merging, auditing of changes, truncate loads, and parking defective records in an error hospital, and makes them available for reuse. This can result in ETL developer productivity gains of more than 40%.
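The idea of a reusable, metadata-driven code template can be sketched in a few lines of Python. This is an illustration of the concept only and not ODI's actual Knowledge Module syntax or API: the `TEMPLATES` dictionary, the `generate_sql` function, and all table and column names are assumptions made up for this example. Each template captures one integration strategy once, and the metadata of a particular mapping fills in the blanks.

```python
from string import Template

# Hypothetical strategy templates; real Knowledge Modules are far richer.
TEMPLATES = {
    "truncate_load": Template(
        "DELETE FROM $target;\n"
        "INSERT INTO $target ($columns) SELECT $columns FROM $source;"
    ),
    "timestamp_extract": Template(
        "INSERT INTO $target ($columns) SELECT $columns FROM $source "
        "WHERE $ts_column > '$last_run';"
    ),
}


def generate_sql(strategy, metadata):
    """Render the SQL for one mapping from a shared strategy template."""
    # Flatten the column list into the comma-separated form the templates expect.
    values = dict(metadata, columns=", ".join(metadata["columns"]))
    return TEMPLATES[strategy].substitute(values)


# The same template serves any number of mappings; only the metadata changes.
full_load_sql = generate_sql(
    "truncate_load",
    {"source": "stg_orders", "target": "dw_orders", "columns": ["id", "amount"]},
)
incremental_sql = generate_sql(
    "timestamp_extract",
    {
        "source": "stg_orders",
        "target": "dw_orders",
        "columns": ["id", "amount"],
        "ts_column": "updated_at",
        "last_run": "2015-01-01",
    },
)
print(full_load_sql)
print(incremental_sql)
```

The developer describes what to load (source, target, columns), and the template decides how. Changing the strategy for a mapping means picking a different template, not rewriting the code.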

What will the future of data integration on Hadoop look like? At the moment a lot of the ETL is still hand-written as custom MapReduce jobs. As SQL engines on Hadoop mature, they will become the vehicle for 90%+ of data transformation flows for Big Data. Only for very specific use cases where performance is the highest priority will we see custom coding on Spark, MapReduce etc. Based on its underlying design principles, Oracle Data Integrator is a perfect match for Hadoop.

Coming back to my question in the headline: yes, I believe that Oracle Data Integrator really is the only ETL and data integration tool that is fit for purpose for Big Data workloads.

If you are planning to run a Big Data project, an ODI implementation, or both, then get in touch with us. Why settle for second best if you can get the ODI and Big Data experts?

About the author

Uli Bethke LinkedIn Profile

Uli has 18 years’ hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization. He has co-founded the Irish Oracle Big Data User Group.