Parsing XML to Oracle. ODI vs Flexter Enterprise XML Parser

by Maciek

Maciek is the Co-founder of Sonra. He has a knack for turning messy semi-structured formats like XML, JSON, and XSD into readable data. With a brain wired for product and data architecture, Maciek is the magic ingredient to making sure your systems don’t just work—they shine.

Published on December 27, 2016
Updated on November 20, 2024

In some previous posts we demonstrated how to load XML data into an Oracle database using ODI. We first looked at some of the issues when reverse engineering an XSD in ODI. Next we looked at various issues when parsing and processing XML files in ODI. We also showed that Flexter parses XML files without creating a single data flow or writing one line of code. And even better. It just scales no matter the number or size of your XML files.
ODI is a brilliant ETL tool, one of the best on the market. However, for converting XML files into a database or data warehouse you are better off to use a dedicated enterprise XML parser such as Flexter.
Let’s do a side by side comparison of Flexter and ODI
Get our e-books Discover the Oracle Data Integrator 11g Repository Data Model and Oracle Data Integrator Snippets and Recipes
[flexter_banner]

Analysing the XSD (XML schema)

	ODI	Flexter
sources	modular schema(multiple XSD files): ODI ≥ v11.1.1.7	support for modular schema: multiple XSD files auto-import of XSDs from http(s), ftp, sftp, hdfs support for schemas in archives (zip, gz, tgz, 7z)
processing speed	varies: seconds-hours	steady: seconds
complexity and support of XSD specification	low – medium	high
output schema optimizations/ compactness	some basic optimizations	multiple options – fully configurable

Sources: Older versions of ODI can’t process schemas with multiple XSD files. Flexter supports modular XSDs including XSD URL import links – it automatically downloads and analyses externally referenced schema files.
Processing speed: The time ODI needs to reverse engineer a schema grows exponentially with the complexity of the schema.

	ODI	Flexter
SEPA (pain.001.001.03).	8 mins	24 seconds
NDC (FlightPriceRS)	2 hrs 36 mins	1 min 4 seconds

Support XSD Specification: The ODI XML driver claims to support a wide range of the XSD specification. However, many of the complex XSDs we tried, failed to reverse engineer. Troubleshooting proves to be quite difficult even when logging is switched on.
The various issues preventing ODI from successfully reverse engineering the schema were:

no resolution for naming conflicts (element names vs. SQL reserved words, e.g.: AT or COMMENT, DATE etc)
problems in handling certain namespace scenarios in XML schemas
no support for inline or external XSD references
problems with Unicode BOM markers or nullbyte characters
limited support for detecting and handling recursive XPaths in the schema. This is partially due to the next issue:
matching names of subtypes in a hierarchy are not detected, e.g. person > department > manager > person. The ODI driver wrongly assumes that the second person type in the hierarchy has the same structure as the first.

The last issue is documented:
Using different sub-elements with the same name but with different types is not supported by XML driver. An XSD with such a structure will not be processed correctly.
Flexter was purpose-built to handle large and complex schemas and has been thoroughly tested against a large number of industry standards. Usually it only fails when there are errors in the schema structure itself.
Optimizations: Both tools analyse an XSD schema to generate a relational target schema.
Flexter optimizes the target schema and significantly reduces the number of output tables. This makes it so much easier to make sense of the data and work with it downstream.
Flexter implements two main optimizations:
Elevate – include child elements into parent structure for 1:1 relationships
Reuse – create a reusable placeholder table for all references to the same XSD type (reference table in database speak)
Both optimizations can be used together or independently.
In addition, Flexter offers the ability to only create those target tables for which data exists in the XMLs. If the XSD contains a couple of hundred tables but the XML just uses five of those then only those five tables will be generated. This means that there will be a lot less tables to query, join and analyse downstream.

	ODI	Flexter	Ratio
SEPA (pain.001.001.03).	287 tables	13 tables	4.5%
NDC (FlightPriceRS)	2333 tables	24 tables	1%

Import data to database

	ODI	Flexter
Development Effort	very high	none
Data Lineage (source to target map)	N/A	automatic
XML schema visualisation	N/A	automatic
Orchestration	manual	automatic
Run-time Error Handling	none	graceful
Scalability	N/A	to infinity and beyond. Multiple cores across multiple servers
Error log	Yes	Yes

Development Effort: In ODI it’s the developer’s job to create the mappings between source XML and relational target schema.
Once the data is in the target database there is another problem to handle. The keys generated by the ODI driver will get reset with each new data load. Inevitably, this will lead to key clashes. Yet another problem to deal with. Your only options are to either tinker with the key generation or process the data further before performing another load.
In contrast, Flexter automatically creates an optimised target schema and populates the required tables directly in the database. The generated keys are globally unique . Not a single line of code is needed. Data can be easily appended without worrying about key conflicts. Data analysts or ODI developers can then use SQL for further processing against a highly simplified target schema.
Data Lineage/Schema Visualisation:
Flexter provides data lineage in the form of source to target maps. The user can browse and drill into the relational target schema through a visual interface. This is not something you can do in ODI.
Orchestration and Performance:
Once all of the required mappings have been built in ODI, they need to be executed in a loop for every available XML file. The file looping operation is not natively supported by ODI and needs to be manually built along with all file and error handling. Executing this operation repetitively in a loop over tables AND files makes the whole process tiresome and inefficient.
Flexter takes a URI path as input, this can be anything that points to a directory or file: local disk path, URL, FTP, SFTP or even HDFS. Flexter then automatically loads all of the available files in parallel. It also supports a streaming mode where it automatically picks up any files that have been added to source URI.
Run-time Error Handling: For cases where the encountered data values violate the XSD, Flexter tries to handle this situation, ie. by trimming the string values to the allowed length. If Flexter can’t process the data, e.g. for data elements we find in the XML but have not been specified in the XSD. Flexter parks such data in an error dump. In both cases Flexter records the information in the log and continues processing.
ODI doesn’t handle those scenarios and fails immediately.
Scalability:
ODI is limited to the resources given to single Agent machine. Multiple ODI Agents can’t be used to parallelize the load (it’s only for redundancy).
Flexter is built in Scala on top of Apache Spark, a distributed and highly scalable data processing framework. It can be launched on a single machine as well as a multi-node cluster. Flexter scales linearly and can handle any XML data volume. It shreds the XML on the XPath, stores the data in memory, and then loads it into the specified target. All in parallel. It runs on both Linux and Windows.

Conclusion

While ODI supports XML the development process can be quite tedious and slow. Complex schemas may prove very difficult to implement. Performance is also an issue as bulk loading of XMLs is not supported natively. Don’t get me wrong, ODI is a brilliant ETL tool, probably the best on the market. When processing XMLs it is suitable for simple requirements with low data volumes. However, for complex XMLs and a large number of XML files you are better off using a dedicated enterprise XML parser like Flexter. This will save you time, effort, and dollars. It will also eliminate any risk from your XML projects.

In the next post we will show how these two tools can work together to ensure maximum productivity and utilisation of developers’ time.

Find out for yourself how easy it is to process complex XML files with Flexter XML Parser
[flexter_button]
Which data formats apart from XML also give you the heebie jeebies and need to be liberated? Please leave a comment below or reach out to us.

About the author:

Maciek

Co-founder of Sonra

Follow Maciek:

Back to Blog

Cookie	Duration	Description
__cfruid	session	Cloudflare sets this cookie to identify trusted web traffic.
cookielawinfo-checkbox-marketing	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Marketing".
cookielawinfo-checkbox-necessary	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-preferences	1 month	This cookie is set by the GDPR Cookie Consent plugin to check if the user has given consent to use cookies under the "Preferences" category.
cookielawinfo-checkbox-statistics	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Statistics".
cookielawinfo-checkbox-unclassified	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Unclassified".
CookieLawInfoConsent	1 month	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
csrftoken	1 year	This cookie is associated with Django web development platform for python. Used to help protect the website against Cross-Site Request Forgery attacks

Cookie	Duration	Description
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
li_gc	2 years	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
mgref	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
mgrefby	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
G	1 year	Cookie used to facilitate the translation into the preferred language of the visitor.
SERVERID	session	This cookie is set by Slideshare's HAProxy load balancer to assign the visitor to a specific server.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_7H38LVR4Z5	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_44804396_1	1 minute	Set by Google to distinguish users.
_gat_UA-44804396-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
SIDCC	6 Months	The "SIDCC" cookie is used as security measure to protect users data from unauthorised access
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
AN	1 month
AS	session
ebEventToTrack	1 month
eblang	1 year
SNID	2 years	This cookie is set by the Google. This cookie is used by the map which helps visitors to identify and reach the facility.
SP	session
SS	session

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

Parsing XML to Oracle. ODI vs Flexter Enterprise XML Parser

Analysing the XSD (XML schema)

Import data to database

Conclusion

Maciek

Parsing XML to Oracle. ODI vs Flexter Enterprise XML Parser

Analysing the XSD (XML schema)

Import data to database

Conclusion

Related Articles

Cookies consent