Flexter Enterprise XML Converter - FAQ

Vadim Mytarev Flexter, XML


Flexter is an enterprise ETL tool for XML/JSON. It automatically converts XML/JSON to text, any relational database, or Hadoop/Spark (ORC, Parquet, Avro). It works with any industry data standard (ACORD, FpML, FIXML, ISO 20022, HL7 etc.). Once the conversion is complete, data analysts and other consumers who need to query XML data can use SQL or standard reporting tools such as Tableau, Qlik, Looker etc. to generate insights.

Flexter completely automates the whole process of converting XML/JSON files to a relational format. No custom development is needed. This can save up to 80% of the overall conversion costs. You don’t need to hire external consultants with XML/JSON expertise. As one customer put it recently: “You did in one day what would have taken us a year.”

Flexter can process data in real time. As soon as an event happens in the real world it can be analysed instantly with the help of Flexter.

Flexter eliminates project risk. We have seen many XML/JSON conversion projects fail. The failure rate grows exponentially with the complexity of the XML/JSON and the volume of data that needs to be converted.

Flexter is big data ready. We have built scalability into Flexter from the ground up. Flexter can scale up and out across multiple servers. It can handle any volume of data and meet any SLA.

Flexter significantly shortens the duration of XML/JSON conversion projects. Developers can focus on data analytics tasks that add value to the business rather than having to wrangle with XML.

Flexter can meet any service level agreement. In tests it was an astonishing 800 times faster than competing solutions.

Flexter handles different versions of XML/JSON standards gracefully. You can compare versions of a standard and automatically generate upgrade scripts from one version to the next. No coding needed.


The Flexter platform consists of three pluggable modules:

Schema Analyser (xsd2er)
Mapping Generator (CalcMap)
XML Processor (xml2er)

Step 1: The Schema Analyser is a dedicated module that loads, parses, processes and stores the XML/JSON schema information in Flexter's internal metadata DB. This step only needs to be performed once for each schema. You can supply either an XSD or a representative sample of XML/JSON files for this step.

Step 2: Now that we know the exact layout of the source XML/JSON, it is possible to generate the relational equivalent. Flexter's Mapping Generator module generates the output schema layout and the mapping to it. Various optimisations of the target schema can be applied during this step to make the schema more compact.

Step 3: The XML/JSON Processor module takes the information generated from the two previous steps, processes the XML/JSON, and writes the data to the relational target schema.
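
The three steps can be illustrated with a deliberately small sketch. This is not Flexter's implementation or API: the function names (`analyse`, `generate_mapping`, `process`), the sample document and the rule "an element that repeats under one parent becomes a table" are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for this illustration.
SAMPLE = """<orders>
  <order id="1"><customer>Ann</customer><item sku="A1"/><item sku="B2"/></order>
  <order id="2"><customer>Bob</customer><item sku="C3"/></order>
</orders>"""


def analyse(root):
    """Step 1: record, per element path, the max occurrences under a single parent."""
    max_occurs = defaultdict(int)

    def walk(elem, path):
        counts = defaultdict(int)
        for child in elem:
            counts[child.tag] += 1
        for tag, n in counts.items():
            child_path = f"{path}/{tag}"
            max_occurs[child_path] = max(max_occurs[child_path], n)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, "/" + root.tag)
    return dict(max_occurs)


def generate_mapping(max_occurs):
    """Step 2: map every repeating element path to a table name (assumed rule)."""
    return {p: p.strip("/").replace("/", "_") for p, n in max_occurs.items() if n > 1}


def process(root, mapping):
    """Step 3: write one row per mapped element, linked to its parent row."""
    rows = defaultdict(list)

    def walk(elem, path, parent_row_id):
        row_id = parent_row_id
        table = mapping.get(path)
        if table is not None:
            row_id = len(rows[table]) + 1
            row = {"_row_id": row_id, "_parent_row_id": parent_row_id, **elem.attrib}
            for child in elem:
                # Non-repeating leaf children become columns of the parent row.
                is_leaf = len(child) == 0
                if f"{path}/{child.tag}" not in mapping and is_leaf and child.text:
                    row[child.tag] = child.text.strip()
            rows[table].append(row)
        for child in elem:
            walk(child, f"{path}/{child.tag}", row_id)

    walk(root, "/" + root.tag, None)
    return dict(rows)


root = ET.fromstring(SAMPLE)
mapping = generate_mapping(analyse(root))
tables = process(root, mapping)
for name, table_rows in tables.items():
    print(name, table_rows)
```

In this toy version the analysis result could be stored and reused, which mirrors the point that the first step only needs to run once per schema.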


Yes. Flexter supports very large XML files greater than 1 GB.


The core strength of ETL tools is to transform structured data and work with relational databases. They often struggle with semi-structured data in XML/JSON files. While most ETL tools offer functionality to handle simple and flat XML/JSON files at low volumes, they have serious limitations:

  • They don’t automate the conversion process. ETL developers still need to create data flows (potentially hundreds for complex XMLs) and data pipelines. A significant development effort indeed.
  • ETL tools don’t scale beyond a single server for XML/JSON processing.
  • ETL performance for JSON/XML is poor. We have seen ETL processes run for 22 hours to process as few as 50K XML files.
  • Most ETL tools can’t handle XML files in batches. They process XML/JSON files individually, which has a significant impact on performance.

Here are two blog posts where we compare Flexter against two popular ETL tools.

Oracle Data Integrator

Informatica


Yes. Flexter supports real-time use cases through its streaming engine.


Yes, we offer version control. With Flexter you can easily identify what has changed between different versions of your XMLs/XSD, e.g. which elements have been added or removed.


We support both individual XML/JSON files and batches of XML/JSON files in archives and compressed formats (zip, gzip etc.).
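
As a generic illustration of what handling individual files and compressed batches alike involves, the sketch below uses only the Python standard library. It does not show how Flexter reads its inputs; the helper name `iter_xml_documents` and the in-memory test data are assumptions.

```python
import gzip
import io
import zipfile
import xml.etree.ElementTree as ET


def iter_xml_documents(name, payload):
    """Yield (document name, parsed root) for every XML document in `payload` bytes."""
    if zipfile.is_zipfile(io.BytesIO(payload)):
        # A zip archive may hold a whole batch of XML files.
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for member in archive.namelist():
                if member.lower().endswith(".xml"):
                    yield member, ET.fromstring(archive.read(member))
    elif payload[:2] == b"\x1f\x8b":  # gzip magic number
        yield name, ET.fromstring(gzip.decompress(payload))
    else:
        # Plain, uncompressed XML.
        yield name, ET.fromstring(payload)


# Build a small zip batch and a gzip file in memory for demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.xml", "<order id='1'/>")
    archive.writestr("b.xml", "<order id='2'/>")
    archive.writestr("readme.txt", "not XML")

batch = list(iter_xml_documents("batch.zip", buffer.getvalue()))
single = list(iter_xml_documents("c.xml.gz", gzip.compress(b"<order id='3'/>")))
print([doc_name for doc_name, _ in batch], single[0][1].get("id"))
```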

We can pull XML/JSON files from network drives, (S)FTP servers, HDFS, S3, XMLTYPE/CLOB/BLOB, BJSON in databases etc.
Converting G1 XML to AWS S3


We support any relational databases, e.g. Oracle, MS SQL Server, DB2, PostgreSQL, MySQL, Redshift, Snowflake, BigQuery etc.


We support comma separated and tab separated files as output.
We support Parquet, Avro, and ORC. We also support Hadoop SQL query engines, e.g. Hive, Impala, Drill, AWS Athena etc.
You don’t need an XSD to convert your XML/JSON files. Flexter optionally analyses a sample set of your XML/JSON files to generate a target schema. The advantage of having an XSD is that Flexter can apply better optimisations to your target schema. It will also minimise the issue of constraint violations and other data quality issues.
We generate the target schema based on the information from the XML/JSON, the XSD, or a combination of the two. If you can't provide an XSD we generate the target schema from a statistically significant sample of the XML/JSON files. In summary you have three options to generate the target: (1) XML/JSON only (2) XSD only (3) Combination of XML and XSD.

When we generate the target schema we also provide various optional optimisations, e.g. we can influence the level of denormalisation of the target schema and we may optionally eliminate redundant reference data and merge it into one and the same entity.

This depends on what your requirements are. If you want a more compact schema just use the XML/JSON. Flexter will only consider the XPaths it encounters in the XML/JSON sample files you provide. As a result there will likely be fewer attributes in the target schema. The downside of this approach is that your XML/JSON sample may not contain all of the possible XPaths of your data set. New and unexpected XPaths will initially be ignored by Flexter and written out as warnings to an alert log. You can gather stats incrementally to cater for those scenarios and evolve your target schema over time.
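
The sample-based trade-off can be made concrete with a small sketch. It is an illustration of the idea only, not Flexter's code: the simplified path notation, the logger name `alert_log` and the helper names are assumptions.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("alert_log")  # hypothetical logger name


def element_paths(xml_text):
    """Return the set of simplified XPaths (elements and attributes) in a document."""
    paths = set()

    def walk(elem, path):
        paths.add(path)
        for attr in elem.attrib:
            paths.add(f"{path}/@{attr}")
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag)
    return paths


def unknown_paths(known, xml_text):
    """Log a warning for each path the sample-derived schema has not seen; return them."""
    new = element_paths(xml_text) - known
    for path in sorted(new):
        log.warning("unmapped path ignored: %s", path)
    return new


# The "schema" here is just the union of paths seen in the sample files.
samples = ["<a><b x='1'/></a>", "<a><c/></a>"]
known = set().union(*(element_paths(s) for s in samples))

# A later document carries an attribute and an element the sample never had.
incoming = "<a><b x='1' y='2'/><d/></a>"
new = unknown_paths(known, incoming)

# Gathering stats incrementally: fold the new paths in for the next schema version.
known |= new
```

After the final line, processing the same document again produces no warnings, which is the effect of evolving the target schema over time.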

If you are only using an XSD to generate the target schema all of the possible XPaths will be translated into the target schema. The target schema is more verbose and complex. If the XML files you process conform to your standard then you should not get any warning messages.

However, we often see that XSD designers have been sloppy and have not properly defined relationships, cardinality etc. in the XSD. For those scenarios it's best to use both the XSD and XML. Where the XSD has gaps or sloppy design, we override the schema with the stats from the XML sample.

Flexter recovers gracefully from failure and picks up from where it left off. Errors are logged in the error log.
Yes, as long as there is some overlap across XML/JSON files. As a rule of thumb 50% overlap is sufficient in most cases.
We offer Flexter as a service and cater for one off migration requirements. Typical scenarios include conversion of Terabytes of historical XML data or a migration from a legacy XML database to a relational database.
Yes. You have various options: (1) you can run multiple schema versions side by side, (2) Flexter can generate delta code (DDL) to evolve your schema at design time, or (3) Flexter can generate delta code (DDL) at runtime to evolve your schema in real time.
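
To show what delta DDL between two schema versions means in practice, here is a minimal sketch. It is not the code Flexter generates: the schema representation (a dict of table name to column types), the function name `delta_ddl` and the example tables are assumptions, and the `ALTER TABLE ... ADD COLUMN` syntax varies between databases.

```python
def delta_ddl(old, new):
    """Return DDL statements that evolve schema `old` into schema `new`.

    Each schema is a dict: table name -> {column name: SQL type}.
    Only additive changes are generated; removed columns are left in place.
    """
    statements = []
    for table, columns in sorted(new.items()):
        if table not in old:
            cols = ", ".join(f"{col} {sql_type}" for col, sql_type in columns.items())
            statements.append(f"CREATE TABLE {table} ({cols});")
            continue
        for col, sql_type in columns.items():
            if col not in old[table]:
                statements.append(f"ALTER TABLE {table} ADD COLUMN {col} {sql_type};")
    return statements


# Hypothetical target schemas for two versions of the same XML standard.
v1 = {"orders": {"id": "INTEGER", "customer": "VARCHAR(100)"}}
v2 = {
    "orders": {"id": "INTEGER", "customer": "VARCHAR(100)", "currency": "CHAR(3)"},
    "order_items": {"order_id": "INTEGER", "sku": "VARCHAR(20)"},
}

for statement in delta_ddl(v1, v2):
    print(statement)
```
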
Yes. We support popular Big Data formats and Hadoop SQL engines, e.g. Hive, Drill, Impala etc.
Yes, we are now also supporting JSON. We are working hard to liberate data from other complex or proprietary formats such as Excel Spreadsheets, MS Access, EDI etc.
Flexter uses Spark and Spark Streaming as execution engines. It is written in Scala.
Yes, Flexter can be called from the command line or through its RESTful API.
Flexter can be used for batch loading large volumes of XML/JSON files into a data warehouse.

Flexter can be used to trickle feed XML/JSON files in real time to an analytics engine.

Flexter can be used for data exchange scenarios that require translation of XMLs/JSONs to a relational format.

Flexter can be used to migrate large volumes of historic XML/JSON files to a database.

Flexter can be used to migrate an XML database to a relational database.

Yes. Please reach out to us with your specific requirements.
One of our customers is Aer Lingus. They ran into performance and scalability issues with their existing ETL tool. We implemented our platform, Flexter Data Liberator, to solve the problem. No custom development was needed. We just installed and configured Flexter and everything was up and running in a day.
Flexter runs on Linux or Windows.
Yes. We have a Docker version of Flexter that we can make available for download. Please reach out to us.
Flexter ships as a Windows MSI or Linux RPM installer. The installation and configuration process is simple.
Yes. Flexter can be installed on premise in your own data centre or in a cloud of your choice (Amazon, Azure, Google, Oracle etc.).

https://sonra.io/2018/04/05/converting-xml-tsv-hdinsight/

Yes. Please contact us with your use case.
Yes. You can call Flexter from your ETL tool using the RESTful API or through the command line. We have written some blog posts that show how Flexter integrates with popular ETL tools:

Oracle Data Integrator

IRI Voracity

Features             | Free Online Trial       | Enterprise Version
Max daily data limit | 50MB                    | Unlimited
Scalability          | Single instance         | Cluster of Servers
File output formats  | Text                    | TSV, PARQUET, AVRO, ORC
RDBMS support        |                         | Oracle, MS SQL Server, PostgreSQL
Location             | Online                  | On Premise, Online
Data Lineage         |                         | Yes
Scheduled execution  |                         | Yes
Support              |                         | Yes
Optimisations        | Elevate, Re-use, Naming | Elevate, Re-use, Naming
Visualisations       |                         | Browse Schemas
In-memory processing |                         | Yes

The Flexter binaries are stored on the edge node. Processing exclusively happens on the data nodes in parallel.

Which data formats apart from XML also give you the heebie jeebies and need to be liberated? Please leave a comment below or reach out to us.