Flexter Enterprise XML Converter - FAQ

Vadim Mytarev Flexter, XML

Flexter is an enterprise XML converter. It automatically converts XML data that is locked away in complex industry data standards to text, a database, or Hadoop. It can also convert any proprietary XML files.

Flexter completely automates the whole process of converting XML files to a relational format. No custom development is needed. This can save up to 80% of the overall conversion costs. You don’t need to hire external consultants with XML expertise. As one customer put it recently: “You did in one day what would have taken us a year.”

Flexter eliminates project risk. We have seen many XML conversion projects fail. The failure rate grows exponentially with the complexity of the XML and data volumes.

We have built scalability into Flexter from the ground up. Flexter can scale up and out across multiple servers. It can handle any volume of data and meet any SLA.

Flexter significantly shortens the duration of XML projects. Developers can focus on data analytics tasks that add value to the business rather than having to wrangle with XML.

The Flexter platform consists of three pluggable modules:

  • Schema Analyser (xsd2er)
  • Mapping Generator (CalcMap)
  • XML Processor (xml2er)

Step 1: The Schema Analyser is a dedicated module that loads, parses, processes and stores the XML schema information in Flexter's internal metadata DB. This step only needs to run once for each schema. You can supply either an XSD or a representative sample of XML files for this step.

Step 2: Now that we know the exact layout of the source XML, it is possible to generate the relational equivalent. Flexter's Mapping Generator module generates the output schema layout and the mapping to it. Various optimisations of the target schema can be applied during this step.

Step 3: The XML Processor module takes the information generated from the two previous steps, processes the XML, and writes the data to the relational target schema.
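
To make the three steps more concrete, here is a minimal sketch in Python. It is not Flexter's implementation. The function names (`analyse`, `generate_mapping`, `process`), the sample data and the rule "an element that repeats under one parent becomes its own table" are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document, used only for this illustration.
SAMPLE = (
    '<orders>'
    '<order id="1"><customer>Ann</customer>'
    '<item sku="A1">2</item><item sku="B7">1</item></order>'
    '<order id="2"><customer>Bob</customer><item sku="A1">5</item></order>'
    '</orders>'
)

def analyse(root):
    """Step 1 (sketch): record how often each path repeats under one parent."""
    max_repeat = {f"/{root.tag}": 1}

    def walk(elem, path):
        for tag, n in Counter(child.tag for child in elem).items():
            child_path = f"{path}/{tag}"
            max_repeat[child_path] = max(max_repeat.get(child_path, 0), n)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return max_repeat

def generate_mapping(max_repeat):
    """Step 2 (sketch): assume repeating paths become tables, the rest columns."""
    return {path for path, n in max_repeat.items() if n > 1}

def process(root, tables):
    """Step 3 (sketch): emit one row per occurrence of each table path."""
    rows = defaultdict(list)

    def walk(elem, path, row, parent_key):
        text = (elem.text or "").strip()
        if path in tables:
            # New row; "_parent" links it to the enclosing table row, if any.
            row = {"_parent": parent_key, **elem.attrib}
            if text:
                row["_value"] = text
            rows[path].append(row)
            parent_key = f"{path}[{len(rows[path])}]"
        elif row is not None and text:
            # Non-repeating leaf: store it as a column on the enclosing row.
            row[elem.tag] = text
        for child in elem:
            walk(child, f"{path}/{child.tag}", row, parent_key)

    walk(root, f"/{root.tag}", None, None)
    return dict(rows)

root = ET.fromstring(SAMPLE)
stats = analyse(root)             # step 1: run once per schema
tables = generate_mapping(stats)  # step 2: derive the target layout
rows = process(root, tables)      # step 3: convert the data
```

In this sketch `order` and `item` each end up as a table, `customer` becomes a column on `order`, and every `item` row carries a `_parent` key pointing back to its order.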

The core strength of ETL tools is to transform structured data and work with relational databases. They often struggle with semi-structured data in XML files. While most ETL tools offer functionality to handle simple and flat XML files at low volumes, they have serious limitations:

  • They don’t automate the conversion process. ETL developers still need to create data flows (potentially hundreds for complex XMLs) and data pipelines, which is a significant development effort.
  • ETL tools don’t scale beyond a single server for XML processing.
  • Most ETL tools can’t handle XML files in batches. They process XML files individually, which has a significant impact on performance.

Here are two blog posts where we compare Flexter against two popular ETL tools.

Oracle Data Integrator


Yes. Flexter supports real-time use cases through its streaming engine.

We support both individual XML files and batches of XML files in archives and compressed formats (zip, gzip etc.).
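
As an illustration of what reading from mixed inputs involves, the sketch below treats a payload as a zip archive, a gzip file or a plain XML file, using only the Python standard library. The function name `iter_xml_documents` and the in-memory test data are assumptions for this example. It says nothing about how Flexter itself reads archives.

```python
import gzip
import io
import zipfile
import xml.etree.ElementTree as ET

def iter_xml_documents(name, payload):
    """Yield (document name, parsed root) for a zip, gzip or plain XML payload."""
    if zipfile.is_zipfile(io.BytesIO(payload)):
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for member in archive.namelist():
                if member.lower().endswith(".xml"):
                    yield member, ET.fromstring(archive.read(member))
    elif payload[:2] == b"\x1f\x8b":  # gzip magic number
        yield name, ET.fromstring(gzip.decompress(payload))
    else:
        yield name, ET.fromstring(payload)

# Build small in-memory inputs so the example needs no files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.xml", "<a/>")
    archive.writestr("b.xml", "<b/>")
    archive.writestr("readme.txt", "not xml")
zip_bytes = buffer.getvalue()
gz_bytes = gzip.compress(b"<c/>")

for source, data in [("batch.zip", zip_bytes), ("c.xml.gz", gz_bytes), ("d.xml", b"<d/>")]:
    for doc_name, doc_root in iter_xml_documents(source, data):
        print(doc_name, doc_root.tag)
```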

We can pull XML files from network drives, (S)FTP servers, HDFS, S3, XMLTYPE/CLOB in databases etc.

We support most relational databases, e.g. Oracle, MS SQL Server, DB2, PostgreSQL, MySQL, Redshift, Snowflake, BigQuery etc.

We support comma separated and tab separated files as output.

We support Parquet, Avro, and ORC. We also support Hadoop SQL query engines, e.g. Hive, Impala, Drill, AWS Athena etc.

You don’t need an XSD to convert your XML files. Flexter analyses a sample set of your XML files to generate a target schema. The advantage of having an XSD is that Flexter can apply better optimisations to your target schema. It will also minimise the issue of constraint violations and other data quality issues.

We generate the target schema based on the information from the XML, the XSD, or a combination of the two. If you can't provide an XSD, we generate the target schema from a statistically significant sample of the XML files. In summary, you have three options to generate the target: (1) XML only (2) XSD only (3) Combination of XML and XSD.

When we generate the target schema we also provide various optional optimisations, e.g. we can influence the level of denormalisation of the target schema and we may optionally eliminate redundant reference data and merge it into one and the same entity.
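
The idea of merging redundant reference data into a single entity can be sketched as follows. The matching rule used here (two tables are redundant when their column names are identical) and all table and column names are assumptions for illustration only. The rule Flexter actually applies is not described in this FAQ.

```python
from collections import defaultdict

def merge_identical_entities(tables):
    """Group tables with identical column sets and keep one entity per group."""
    by_signature = defaultdict(list)
    for name, columns in tables.items():
        by_signature[frozenset(columns)].append(name)

    merged = {}
    for columns, names in by_signature.items():
        names.sort()
        # The first name (alphabetically) survives; the others point to it.
        merged[names[0]] = {"columns": sorted(columns), "replaces": names[1:]}
    return merged

# Hypothetical target schema before optimisation.
schema = {
    "billing_address": ["street", "city", "postcode"],
    "shipping_address": ["city", "postcode", "street"],
    "order": ["id", "order_date"],
}
optimised = merge_identical_entities(schema)
```

Here `billing_address` and `shipping_address` collapse into one entity, while `order` is left untouched.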

This depends on what your requirements are. If you want a more compact schema, just use the XML. Flexter will only consider the XPaths it encounters in the XML sample files you provide. As a result, there will likely be fewer attributes in the target schema. The downside of this approach is that your XML sample may not contain all of the possible XPaths of your data set. New and unexpected XPaths will initially be ignored by Flexter and written out as warnings to an alert log. You can gather stats incrementally to cater for those scenarios and evolve your target schema over time.

If you are only using an XSD to generate the target schema all of the possible XPaths will be translated into the target schema. The target schema is more verbose and complex. If the XML files you process conform to your standard then you should not get any warning messages.

However, we often see that XSD designers have been sloppy and have not properly defined relationships, cardinality etc. in the XSD. For those scenarios it's best to use both the XSD and XML. Where the XSD has gaps or is loosely defined, we override the schema with the stats from the XML sample.
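
The XML-only option described above can be pictured with a short sketch: collect the XPaths seen in a sample, warn about paths that were not in it, then fold them in to evolve the schema. The function names, the logger name `alert` and the sample documents are assumptions. This is an illustration of the idea, not Flexter's code.

```python
import logging
import xml.etree.ElementTree as ET

log = logging.getLogger("alert")  # hypothetical stand-in for an alert log

def xpaths(xml_text):
    """Return the set of element and attribute paths found in one document."""
    found = set()

    def walk(elem, path):
        found.add(path)
        found.update(f"{path}/@{name}" for name in elem.attrib)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, f"/{root.tag}")
    return found

def unknown_xpaths(xml_text, known):
    """Log and return paths that were not present in the sample."""
    unknown = sorted(xpaths(xml_text) - known)
    for path in unknown:
        log.warning("XPath not in target schema, ignored: %s", path)
    return unknown

samples = [
    '<order id="1"><customer>Ann</customer></order>',
    '<order id="2"><customer>Bob</customer><note>rush</note></order>',
]
known = set().union(*(xpaths(sample) for sample in samples))

incoming = '<order id="3"><customer>Cy</customer><discount code="X"/></order>'
missing = unknown_xpaths(incoming, known)

# Gathering stats incrementally: add the new paths so the schema can evolve.
known |= set(missing)
```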

Flexter recovers gracefully from failure and picks up from where it left off. Errors are logged in the error log.
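
One common way to get this kind of restart behaviour is a checkpoint file that records which inputs have already been converted. The sketch below shows that general technique under stated assumptions: the `run` function, the JSON checkpoint format and the simulated failure are hypothetical and do not describe Flexter's internal recovery mechanism.

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("conversion")  # hypothetical error log

def run(files, checkpoint, convert):
    """Convert files in order, skipping those already listed in the checkpoint."""
    done = []
    if os.path.exists(checkpoint):
        with open(checkpoint) as handle:
            done = json.load(handle)
    for name in files:
        if name in done:
            continue
        try:
            convert(name)
        except Exception:
            log.exception("conversion failed for %s", name)
            raise
        done.append(name)
        # Persist progress after every file so a restart can resume here.
        with open(checkpoint, "w") as handle:
            json.dump(done, handle)
    return done

processed = []
fail_once = {"b.xml"}

def convert(name):
    """Stand-in converter that fails the first time it sees b.xml."""
    if name in fail_once:
        fail_once.discard(name)
        raise RuntimeError("simulated outage")
    processed.append(name)

checkpoint = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
files = ["a.xml", "b.xml", "c.xml"]
try:
    run(files, checkpoint, convert)    # stops at b.xml
except RuntimeError:
    pass
done = run(files, checkpoint, convert)  # resumes at b.xml, skips a.xml
```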

Yes, as long as there is some overlap across XML files. As a rule of thumb, 50% overlap is sufficient in most cases.

We offer Flexter as a service and cater for one-off migration requirements. Typical scenarios include conversion of terabytes of historical XML data or a migration from a legacy XML database to a relational database.

With Flexter it’s easy to run multiple versions of your schema/XSD side by side. You can evolve your schema over time and run multiple XSD versions in parallel at no extra development cost.

Yes. We support popular Big Data formats and Hadoop SQL engines, e.g. Hive, Drill, Impala etc.

Yes, we are working hard to liberate data from other complex or proprietary formats such as JSON, spreadsheets, MS Access, EDI etc.

Flexter uses Spark and Spark Streaming as execution engines. It is written in Scala.

Yes, Flexter can be called from the command line or through its RESTful API.

Flexter can be used for batch loading large volumes of XML files into a data warehouse.

Flexter can be used to trickle feed XML files in real time to an analytics engine.

Flexter can be used for data exchange scenarios that require translation of XMLs to a relational format.

Flexter can be used to migrate large volumes of historic XML files to a database.

Flexter can be used to migrate an XML database to a relational database.

Yes. Please reach out to us with your specific requirements.

One of our customers is Aer Lingus. They ran into performance and scalability issues with their existing ETL tool. We implemented our platform Flexter Data Liberator to solve the problem. No custom development was needed. We just installed and configured Flexter and everything was up and running in a day.

Flexter runs on Linux or Windows.

We are working on a downloadable personal edition of Flexter. Please reach out to us to be notified when it becomes available.

Flexter ships as a Windows MSI or Linux RPM installer. The installation and configuration process is simple.

Yes. Flexter can be installed on premise in your own data centre.

Yes. Please contact us with your use case.

Yes. You can call Flexter from your ETL tool using the RESTful API or through the command line. We have written up some blog posts that show how Flexter integrates with some popular ETL tools:

Oracle Data Integrator

IRI Voracity

Informatica Data Services

Free Online Trial
Enterprise Version
Max daily data limit
Single instance
Cluster of Servers
File output formats
RDBMS support
Oracle, MS SQL Server, PostgreSQL
On Premise
Data Lineage
Scheduled execution
Elevate, Re-use, Naming
Elevate, Re-use, Naming
Browse Schemas
In-memory processing

Which data formats apart from XML also give you the heebie jeebies and need to be liberated? Please leave a comment below or reach out to us.