Flexter Enterprise XML Converter - FAQ

Vadim Mytarev Flexter, XML

Flexter is an enterprise XML converter. It automatically converts XML data that is locked away in complex industry data standards to text, a database, or Hadoop. It can also convert proprietary XML formats.

Flexter fully automates the conversion of XML files to a relational format. No custom development is needed, which can save up to 80% of the overall conversion costs, and you don’t need to hire external consultants with XML expertise. As one customer recently put it: “You did in one day what would have taken us a year.”

Flexter eliminates project risk. We have seen many XML conversion projects fail, and the failure rate rises steeply with the complexity of the XML and the data volumes involved.

We have built scalability into Flexter from the ground up. Flexter can scale up on a single server and scale out across a cluster of servers, so it can handle large data volumes and meet demanding SLAs.

Flexter significantly shortens the duration of XML projects. Developers can focus on data analytics tasks that add value to the business rather than having to wrangle with XML.


The Flexter platform consists of three pluggable modules:

  • Schema Analyser (xsd2er)
  • Mapping Generator (CalcMap)
  • XML Processor (xml2er)

Step 1: The Schema Analyser is a dedicated module that loads, parses, processes and stores the XML schema information in Flexter's internal metadata DB. This step only needs to be performed once per schema. You can supply either an XSD or a representative sample of XML files for this step.
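
The minimal Python sketch below gives a rough idea of what analysing a sample set involves. It records every element path seen in the sample XML and the largest number of times that path occurs under a single parent. A path that can repeat is a candidate for its own child table. This is not Flexter's implementation. The function name `analyse_samples` is hypothetical, and the sketch ignores attributes, namespaces and data types.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def analyse_samples(samples):
    """Return {element_path: max occurrences under a single parent}.

    samples: iterable of XML strings forming a representative sample set.
    """
    max_occurs = defaultdict(int)

    def walk(elem, path):
        # Count repetitions of each child tag under this one parent element
        counts = defaultdict(int)
        for child in elem:
            counts[child.tag] += 1
        for tag, n in counts.items():
            child_path = f"{path}/{tag}"
            max_occurs[child_path] = max(max_occurs[child_path], n)
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    for xml_text in samples:
        root = ET.fromstring(xml_text)
        root_path = "/" + root.tag
        max_occurs[root_path] = max(max_occurs[root_path], 1)
        walk(root, root_path)
    return dict(max_occurs)


samples = [
    "<order><id>1</id><item><sku>A</sku></item></order>",
    "<order><id>2</id><item><sku>B</sku></item><item><sku>C</sku></item></order>",
]
schema = analyse_samples(samples)
# schema == {"/order": 1, "/order/id": 1, "/order/item": 2, "/order/item/sku": 1}
```

In the second sample, `item` occurs twice under one `order`, so the sketch flags it as repeating. `id` and `sku` are flagged as single-valued.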

Step 2: Now that the exact layout of the source XML is known, the relational equivalent can be generated. Flexter's Mapping Generator module generates the target schema layout and the mapping from the source XML to it. Various optimisations of the target schema can be applied during this step.
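
A naive version of the mapping step can be sketched in Python. The sketch assumes a path-to-maximum-occurrence summary, like one a sample analysis might produce, as input. The root and every repeating element become tables with a surrogate key, and each child table gets a foreign key to its parent. Non-repeating leaf elements become columns of the nearest enclosing table. The functions `generate_mapping` and `to_ddl` are hypothetical. Every column is typed TEXT, and none of Flexter's own optimisations (Elevate, Re-use, Naming) are modelled.

```python
def generate_mapping(max_occurs):
    """Derive a naive relational layout from {element_path: max_occurrences}.

    The root and every repeating element become tables; non-repeating leaf
    elements become columns of the nearest enclosing table.
    """
    root = min(max_occurs, key=len)  # assumes a single root element type
    # Ancestors have shorter paths, so sorting by length creates parents first
    table_paths = sorted(
        (p for p in max_occurs if p == root or max_occurs[p] > 1), key=len)

    def has_children(path):
        return any(p.startswith(path + "/") for p in max_occurs)

    def owning_table(path):
        # Deepest table path equal to `path` or one of its ancestors
        return max((t for t in table_paths
                    if path == t or path.startswith(t + "/")), key=len)

    tables = {}
    for t in table_paths:
        name = t.rsplit("/", 1)[-1]
        columns = [f"{name}_id"]  # surrogate primary key
        if t != root:
            parent = tables[owning_table(t.rsplit("/", 1)[0])]
            columns.append(parent["columns"][0])  # foreign key to the parent row
        if not has_children(t):
            columns.append("value")  # repeating leaf element: keep its text
        tables[t] = {"name": name, "columns": columns}

    for p in max_occurs:
        if p not in tables and not has_children(p):
            tables[owning_table(p)]["columns"].append(p.rsplit("/", 1)[-1])
    return tables


def to_ddl(tables):
    """Render each table as a CREATE TABLE statement with quoted identifiers."""
    statements = []
    for table in tables.values():
        cols = ", ".join(f'"{c}" TEXT' for c in table["columns"])
        statements.append(f'CREATE TABLE "{table["name"]}" ({cols});')
    return statements


max_occurs = {"/order": 1, "/order/id": 1, "/order/item": 2, "/order/item/sku": 1}
tables = generate_mapping(max_occurs)
ddl = to_ddl(tables)
# ddl[0] == 'CREATE TABLE "order" ("order_id" TEXT, "id" TEXT);'
# ddl[1] == 'CREATE TABLE "item" ("item_id" TEXT, "order_id" TEXT, "sku" TEXT);'
```

Identifiers are quoted because names taken from XML, such as `order`, can clash with SQL keywords. A real mapping could take column types from an XSD instead of defaulting to TEXT.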

Step 3: The XML Processor module takes the information generated from the two previous steps, processes the XML, and writes the data to the relational target schema.
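
The processing step can be illustrated in the same spirit. The Python sketch below walks one XML document and emits rows for each table in a hypothetical mapping (`MAPPING`, keyed by element path). It assigns surrogate keys and carries the parent key down into child rows. This single-threaded illustration of the general "shredding" technique is not how Flexter's Spark-based XML Processor is built, and `shred` is a hypothetical name.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical mapping: table path -> (table name, leaf columns it owns)
MAPPING = {
    "/order": ("order", ["id"]),
    "/order/item": ("item", ["sku"]),
}


def shred(xml_text, mapping, ids=None):
    """Convert one XML document into {table_name: [row_dict, ...]}."""
    ids = itertools.count(1) if ids is None else ids  # surrogate key generator
    rows = {name: [] for name, _ in mapping.values()}

    def walk(elem, path, current_row, current_table):
        if path in mapping:
            name, _ = mapping[path]
            row = {f"{name}_id": next(ids)}
            if current_row is not None:
                # Foreign key pointing at the row of the enclosing table
                parent_key = f"{mapping[current_table][0]}_id"
                row[parent_key] = current_row[parent_key]
            rows[name].append(row)
            current_row, current_table = row, path
        elif len(elem) == 0 and current_table is not None:
            column = path.rsplit("/", 1)[-1]
            if column in mapping[current_table][1]:
                current_row[column] = (elem.text or "").strip()
        for child in elem:
            walk(child, f"{path}/{child.tag}", current_row, current_table)

    root = ET.fromstring(xml_text)
    walk(root, "/" + root.tag, None, None)
    return rows


doc = ("<order><id>7</id>"
       "<item><sku>A</sku></item><item><sku>B</sku></item></order>")
rows = shred(doc, MAPPING)
# rows["order"] == [{"order_id": 1, "id": "7"}]
# rows["item"] == [{"item_id": 2, "order_id": 1, "sku": "A"},
#                  {"item_id": 3, "order_id": 1, "sku": "B"}]
```

To keep surrogate keys unique across a batch of documents, pass one shared `itertools.count()` as `ids`.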


The core strength of ETL tools is to transform structured data and work with relational databases. They often struggle with semi-structured data in XML files. While most ETL tools offer functionality to handle simple and flat XML files at low volumes, they have serious limitations:

  • They don’t automate the conversion process. ETL developers still need to create data flows (potentially hundreds for complex XML) and data pipelines, which is a significant development effort.
  • ETL tools don’t scale beyond a single server for XML processing.
  • Most ETL tools can’t process XML files in batches. They process XML files one at a time, which has a significant impact on performance.

Here are two blog posts where we compare Flexter against two popular ETL tools:

  • Oracle Data Integrator
  • Informatica


Yes. Flexter supports real-time use cases through its streaming engine.


We support both individual XML files and batches of XML files in archives and compressed formats (zip, gzip etc.).

We can pull XML files from network drives, (S)FTP servers, HDFS, S3, XMLTYPE/CLOB columns in databases, etc.
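
Unpacking a batch before conversion is mostly standard-library work. The Python sketch below uses a hypothetical `iter_xml_documents` helper to yield the XML documents in a plain file, a gzip file or a zip archive. Choosing the format by file extension is a simplifying assumption. Fetching files from (S)FTP, HDFS, S3 or database columns is out of scope here.

```python
import gzip
import io
import zipfile


def iter_xml_documents(name, data):
    """Yield (document_name, xml_bytes) from a plain, .gz, or .zip input.

    The format is chosen from the file extension, which is a simplifying
    assumption; production code might inspect the file contents instead.
    """
    if name.endswith(".zip"):
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                if member.endswith(".xml"):  # skip non-XML members
                    yield member, archive.read(member)
    elif name.endswith(".gz"):
        yield name[: -len(".gz")], gzip.decompress(data)
    else:
        yield name, data


# Build a small zip batch in memory to exercise the function
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("a.xml", "<order><id>1</id></order>")
    archive.writestr("b.xml", "<order><id>2</id></order>")
    archive.writestr("readme.txt", "not XML")
docs = list(iter_xml_documents("batch.zip", buf.getvalue()))
# docs == [("a.xml", b"<order><id>1</id></order>"), ("b.xml", b"<order><id>2</id></order>")]
gz_docs = list(iter_xml_documents("c.xml.gz", gzip.compress(b"<order/>")))
# gz_docs == [("c.xml", b"<order/>")]
```

The generator yields one document at a time, so a caller can convert each document before reading the next.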


We support most relational databases, e.g. Oracle, MS SQL Server, DB2, PostgreSQL, MySQL, Redshift, Snowflake, BigQuery etc.


We support comma-separated (CSV) and tab-separated (TSV) files as output.
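
Once XML has been shredded into rows, writing delimited text is straightforward. The Python sketch below uses a hypothetical `write_delimited` helper built on the standard `csv` module to produce both formats. It also shows why quoting matters when a value contains the delimiter.

```python
import csv
import io


def write_delimited(rows, columns, delimiter="\t"):
    """Serialise row dicts with a header line; TSV by default, CSV with ","."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns, delimiter=delimiter,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return out.getvalue()


items = [
    {"item_id": 2, "order_id": 1, "sku": "A"},
    {"item_id": 3, "order_id": 1, "sku": "B, large"},
]
tsv = write_delimited(items, ["item_id", "order_id", "sku"])
csv_out = write_delimited(items, ["item_id", "order_id", "sku"], delimiter=",")
# tsv     == 'item_id\torder_id\tsku\n2\t1\tA\n3\t1\tB, large\n'
# csv_out == 'item_id,order_id,sku\n2,1,A\n3,1,"B, large"\n'
```

The comma-separated output quotes `B, large` because the value contains the delimiter. The tab-separated output leaves it unquoted.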

We support Parquet, Avro, and ORC. We also support Hadoop SQL query engines, e.g. Hive, Impala, Drill, AWS Athena, etc.

You don’t need an XSD to convert your XML files. Flexter analyses a sample set of your XML files to generate a target schema. The advantage of having an XSD is that Flexter can apply better optimisations to your target schema. An XSD also minimises constraint violations and other data quality issues.

Flexter recovers gracefully from failures and picks up where it left off. Errors are written to the error log.
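
Flexter handles recovery internally. The Python sketch below illustrates the general checkpoint-and-resume pattern behind this kind of behaviour, under some assumptions. A hypothetical `process_batch` function records each successfully converted file in a JSON checkpoint and skips those files after a restart. It logs failures through a standard `logging` logger, which you would route to an error log.

```python
import json
import logging
import os
import tempfile

log = logging.getLogger("xml_batch")  # hypothetical logger; route it to your error log


def process_batch(paths, handle, checkpoint_file):
    """Convert files in order, skipping those completed in an earlier run.

    handle: callable that converts one file and raises on failure.
    checkpoint_file: JSON list of completed paths, kept across restarts.
    Returns the paths that failed during this run.
    """
    done = set()
    if os.path.exists(checkpoint_file):
        with open(checkpoint_file) as f:
            done = set(json.load(f))
    failed = []
    for path in paths:
        if path in done:
            continue  # already converted before a crash or restart
        try:
            handle(path)
        except Exception:
            log.exception("failed to process %s", path)
            failed.append(path)
            continue
        done.add(path)
        # Write to a temp file, then replace, so a crash never leaves a torn checkpoint
        tmp = checkpoint_file + ".tmp"
        with open(tmp, "w") as f:
            json.dump(sorted(done), f)
        os.replace(tmp, checkpoint_file)
    return failed


seen = []


def flaky(path):
    seen.append(path)
    if path == "b.xml":
        raise ValueError("malformed XML")


ckpt = os.path.join(tempfile.mkdtemp(), "done.json")
first = process_batch(["a.xml", "b.xml", "c.xml"], flaky, ckpt)   # ["b.xml"]
second = process_batch(["a.xml", "b.xml", "c.xml"], flaky, ckpt)  # ["b.xml"]
# seen == ["a.xml", "b.xml", "c.xml", "b.xml"]
```

On the second run, only `b.xml` is attempted again. Replacing the checkpoint through a temporary file means a crash mid-write leaves the previous checkpoint intact.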

Yes, as long as there is some overlap across the XML files. As a rule of thumb, 50% overlap is sufficient in most cases.

We offer Flexter as a service and cater for one-off migration requirements. Typical scenarios include converting terabytes of historical XML data or migrating from a legacy XML database to a relational database.

With Flexter you can evolve your schema over time and run multiple versions of your schema/XSD side by side, with no extra development effort or cost.

Yes. We support popular Big Data formats and Hadoop SQL engines, e.g. Hive, Drill, Impala, etc.

Yes. We are working hard to liberate data from other complex or proprietary formats such as JSON, spreadsheets, MS Access, EDI, etc.

Flexter uses Spark and Spark Streaming as execution engines. It is written in Scala.

Yes. Flexter can be called from the command line or through its RESTful API.

Flexter can be used for batch loading large volumes of XML files into a data warehouse.

Flexter can be used to trickle-feed XML files in real time to an analytics engine.

Flexter can be used for data exchange scenarios that require translation of XMLs to a relational format.

Flexter can be used to migrate large volumes of historic XML files to a database.

Flexter can be used to migrate an XML database to a relational database.

Yes. Please reach out to us with your specific requirements.

One of our customers is Aer Lingus. They ran into performance and scalability issues with their existing ETL tool. We implemented our platform, Flexter Data Liberator, to solve the problem. No custom development was needed: we installed and configured Flexter, and everything was up and running in a day.

Flexter runs on Linux or Windows.

We are working on a downloadable personal edition of Flexter. Please reach out to us to be notified when it becomes available.

Flexter ships as a Windows MSI or a Linux RPM installer. Installation and configuration are simple.

Yes. Flexter can be installed on premises in your own data centre.

Yes. Please contact us with your use case.

Yes. You can call Flexter from your ETL tool using the RESTful API or through the command line. We have written blog posts that show how Flexter integrates with some popular ETL tools:

  • Oracle Data Integrator
  • IRI Voracity
  • Informatica Data Services

Feature               Free Online Trial         Enterprise Version
Max daily data limit  50MB                      Unlimited
Scalability           Single instance           Cluster of servers
File output formats   Text                      TSV, Parquet, Avro, ORC
RDBMS support                                   Oracle, MS SQL Server, PostgreSQL
Location              Online                    On premises, Online
Data lineage
Scheduled execution
Support
Optimisations         Elevate, Re-use, Naming   Elevate, Re-use, Naming
Visualisations                                  Browse schemas
In-memory processing                            Yes

Which data formats apart from XML also give you the heebie-jeebies and need to be liberated? Please leave a comment below or reach out to us.