XML,

Home » XML » Converting Complex XML to CSV: A Practical Guide

Converting Complex XML to CSV: A Practical Guide

by Maciek

Maciek is the Co-founder of Sonra. He has a knack for turning messy semi-structured formats like XML, JSON, and XSD into readable data. With a brain wired for product and data architecture, Maciek is the magic ingredient to making sure your systems don’t just work—they shine.

Published on November 28, 2016
Updated on March 24, 2025

Have you ever tried to process data in XML?

Did you have to load its values into a relational database or simply convert the XML to plain CSV?

What if I told you that there is a simple way of processing complex XMLs — including solving some misconceptions around the specifics of XML?

I’m here to help. Keep reading to find out how you can practically convert complex XML to CSV better and more efficiently.

XML to CSV converter myth

Let’s first cover what XML is actually used for.

You might think, “This is just another structured text file, like CSV so it should be straightforward… Ultimately, the difference is you have the values inside their own tag delimiters rather than just commas.”

Right?

Not really, I’m afraid.

XML can indeed accommodate flat data structures. However, that leaves out the remaining 99.5% of cases where XML is utilised because we need to represent a complex structure that CSV can’t handle.

Unfortunately, in the majority of cases you will probably deal with complex scenarios that involve hierarchical data structures and multiple one-to-many relationships all being stored in a single XML file.

Think of it as a relational database that has been encapsulated into a single text file.
Hypothetically, if you have two <supplier> child entries from and at the same time, 3 independent <customer< span=””>> children of the same root — how would you flatten this out?</customer<>

Those data points are different. They are independent XPaths and represent one-to-many child relationships. Extracting them together would result in a cartesian product:

company	vat_id	supplier	customer
ACME	EU123	sA	cK
ACME	EU123	sA	cL
ACME	EU123	sA	cM
ACME	EU123	sB	cK
ACME	EU123	sB	cL
ACME	EU123	sB	cM

Notice that the company’s attributes are repeated, which also creates redundant entries. These are just two of the limitations of flattening XML to CSV.

You’re likely now thinking: “OK. But what now? Surely there must be a better way to dump all of this into a basic format like CSV.”

You’re dead right. The answer is quite simple – rather than trying to squeeze everything in a single file, instead normalize and split it into one file per independent “branch.”

A loosely defined branch could be any close combination of values and attributes that are all linked together with a strict 1:1 relationship.

This process is common in the database world and is called normalisation. In this particular example, it would produce a simple tree of three files:

The company would act as a root or header file, storing one record per each processed XML. It would be accompanied with two other files….
Supplier storing two records and…
Customer with three records, pretty much reflecting the relationships from the origin XML file.

Here’s a context-aware lead-in, perfect for this post’s flow:

🎥 Convert Complex XML to CSV with Flexter – No Coding Needed

Struggling to convert XML files with nested structures into clean CSV or TSV files? Before we jump into the enterprise version of Flexter, here’s a quick demo of Flexter Online – the no-code version that lets you convert XML to CSV in seconds.

✔️ Normalises complex XML structures
✔️ Splits data into logical files (no more cartesian products!)
✔️ Ideal for quick conversions and testing

We have also compiled a comparison guide on XML to CSV converters that would be helpful for you.

XML processing with Flexter in action

Now it’s time to show you the right way to convert XML files to CSV with a practical example. We will work with the enterprise version of Flexter, demonstrating its features and showing you some of its internals. The steps to process XML files through Flexter are:

Create a Data Flow Input/Output analysis (just once!)
1. Process and analyze the XSD schema
2. Calculate the target layout and generate source to target mappings

Process the XML data files (at a repeatable basis)

For this use case, I have picked the Pain.001 (Payments Initiation but can be a real pain too 😉) schema that shows Credit Transfer messages in XML format.

This is part of a wider ISO 20022 standard that has been already put in place to process payments in the EU/SEPA (pre-Single Euro Payments Area) zone. The standard describes dozens of different financial messages and there’s a pretty good chance that you’ll come across it one day, especially if you’re working within IT in the banking sector.
Once we have downloaded both XML and XSD files we can start processing.

Analysing input XML layout

We first use Flexter to analyze the input schema:

1	xsd2er -R /mapr/dev.sonra.io/tmp/sepa/pain.001.001.03.xsd

The -R option tells Flexter to use the smart detection feature of root elements in the XSD (so there’s no need to specify the name explicitly).

It only takes two seconds, but prints out a lot of useful information on the screen. You may access the full log here but the most interesting part is below:

18:18:49.098 INFO Registering new job

18:18:49.352 INFO Parsing pain.001.001.03.xsd

18:18:49.580 INFO Linking pain.001.001.03.xsd

18:18:49.603 INFO Root elements:

pain.001.001.03.xsd

- Document

18:18:49.607 INFO Building metadata

18:18:49.686 INFO Writing metadata

18:18:50.048 INFO Registering success of job 3748

18:18:50.074 INFO Finished successfully in 1294 milliseconds

# schema

schemaXml: 1496

org: 1

job: 3748

elements: 940

# statistics

load: 452 ms

parse: 372 ms

build: 83 ms

write: 362 ms

After deep analysis of the XSD schema, Flexter now knows how all potential source XML are structured.

This information is saved into Flexter’s metadata for subsequent use. We can see from the above log that Flexter found only one possible root element (“Document”) in our XSD and there were no other XSD files referenced. The XML schema has been resolved into 940 possible tag elements.

You can access the full layout here.

The internal Flexter ID for our input schema is 1496.

Building the target output layout

In the next step, we calculate the normalized target layout and the mapping to connect the source layout to it. We pass the internal ID of our schema (here 1496) to the Flexter CalcMap tool:

1	calcmap -x 1496 {true, false}

During run-time, it prints out the log information on the particular phases of the analysis. At the end, we can see a short summary:

1	Tables/Columns/Mappings: 58/780/1024

What we can observe here is target layout optimisation in action. Flexter ships with various optimisation algorithms that simplify the target schema.

Based on the insight Flexter gathered in the first Schema Analysis phase, it can optimise the normalisation process, resulting in fewer potential target files. This means Flexter was able to compact all possible data points from the input XML into just 58 files with a total of 780 columns.

Processing XML data

Now we are ready to process our XML files. We pass the internal schema ID to the Flexter xml2er command line tool:

1	xml2er -x 1496 tmp/sepa/pain.001.001.03.xmls.zip

This produces our output in TSV format (excerpt):

# input

path: maprfs:///mapr/dev.sonra.io/tmp/sepa/pain.001.001.03.xmls.zip

# output (default)

path: maprfs:///mapr/dev.sonra.io/tmp/sepa/out.opt/

format: tsv

savemode: overwrite

18:19:48.775 INFO Registering new job

18:19:49.109 INFO Loading metadata

[Stage 1:=> (7 + 1) / 200]

[Stage 1:==> (10 + 1) / 200]

[Stage 1:====> (17 + 1) / 200]

[Stage 1:======> (23 + 1) / 200]

[Stage 1:========> (30 + 1) / 200]

[Stage 1:==========> (38 + 1) / 200]

[Stage 1:============> (44 + 1) / 200]

[Stage 1:==============> (53 + 1) / 200]

[Stage 1:=================> (62 + 1) / 200]

[Stage 1:===================> (71 + 1) / 200]

[Stage 1:======================> (79 + 1) / 200]

[Stage 1:========================> (89 + 1) / 200]

[Stage 1:===========================> (99 + 1) / 200]

[Stage 1:=============================> (108 + 1) / 200]

[Stage 1:=================================> (120 + 1) / 200]

[Stage 1:====================================> (133 + 1) / 200]

[Stage 1:========================================> (146 + 1) / 200]

[Stage 1:==========================================> (154 + 2) / 200]

[Stage 1:=============================================> (165 + 1) / 200]

[Stage 1:================================================> (178 + 1) / 200]

[Stage 1:====================================================> (192 + 1) / 200]

18:20:01.750 INFO Parsing data

18:20:07.297 INFO calculating data

18:20:07.303 INFO writing tables

18:20:07.305 INFO writing table GrpHdr_InitgPty_Id_OrgId_Othr

18:20:07.763 INFO writing table GrpHdr_InitgPty_Id_PrvtId_Othr

18:20:08.001 INFO writing table Document

18:20:08.238 INFO writing table CdtTrfTxInf_Cdtr_Id_OrgId_Othr

18:20:08.407 INFO writing table CdtTrfTxInf

18:20:08.669 INFO writing table Cdtr_PstlAdr_AdrLine

18:20:08.865 INFO writing table Strd

18:20:09.050 INFO writing table Ustrd

18:20:09.214 INFO writing table UltmtCdtr_Id_OrgId_Othr

18:20:09.390 INFO writing table PmtInf_Dbtr_Id_OrgId_Othr

18:20:09.557 INFO writing table PmtInf_Dbtr_PstlAdr_AdrLine

18:20:09.722 INFO writing table PmtInf

18:20:09.913 INFO writing table PmtInf_UltmtDbtr_Id_OrgId_Othr

18:20:10.340 INFO calculating statistics

18:20:10.683 INFO writing statistics rows

18:20:11.513 INFO Registering success of job 3749

18:20:11.531 INFO Finished successfully in 47601 milliseconds

# schema

xml: 1496

logical: 419

org: 1

job: 3749

xpaths: 515

# statistics

load: 37820 ms

parse: 5547 ms

write: 2775 ms

stats: 1440 ms

The default output mode is TSV which, in its structure, is very similar to CSV but the delimiter is its tab character (tab separated file). This tends to be a more universal standard as it generates fewer potential issues in subsequent use (commas can generally also occur in text and number formatting.)

As you might have noticed, Flexter only produced 13 files. The default operation mode of Flexter is to create only tables where data that was present in the input XML. This resulted in only 13 out of 58 files being created and populated.

→ ls -l

-rwxr-xr-x 1 flexter 109 Nov 17 14:08 Authstn.tsv

-rwxr-xr-x 1 flexter 13279 Nov 17 19:20 CdtTrfTxInf.tsv

-rwxr-xr-x 1 flexter 98 Nov 17 19:20 CdtTrfTxInf_Cdtr_Id_OrgId_Othr.tsv

-rwxr-xr-x 1 flexter 283 Nov 17 14:08 CdtrSchmeId_Id_PrvtId_Othr1.tsv

-rwxr-xr-x 1 flexter 123 Nov 17 19:20 Cdtr_PstlAdr_AdrLine.tsv

-rwxr-xr-x 1 flexter 3215 Nov 17 19:20 Document.tsv

-rwxr-xr-x 1 flexter 11336 Nov 17 14:08 DrctDbtTxInf.tsv

-rwxr-xr-x 1 flexter 354 Nov 17 19:20 GrpHdr_InitgPty_Id_OrgId_Othr.tsv

-rwxr-xr-x 1 flexter 91 Nov 17 19:20 GrpHdr_InitgPty_Id_PrvtId_Othr.tsv

-rwxr-xr-x 1 flexter 5902 Nov 17 19:20 PmtInf.tsv

-rwxr-xr-x 1 flexter 85 Nov 17 19:20 PmtInf_Dbtr_Id_OrgId_Othr.tsv

-rwxr-xr-x 1 flexter 104 Nov 17 19:20 PmtInf_Dbtr_PstlAdr_AdrLine.tsv

-rwxr-xr-x 1 flexter 93 Nov 17 19:20 PmtInf_UltmtDbtr_Id_OrgId_Othr.tsv

-rwxr-xr-x 1 flexter 1711 Nov 17 19:20 Strd.tsv

-rwxr-xr-x 1 flexter 98 Nov 17 19:20 UltmtCdtr_Id_OrgId_Othr.tsv

-rwxr-xr-x 1 flexter 607 Nov 17 19:20 Ustrd.tsv

Please note those 58 files were already compacted, so you can see now the full path it took: from 940 tag elements to 58 potential target files to just 13 instance files. Pretty good huh?

Suggested reading: Add to your understanding on XML with our deep dive on XML conversion: including converters, tools, and projects.

See the benefits of Flexter for yourself

I hope you enjoyed this brief overview and I encourage you to try out Flexter in your own time.

In our next several posts, we will look at more demanding schemas and also demonstrate how to integrate Flexter with other data integration tools. Stay tuned!

Which data formats apart from XML are you struggling with and need help with? We could give you the support you need — just reach out!

About the author:

Maciek

Co-founder of Sonra

Follow Maciek:

Back to Blog

Cookie	Duration	Description
__cfruid	session	Cloudflare sets this cookie to identify trusted web traffic.
cookielawinfo-checkbox-marketing	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Marketing".
cookielawinfo-checkbox-necessary	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-preferences	1 month	This cookie is set by the GDPR Cookie Consent plugin to check if the user has given consent to use cookies under the "Preferences" category.
cookielawinfo-checkbox-statistics	1 month	This cookie is set by the GDPR Cookie Consent plugin to store the user consent for the cookies in the category "Statistics".
cookielawinfo-checkbox-unclassified	1 month	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Unclassified".
CookieLawInfoConsent	1 month	Records the default button state of the corresponding category & the status of CCPA. It works only in coordination with the primary cookie.
csrftoken	1 year	This cookie is associated with Django web development platform for python. Used to help protect the website against Cross-Site Request Forgery attacks

Cookie	Duration	Description
AnalyticsSyncHistory	1 month	Linkedin set this cookie to store information about the time a sync took place with the lms_analytics cookie.
bcookie	2 years	LinkedIn sets this cookie from LinkedIn share buttons and ad tags to recognize browser ID.
bscookie	2 years	LinkedIn sets this cookie to store performed actions on the website.
lang	session	LinkedIn sets this cookie to remember a user's language setting.
li_gc	2 years	Linkedin set this cookie for storing visitor's consent regarding using cookies for non-essential purposes.
lidc	1 day	LinkedIn sets the lidc cookie to facilitate data center selection.
mgref	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
mgrefby	1 year	This cookie is set by Eventbrite to deliver content tailored to the end user's interests and improve content creation. It is also used for event-booking purposes.
UserMatchHistory	1 month	LinkedIn sets this cookie for LinkedIn Ads ID syncing.

Cookie	Duration	Description
G	1 year	Cookie used to facilitate the translation into the preferred language of the visitor.
SERVERID	session	This cookie is set by Slideshare's HAProxy load balancer to assign the visitor to a specific server.

Cookie	Duration	Description
_ga	2 years	The _ga cookie, installed by Google Analytics, calculates visitor, session and campaign data and also keeps track of site usage for the site's analytics report. The cookie stores information anonymously and assigns a randomly generated number to recognize unique visitors.
_ga_7H38LVR4Z5	2 years	This cookie is installed by Google Analytics.
_gat_gtag_UA_44804396_1	1 minute	Set by Google to distinguish users.
_gat_UA-44804396-1	1 minute	A variation of the _gat cookie set by Google Analytics and Google Tag Manager to allow website owners to track visitor behaviour and measure site performance. The pattern element in the name contains the unique identity number of the account or website it relates to.
_gcl_au	3 months	Provided by Google Tag Manager to experiment advertisement efficiency of websites using their services.
_gid	1 day	Installed by Google Analytics, _gid cookie stores information on how visitors use a website, while also creating an analytics report of the website's performance. Some of the data that are collected include the number of visitors, their source, and the pages they visit anonymously.
CONSENT	2 years	YouTube sets this cookie via embedded youtube-videos and registers anonymous statistical data.
SIDCC	6 Months	The "SIDCC" cookie is used as security measure to protect users data from unauthorised access
test_cookie	15 minutes	The test_cookie is set by doubleclick.net and is used to determine if the user's browser supports cookies.
VISITOR_INFO1_LIVE	5 months 27 days	A cookie set by YouTube to measure bandwidth that determines whether the user gets the new or old player interface.
YSC	session	YSC cookie is set by Youtube and is used to track the views of embedded videos on Youtube pages.
yt-remote-connected-devices	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt-remote-device-id	never	YouTube sets this cookie to store the video preferences of the user using embedded YouTube video.
yt.innertube::nextId	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.
yt.innertube::requests	never	This cookie, set by YouTube, registers a unique ID to store data on what videos from YouTube the user has seen.

Cookie	Duration	Description
AN	1 month
AS	session
ebEventToTrack	1 month
eblang	1 year
SNID	2 years	This cookie is set by the Google. This cookie is used by the map which helps visitors to identify and reach the facility.
SP	session
SS	session

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

SQL Visualisation Guide - Query Diagrams, Lineage & ERD

Converting Complex XML to CSV: A Practical Guide

XML to CSV converter myth

🎥 Convert Complex XML to CSV with Flexter – No Coding Needed

XML processing with Flexter in action

Analysing input XML layout

Building the target output layout

Processing XML data

See the benefits of Flexter for yourself

Maciek

Converting Complex XML to CSV: A Practical Guide

XML to CSV converter myth

🎥 Convert Complex XML to CSV with Flexter – No Coding Needed

XML processing with Flexter in action

Analysing input XML layout

Building the target output layout

Processing XML data

See the benefits of Flexter for yourself

Related Articles

Cookies consent