A brief history of XML – From hype to useful data format

Published on October 18, 2016
Updated on July 26, 2024

Is XML really dead?

When it first became popular about 20 years ago, XML was meant to be the one and only format to serialize, encapsulate, and exchange data: the serialization format to end all serialization formats, so to speak. This was a bold claim. Has it materialised? Over the last couple of years it has become clear that this bid for “world power” was a bridge too far. For exchanging simple pieces of information, XML is just too verbose, and many developers dislike it. JSON has now taken the place of XML as the serialization format of choice on the web, and most REST web services have switched to JSON. This makes perfect sense. XML wraps every value in an opening and a closing tag, which inflates payloads and adds parsing overhead. JSON also maps directly onto the objects, arrays, strings, and numbers of most programming languages, which makes it a better fit for serializing a programming language object.
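The verbosity argument is easy to check for yourself. Here is a minimal sketch that serializes one record both ways using only the Python standard library. The `order` record and the element names are made up for illustration, and the XML mapping shown is just one of several possible ones.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
order = {"id": 42, "customer": "ACME", "items": ["bolt", "nut"]}


def to_json(record: dict) -> str:
    """Serialize the record as compact JSON."""
    return json.dumps(record, separators=(",", ":"))


def to_xml(record: dict) -> str:
    """Serialize the same record as element-only XML (one possible mapping)."""
    root = ET.Element("order")
    ET.SubElement(root, "id").text = str(record["id"])
    ET.SubElement(root, "customer").text = record["customer"]
    items = ET.SubElement(root, "items")
    for name in record["items"]:
        ET.SubElement(items, "item").text = name
    return ET.tostring(root, encoding="unicode")


json_text = to_json(order)
xml_text = to_xml(order)

# Print the size and the payload of each representation.
print(len(json_text), json_text)
print(len(xml_text), xml_text)

# JSON round-trips back to the original object, types included.
print(json.loads(json_text) == order)
# XML hands back text; converting "42" to a number is left to the caller.
print(repr(ET.fromstring(xml_text).findtext("id")))
```

Besides the extra characters spent on closing tags, note that the XML version has to invent names (`items`, `item`) for a structure that JSON expresses with a plain array.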
What about data analytics? During the hype days, some people even thought that XML would replace relational databases. What looks like a bad joke in retrospect was quite a serious proposition at the time. There were countless books on the subject, and a few attempts were made to create XML databases. It quickly became clear that XML was not fit for purpose in those scenarios. Querying XML with XPath is painful compared to SQL. A plain XML file gives you no indexes and no cost-based optimizer to lean on, and you typically have to load the whole document into memory for query operations to be efficient. Apart from relational databases, we now also have open source, columnar, compressed data formats such as Parquet and ORC that are a much better fit for data analytics than XML.
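To make the XPath versus SQL comparison concrete, here is a small sketch that answers the same question ("what is the order total for one customer?") both ways. It uses Python's built-in `xml.etree.ElementTree` and `sqlite3` modules. The sample document, table, and index names are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data.
XML_DOC = """
<orders>
  <order id="1"><customer>ACME</customer><total>250.0</total></order>
  <order id="2"><customer>Globex</customer><total>80.0</total></order>
  <order id="3"><customer>ACME</customer><total>120.0</total></order>
</orders>
"""


def acme_total_xml(doc: str) -> float:
    """Filter with XPath, then aggregate by hand in Python."""
    # ElementTree parses the whole document into memory and supports only a
    # limited XPath subset, so the summing has to happen outside the query.
    root = ET.fromstring(doc)
    matches = root.findall("./order[customer='ACME']")
    return sum(float(order.findtext("total")) for order in matches)


def acme_total_sql(doc: str) -> float:
    """Load the same records into SQLite and let the engine do the work."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    # An index the query planner is free to use for the WHERE clause.
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    rows = [
        (int(o.get("id")), o.findtext("customer"), float(o.findtext("total")))
        for o in ET.fromstring(doc).iter("order")
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    (total,) = conn.execute(
        "SELECT SUM(total) FROM orders WHERE customer = ?", ("ACME",)
    ).fetchone()
    conn.close()
    return total


print(acme_total_xml(XML_DOC))
print(acme_total_sql(XML_DOC))
```

With three orders the difference is cosmetic. The point of the sketch is where the work happens: in the XML version every filter and aggregate is your code walking an in-memory tree, while in the SQL version the database decides how to execute the query.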

Has XML failed?

It is one thing to say that XML has not delivered on its promises, and quite another to claim that it has failed or that it is dead. Yes, it is not a good fit for exchanging data on the web, nor is it a good fit for data analytics. However, there are countless examples where XML is used successfully to this day. What the story of XML tells us is that there is no one data serialization format to rule them all. We now have many formats at our disposal: Avro, Thrift, and Protocol Buffers, to name just a few. For a full list and description, have a look at this Wikipedia article: https://en.wikipedia.org/wiki/Comparison_of_data_serialization_formats. Each of these serves its own use case (although some of the formats in the Wikipedia article are obsolete).

XML Success Stories

What are some use cases where XML succeeded?

A significant number of enterprises use XML as a data exchange format. XML is the de facto standard for exchanging messages between enterprise applications in a Service-Oriented Architecture. Messages that conform to the canonical model are converted to and from XML. If you have ever worked in an enterprise context, you know that life there isn’t simple, and neither are the types of data and relationships you come across. This is an environment where XML shines as a data format with an extensible schema that can represent complex real-world business processes.
Business processes between enterprises are more than ever inter-connected at a global scale. B2B data hubs often standardise on XML as their data exchange format.
Many industry standards based on XML have evolved over the years. Years of work and expertise have gone into these standards. This is particularly the case in finance (ESMA TRACE, MiFID, XBRL), retail, healthcare (HL7), life sciences (CDISC), and the public sector (EU), to name just a few.
XML is used as a serialization format for RDF (RDF/XML) in a semantic web context.
In the publishing industry, XML is used throughout the document processing workflow. It is also the standard for Office file formats such as Word, Excel, PowerPoint or the Google Docs equivalents.
XML is widely used for geographic annotations, e.g. KML. OpenStreetMap extensively uses XML for data exports.

XML = Pain

We have seen that XML can be quite useful and has found its own niches. The initial hype did not materialise. While it is not ubiquitous, XML is still widely used, in particular in an enterprise context where things can get complex. As we all know, when things get complex they get difficult.
In theory, XML is human readable. In practice, we find this is only true for the simplest XML files, such as configuration files. XML schemas (XSDs) can become quite complex. We have seen XSDs that contain hundreds of entities/tables. When we visualise those schemas, they look like the schema of a complex ERP system and resemble a spider’s web. This complexity makes XML very hard for data analysts and developers to work with. The person-days spent on analyzing and processing XML grow steeply with the complexity of the XSD. A factor that compounds the problem is that most XSDs have not been designed with analytics in mind, e.g. transactions come with redundant reference data, real-world relationships are not modelled correctly, etc.
So what are your options for processing complex XML files into a relational format or a Big Data format such as Parquet/ORC (both formats are fit for data analytics)?

You can hire a bunch of developers and data analysts who try to make sense of the complex schema and manually extract the data from the XML by writing custom code. If you have an ETL tool, you will find out sooner or later that it can’t handle the complexity of most industry standards, that it only semi-automates the process, or that performance is shockingly bad.
You can use Flexter Data Liberator for XML. Do in one day what your developers/ETL tool would do in six months (if at all). Don’t worry about data volume, SLAs or performance. Flexter scales linearly. End of story.
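To give a feel for what the first option, writing custom code, involves, here is a minimal hand-written sketch that splits a tiny nested XML document into a parent table and a child table. Everything in it (the document, the `orders` and `order_items` tables, the column names) is a made-up illustration built on Python's standard `xml.etree.ElementTree` and `sqlite3` modules. It is not how Flexter works, and a real industry-standard XSD would need hundreds of such mappings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document: one parent entity (order) with repeating children (item).
XML_DOC = """
<orders>
  <order id="1">
    <customer>ACME</customer>
    <item sku="A-1" qty="2"/>
    <item sku="B-7" qty="1"/>
  </order>
  <order id="2">
    <customer>Globex</customer>
    <item sku="A-1" qty="5"/>
  </order>
</orders>
"""


def flatten(doc: str):
    """Split the hierarchy into parent rows and child rows linked by order_id."""
    orders, items = [], []
    for order in ET.fromstring(doc).iter("order"):
        order_id = int(order.get("id"))
        orders.append((order_id, order.findtext("customer")))
        for item in order.iter("item"):
            # Each child row carries the parent's key as a foreign key.
            items.append((order_id, item.get("sku"), int(item.get("qty"))))
    return orders, items


def load(conn: sqlite3.Connection, orders, items) -> None:
    """Create the two target tables and insert the flattened rows."""
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE order_items ("
        "order_id INTEGER REFERENCES orders(order_id), sku TEXT, qty INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?)", items)


orders, items = flatten(XML_DOC)
conn = sqlite3.connect(":memory:")
load(conn, orders, items)

# Once the data is relational, analytics is an ordinary join.
qty_by_customer = conn.execute(
    "SELECT o.customer, SUM(i.qty) FROM orders o "
    "JOIN order_items i USING (order_id) "
    "GROUP BY o.customer ORDER BY o.customer"
).fetchall()
print(qty_by_customer)
```

Even in this toy case the mapping decisions (which elements become tables, which attributes become columns, how keys are propagated) are made by hand. That is the part that stops scaling once the schema grows.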

We understand that you are sick of working with XML. Why not try out Flexter to find out how much fun it can be to process XML? Flexter is our platform that takes the pain out of converting XML files into a relational format or Parquet.
Which data formats apart from XML also give you the heebie jeebies and need to be liberated? Please leave a comment below or reach out to us.
