XML,

How to Insert XML Data into SQL Table?

Maciek
by Maciek

Maciek is the Co-founder of Sonra. He has a knack for turning messy semi-structured formats like XML, JSON, and XSD into readable data. With a brain wired for product and data architecture, Maciek is the magic ingredient to making sure your systems don’t just work—they shine.


Published on December 22, 2023
Updated on December 30, 2024

You have 3 options for inserting XML data into an SQL table.

  • Schema on read: Import the XML document into an SQL database table without converting the data to a relational format. The data retains its hierarchical structure as XML. It just sits inside an XML or CLOB data type in an SQL table. Consumers can write special SQL code to query the XML data. This type of SQL code is proprietary to the database. It is not part of the ANSI standard for SQL.
  • Schema on write: Convert the data inside the XML documents to database SQL relational tables. Once the XML has been parsed and converted to relational tables it can be queried by standard SQL.
  • Automated XML conversion: The most advanced method to convert XML into an SQL table involves the use of enterprise-level XML conversion software, which automates all conversion steps. This approach overcomes the drawbacks associated with manual schema-on-read and schema-on-write approaches. It eliminates the requirement for engineers to write extensive lines of XML conversion code, a process that can be extremely time-consuming and leads to delays in data access.

The first two are traditional approaches. Both commonly fail with complex XML conversion projects.

In traditional approaches, data engineers write code to convert XML to a relational format. This works well for simple XML documents and low or medium XML data volumes. It often fails for complex scenarios based on elaborate industry data standards such as FpML, HL7, ISO 20022, FHIR etc. Projects often run over budget or fail completely. Going live takes months or even years. Deadlines are missed. Solutions struggle to cope with data volumes. Handling change is a manual effort.

Here, we’re going to look at these in more detail to give you the context you need to upload XML data into an SQL table yourself, or transform your XML conversion strategy with an automated approach.

Traditional methods for performing XML conversions within an SQL database are limited to databases that offer built-in support for this task. However, it’s important to note that the majority of databases lack native support for XML conversion using SQL.

Databases with XML support: Oracle, SQL Server, Snowflake, PostgreSQL

Databases without native XML support: BigQuery, Azure Synapse, MySQL, Redshift.

The three XML to SQL table options

You can either implement your own XML conversion code and manually convert XML to SQL tables or you can buy an XML conversion tool to get the job done. It’s the classic choice between build and buy.

The decision depends on a couple of major and minor factors. The most important variable is the complexity of your XML files. Other factors include the number of XML schemas, availability of XML skills, size of XML files, volume of XML files,

For a more detailed discussion check this article on build versus buy for XML conversion.

Let’s go through the options in more detail and outline the advantages and disadvantages of each.

Schema on read: Load XML file into SQL table

Schema on read is an approach for querying XML files within a database where data is ingested without enforcing a fixed structure, allowing for flexible storage of diverse XML documents. Instead of defining a schema upfront, the structure is determined at query time, enabling queries to extract specific elements or attributes using query languages like XPath or XQuery.

Some of the bigger databases such as Oracle, SQL Server, Snowflake have native support for querying XML inside a database. They also have special data types such as XML (SQL Server) or XMLTYPE (Oracle).

Example: Inserting XML document to an SQL table using SQL Server.

As you can see we created a table with a special XML data type ([University_Data] [xml]). It is specific to SQL Server. Some of the bigger databases such as Oracle have similar data types to store XML. The advantage of this data type is that it can store the original XML document.

The following SQL query will extract the university data from the XML document. It uses special SQL syntax to query the XML.

I have summarised the steps to apply schema on read for XML conversion to an SQL table

  • Data analysis: XML documents need to be analysed for querying the data
  • XML file upload: XML data is imported into a database without enforcing a predefined schema during insertion. XML documents are stored as-is, typically as special XML data types, text or binary data, without requiring a fixed data structure.
  • Schema Definition at query time: Instead of defining a schema upfront, the schema for the XML data is determined dynamically at the time of querying. When performing a query, you specify the structure and elements you’re interested in, and the database system interprets the XML data accordingly.
  • Data mapping: Mapping XML elements to table columns is performed on the fly as part of querying the data.
  • Querying: Queries against XML data in a schema on read database are typically written in SQL with query extensions like XPath or XQuery. These proprietary extensions allow you to navigate and extract specific elements or attributes from the XML documents.

Schema on write: XML to SQL conversion as relational format

In the schema on write context, XML data is directly converted into a predefined relational schema format within the database. This conversion ensures that data conforms to the specific structure, data types, and constraints established by the schema. The data is inserted into the database, and the schema enforces validation rules. Queries are performed on data that has already been converted and structured according to the schema, ensuring performance, consistency and predictability in data retrieval.

Below is the relational schema of the XML we used in the previous section.

It can be easily queried using SQL.

This query returns the following result

UniversityId

FacultyId

TUD

Engineering

I have summarised the steps of XML conversion in the context of schema on write:

  • Data Analysis: Data analysis is needed to identify how XML data elements should be structured as a relational schema.
  • Schema Definition: Based on the output of the analysis a predefined schema is created as DDL before data insertion into the database. This data model specifies the structure, data types, and constraints for the converted XML data.
  • Data Mapping: Data is mapped to the corresponding elements and attributes of the predefined schema.
  • Load XML file into SQL table: The XML documents are loaded into an SQL table for further processing. This is similar to schema on write but the next step converts the XML to a relational format.
  • Database Insertion: Once the data has been successfully converted into an XML format, it is inserted into the database. The database system verifies that the incoming XML documents conform to the defined schema, and any non-conforming data is typically rejected.
  • Querying and Retrieval: Queries are performed on data that has already been converted and structured according to the schema.

Automated XML conversion to SQL tables

The most cutting-edge XML conversion method completely streamlines and automates the process of XML conversion to a database by using an XML conversion tool. This tool automates the insertion of XML data into SQL, resembling the concept of XML-as-schema-on-write, but with a key distinction – all these tasks are executed automatically.

By utilising XML conversion software, the need for manual involvement in tasks such as data analysis, schema creation, data mapping, error management, data insertion, documentation, data lineage tracking, and schema evolution is replaced with a fully automated approach.

I’ve summarised the differences between the automated XML parsing method and the manual approach that relies on native SQL XML extensions within a database.

 

Schema on read XML conversion

Schema on write XML conversion

Automated XML to SQL conversion

Schema type

Schema on read

Data is queried infrequently, e.g. data discovery

Very high frequency of schema changes

Schema on write

Data is queried frequently, e.g. dashboards

Medium frequency of schema changes

Automated schema on write

Data is queried frequently, e.g. dashboards

Medium frequency of schema changes

Data analysis

Manual

Manual

Automated

Schema definition

Manual

Manual

Automated

Error handling

Manual

Manual

Automated

Schema optimisation

No optimisation

No optimisation

Fully Optimised database schema

Data mapping

Manual

On the fly

Ad-hoc query

Manua

As part of ETL

Automated

Data insertion

N/A

Manual

Automated

Documentation

Manual

Manual

Automated

Data lineage

Manual

Manual

Automated

Querying data

Writing queries against XML data can be more complex and less intuitive than querying data with a predefined schema

Easy to query relational output with standard SQL

Easy to query relational output with standard SQL

Schema evolution and code refactoring

Manual

Manual

Semi-automated:

Metadata catalogue

Schema diff

Auto-generated DML and DDL

Go-live

Slow

Slow

Fast

Complexity

Suitable for simple scenarios

Low complexity of XML

Suitable for simple scenarios

Low – medium complexity

Suitable for all scenarios

Particularly useful for medium and high XML complexity and XML based on industry data standards

Conversion type

Proprietary feature of advanced databases

Proprietary feature of advanced databases

XML converter

Data quality

Low

Data validation becomes the responsibility of the query writer.

High

Data is validated

High

Data is validated

Conversion effort

Very high

High

Low

Query performance

Slow

Fast

Fast

Support for very large XML files (> 1 GB)

No

No

Yes

Scalability

Depends on database

Depends on database

Yes.

Scales up and out

Database support

Some, e.g. Oracle

It is worth noting that most databases do not have native SQL support for XML.

Any

XSD support

Generally no

Some databases have very basic XSD support

Yes

Getting hands on with automated XML conversion

At this point you are probably curious to see some examples of automated XML conversion to SQL.

When it comes to automated XML conversion, there is only really one tool on the market – Flexter. We built it at Sonra specifically to solve the challenges of traditional approaches.

Automatically converting XML to SQL tables typically involves two steps.

Step 1: Create Data Flow

In the first step we create a Data Flow from one of the following

  • XSD. Flexter has full support for the XSD specification and can work with any type of XSD
  • a sample of XML
  • XSD and sample of XML files. This third option will give you the best results

Flexter does not require an XSD to create the schema in the database. The data flow generates the logical target schema and the mappings. This information is stored in the Flexter MetaDB. Data Flow creation is a one off process. It is only repeated to evolve the schema if something changes.

Flexter also works with JSON. The same steps apply. Flexter first analyses a sample of JSON files, generates the relational schema, the mappings from source to target, and the documentation.

Step 2: Convert XML

In the second step Flexter converts XML to a relational database, e.g. Oracle, SQL Server, Snowflake or any other supported database or file based format.

As part of the conversion process Flexter monitors any incoming XML for schema changes or unexpected XPaths, writes warnings to the error log and send alerts.

We have created a video to show the XML conversion process to SQL tables in Snowflake. We use FpML as an XML source.

FpML is an industry data standard in finance. It is one of the most complex XML schemas. The XML and XSD may generate thousands of tables depending on which part of the FpML standard you convert.

The principles are the same for any other supported target or any industry data standard other than FpML.

How automated XML conversion accelerates efficiency, ensures accuracy, and cuts costs

An automated approach to XML conversion has some key benefits over the traditional approach of writing code to convert XML to a relational database and SQL tables.

  • Efficiency and Time-Saving: This is probably the most significant benefit. Automated conversion processes are much faster to implement. Your project will go live much quicker and your data engineers can focus on providing real value and insights.
  • Accuracy and Consistency: Automation reduces the likelihood of human error, ensuring that the data remains consistent and accurate throughout the conversion process. This is crucial for maintaining data integrity.
  • Cost-Effectiveness: Although there might be an initial investment in setting up an automated conversion system, it can be more cost-effective in the long run. It reduces the need for extensive manual data engineering labour and the associated costs.
  • Scalability: Automated systems can easily handle large and increasing volumes of data. They can manage larger datasets without a proportional increase in effort or cost.
  • Error Detection and Correction: Many automated systems include features for error detection and correction, which can further ensure the quality of the converted data.
  • Easy to Update and Maintain: Once set up, automated conversion processes are typically easier to update and maintain, especially as they can be adjusted to accommodate changes in data structure or format requirements.

When should you use an automated approach for converting XML?

We’ve shown how cool Flexter is at turning XML into Spark Parquet automatically. But let’s be real, Flexter isn’t the perfect fit for everything. When thinking about using an automated tool like Flexter, you’ve got to balance the cool perks against the extra dollars it costs.

If your needs are simple or you’re just dabbling with XML every now and then, you probably don’t need a fancy XML automation tool. But, here are some signs that XML conversion software like Flexter might just be what you need:

  • Got super complicated XML that’s using an XSD or follows strict industry rules like HL7 or FpML?
  • Need to deal with lots of different XML types?
  • Working with XML files so big they’re practically novels?
  • Need to chew through tons of XML data fast because of tight deadlines?
  • Is your team more “XML? What’s that?” than “XML pros”?
  • Are you working under tight deadlines for converting XML data?

If any of these apply to you, why not take Flexter for a spin with its free online version? See for yourself how it can make your life easier. Or, if you want to get down to the nitty-gritty, talk to one of our XML conversion experts about your use case.

Further Reading

Deep dive on XML converters, XML conversion tools and XML conversion projects

Converting XML and JSON to a Data Lake

How to Insert XML Data into SQL Table?

The Ultimate Guide to XML Mapping in 2024

Optimisation algorithms for converting XML and JSON to a relational format

Maciek

About the author:

Maciek

Co-founder of Sonra

Maciek is the Co-founder of Sonra. He has a knack for turning messy semi-structured formats like XML, JSON, and XSD into readable data. With a brain wired for product and data architecture, Maciek is the magic ingredient to making sure your systems don’t just work—they shine.