XML,

How to Parse XML Data in SQL Server

Published on January 25, 2024
Updated on May 27, 2024

The many options available for loading, querying, shredding, and converting XML on SQL Server can often be overwhelming for beginners. SQLXML, XPath, XQuery, OPENXML, FOR XML, OPENROWSET, BULK INSERT are just a few of the many options available for processing XML on SQL Server.

This blog post aims to demystify XML parsing on SQL Server. It outlines the steps of loading, querying, and shredding and gives an in-depth exploration of the various techniques and tools on offer.

We’ll delve into each option, breaking down their intricacies and providing examples and clear guidance on when and how to use them effectively. Furthermore, we’ll shed light on an alternative approach for parsing XML on SQL Server that fully automates the process using XML conversion tools. This approach solves complexity, reduces risk and can deliver more accurate results far faster when done with the right tool. Let’s get started.

XML parsing process on SQL Server

XML parsing refers to the process of interpreting and converting XML data using the features offered by SQL Server.

It involves one or more of the following steps.

  • Loading and Storing XML Data: SQL Server allows storing XML data in its databases. You can store XML data in columns specifically designated as the XML data type. This allows you to store structured data in a way that’s both human-readable and machine-processable. SQL Server includes a special XML data type for loading XML documents.
  • Querying and reading XML Data: SQL Server provides useful tools for reading and querying XML data without the need to store it in a relational format. This approach is known as Schema on Read. For performance optimization, SQL Server allows indexing XML data through a special type of XML index. This is particularly useful when dealing with large volumes of XML data, as it can significantly speed up query performance.
  • Shredding XML Data: SQL Server provides the ability to ‘shred’ XML data – that is, to convert it into relational data. You can extract elements from the XML and represent them as relational rows and columns. This approach of parsing XML is known as Schema on Write and requires an ETL or data transformation process and pipeline. We have written extensively about the limitations of the schema on read approach for hierarchical data.

Constructing XML Data: SQL Server can also generate XML data from relational data. This is done using functions like FOR XML or XQuery, which can output query results in an XML format. Generating XML from a relational format is not covered in this article.

Loading XML to SQL Server

Loading XML documents into SQL Server is a prerequisite for organisations that need to query and shred this data using the SQL Server XML tools. By importing XML into SQL Server, businesses can leverage SQL Server’s powerful querying capabilities to extract valuable insights from XML data, transform it as needed, and integrate it seamlessly with other data types.

In this section, we discuss the diverse and often confusing options available to bulk import XML files into SQL Server. We begin by providing an overview of the XML data type in SQL Server, which is the preferred and optimised method for importing and storing XML data for downstream querying and XML conversion.

While you can map XML files to a relational format during the load phase to SQL Server, we recommend storing the XML document inside an XML data type before shredding the XML to relational tables. This way you preserve the original structure.

You also have the option to query the original XML document, e.g. for debugging purposes.

Overview: XML data type

The XML data type, introduced in SQL Server 2005, represents a fully integrated and premier data type, specifically designed for the manipulation, storage, and querying of XML data.

SQL Server doesn’t store the precise, character-by-character string of the XML data in an XML data type. Rather, it transforms the given literal XML into an internal format akin to XDM (XML Data Model). This format is more abstract and logical, focusing on a hierarchical structure that captures the essential details about XML nodes. However, during this conversion process, some of the metadata associated with the original XML data may be lost.

The XML data type offers numerous benefits compared to the older functions and procedures that were used for managing XML on the server side before the advent of SQL Server 2005.

It offers the following benefits

  • Standardised Data Storage: The XML data type provides a standardised way to store XML data in SQL Server databases. This ensures that the data is stored in a well-structured and easily accessible format.
  • Schema Support: SQL Server’s XML data type supports XML schema collections. This means you can define a schema for your XML data and ensure that the data stored in an XML column adheres to this schema, promoting data integrity and consistency.
  • Integration with Other Data Types: The XML data type allows for seamless integration with other data types in SQL Server. This makes it easier to combine XML data with traditional relational data, facilitating complex data modelling and analysis.
  • Performance Optimization: The XML data type in SQL Server is optimised for performance. It includes features like XML indexing, which can significantly improve query performance on large XML data sets.

When importing XML data into an SQL Server table make sure to set the data type to XML.

Creating a table with XML data type (column University_Data) for loading of XML documents to SQL Server

Option 1: OPENROWSET XML Loading

OPENROWSET is a function in SQL Server that allows you to access data from a remote data source that is not defined as a linked server. This function can also be used to read data from a file, such as an XML file, directly into SQL Server. When using OPENROWSET for loading XML data, you typically specify the BULK option, which is used for reading the content of a file.

When you use OPENROWSET, it treats the file’s contents as a set of rows, similar to how data is structured in a table. Depending on the type of data you are importing, you can specify options like SINGLE_BLOB, SINGLE_CLOB, or SINGLE_NCLOB. These options determine how the file’s contents are returned: as a Binary Large Object (BLOB), Character Large Object (CLOB), or National Character Large Object (NCLOB), all in a single column.

When to use OPENROWSET to load XML data

  • Direct File Access: OPENROWSET allows direct and ad-hoc access to data stored in a file without the need to define a linked server or use any intermediate services.
  • Versatility: OPENROWSET supports a variety of data sources, including OLE DB data sources, which means it can be used to access data from different types of files and external databases, not just XML files.

When not to use OPENROWSET to load XML data

If you are dealing with large or large volumes of XML files, OPENROWSET is not the most efficient in terms of performance. Tools like BCP or BULK INSERT are optimised for handling large volumes of data more efficiently.

Option 2: BCP

The Bulk Copy Program, or BCP for short, is SQL Server’s command-line tool for bulk loading

files into SQL Server tables. BCP can also be used for importing XML files.

Using BCP you can directly load an XML file to a table with an XML data type.

You can use an XML format file together with BCP to specify the format, data types, and XML document to database column mappings. BCP can auto generate and XML format file or you can also create it manually.

Example of and XML format file for loading XML with BCP

Command using BCP for loading an XML file (GEO.xml) with an XML format file (GEO.fmt)

XML format files also work with the T-SQL BULK INSERT statement, which we cover in one of the next sections.

When should you use BCP for importing XML

  • Large XML Files: If you’re dealing with very large XML files, BCP is often the best choice. It’s specifically optimised for handling large volumes of data efficiently, making it more suitable for bulk data loading where performance is a key concern.
  • Automated and Scheduled Data Loads: BCP, being a command-line tool, is ideal for integrating into automated scripts or scheduled tasks. If your data loading process needs to be automated or regularly scheduled, BCP’s ease of integration with batch files or shell scripts makes it a great option.

When should you not use BCP for importing XML

Transactional Control Needed: If you require transactional control over the data loading process (e.g., rollback capabilities), BULK INSERT or OPENROWSET within a transactional SQL script are more suitable for loading XML.

BCP does not check for the validity of an XML file. You would need to preprocess your XML to check that it is well formed or have failure handling processes in place.

BCP is not good at granular error handling.

Option 3: SQLXML Bulk Load

SQLXML added functionality in support of SQL Server’s original implementation of XML

support circa SQL Server 2000. SQLXML allows developers to perform a variety of tasks, including converting relational data to XML format on the client, querying and updating relational

data using XPath and XML, and bulk loading XML data into relational tables.

Although SQLXML was originally designed to work in unmanaged code through OLEDB (Object Linking and Embedding Database), the .NET Framework provides managed wrapper classes for SQLXML.

Before SQL Server 2008 (10.0.x), SQLXML 4.0 was released with SQL Server and was part of the default installation of all SQL Server versions except for SQL Server Express. Starting with SQL Server 2008 (10.0.x), the latest version of SQLXML (SQLXML 4.0 SP1) is no longer included in SQL Server

A useful feature of SQLXML is the Bulk Load feature. This feature is implemented by a stand-alone COM object, which is implemented separately from the other SQLXML functionality. Accessing this functionality from .NET requires you to add a reference to the Microsoft SQLXML BulkLoad 4.0 Type Library COM object. The SQLXML Bulk Load feature is a handy utility for loading XML data into your relational database fairly quickly. Because it’s COM-based, however, you may find it to be less efficient than a native .NET solution.

Option 4: Bulk Insert

Bulk Insert is a T-SQL command that can be used within SQL Server Management Studio (SSMS) or in SQL scripts. It’s executed as a SQL command and is often integrated into SQL scripts or procedures.

BULK INSERT loads data from an XML document into a table. This functionality is similar to that provided by the in option of the bcp command.

Like BCP, BULK INSERT is widely used for importing large volumes of data quickly into SQL Server databases, but they have distinct features and are suited to different scenarios.

When to use Bulk Insert over BCP

  • Transactional Control: BULK INSERT can be incorporated into transactions in T-SQL. This is useful if you need the ability to roll back the import in case of errors.
  • Error Handling: BULK INSERT allows for more sophisticated error handling within T-SQL. You can capture errors and handle them in your scripts, offering better control over the import process.
  • Simplicity in SQL Scripts: For users who are more comfortable with SQL than command-line tools, BULK INSERT offers a simpler and more straightforward approach within a familiar environment.

Option 5: INSERT INTO statement

For individual XML files and copy and paste

Using the INSERT INTO statement to load XML data into SQL Server is useful in specific scenarios, particularly when you have small amounts of XML data. It is perfect for loading individual XML files for development and testing purposes.

Example

Querying XML data in SQL Server

After loading XML data into a table that contains a column of the XML data type, you can begin querying and analysing this data. SQL Server provides extensions like XQuery, which allow for the on-the-fly transformation of XML data into a relational format, eliminating the need to store the data separately in relational tables. This method is commonly known as ‘schema on read’, where the schema is dynamically defined at the time of reading the XML data.

XQuery

XQuery is a language designed for querying XML. It extends XPath, offering enhanced capabilities for iteration, sorting, and even constructing XML. The XQuery standard is developed by the W3C (World Wide Web Consortium), with contributions from Microsoft and other leading RDBMS vendors.

SQL Server ships with XQuery since version 2005. It as an extension to T-SQL for XML parsing and can be used inside T-SQL.

In T-SQL, XQuery interacts with the XML data type through five methods. XQuery also supports XSD schemas for working with complex documents and employs FLWOR (pronounced “flower”) statements. FLWOR stands for ‘for’, ‘let’, ‘where’, ‘order by’, and ‘return’, which are constructs for iterating through nodes, variable binding, filtering with Boolean expressions, ordering nodes, and specifying return values.

Overview of the XQuery functions in SQL Server:

exist(): Checks if an XML document contains a node with a specified value, returning 1 if it exists and 0 if not.

The SELECT query returns the value 1 (true).

modify(): Allows data modification in an XML document, supporting insert, delete, and replace value of operations.

nodes(): Produces a row set from XML instances, useful for shredding XML into relational formats. We will cover this in the next section on XML shredding.

query(): Retrieves a subset of an XML document in XML format.

The SELECT query returns <Product StockItemName=”Chocolate sharks 250g”> <LineItem Quantity=”192″ UnitPrice=”8.55″ /> </Product>

value(): Using this SQL Server XML extraction technique gets the a single scalar value from an XML document, converting it to an SQL Server data type.

The query returns the value Agrita Abele.

The examples have been taken from the highly recommended book SQL Server Advanced Data Types. JSON, XML and beyond

Using XQuery is the recommended approach for querying XML data stored inside an SQL Server XML data type.

Using XPath queries in SQLXML can be a useful tool for remote querying of relational data, although SQL Server’s xml data type and XQuery functionality provide a much more powerful alternative to this particular SQLXML functionality

XML Index

XQuery can be used to query XML stored in an XML data type. The queries can be rather inefficient, however, unless you create XML indexes.

SQL Server offers support for primary XML indexes and three types of secondary XML indexes: PATH, VALUE, and PROPERTY.

XML indexes will outperform full-text indexes for most queries against XML columns.

 A clustered index must exist on the table before you can create an XML index

A primary XML index is a multicolumn clustered index on an internal system table called the Node table. This table stores a shredded representation on the XML objects within an XML column, along with the clustered index key of the base table.

Secondary XML indexes can be established only on XML columns that already have a primary XML index. Internally, these secondary indexes are essentially nonclustered indexes created on the Node table used by SQL Server to store XML data. They are particularly useful in enhancing the performance of queries that rely on specific XQuery processing techniques.

Three types of secondary XML indexes exist

  • PATH Index: This index is constructed on the Node ID and VALUE columns of the Node table. It is beneficial for queries involving path expressions, like those using the exists() XQuery method. A PATH index speeds up queries that navigate specific paths within the XML structure.
  • VALUE Index: This index is structured in the opposite manner, focusing on the VALUE and Node ID columns. It’s advantageous for queries that search for specific values without specifying the XML element or attribute names. This index type is particularly useful for queries that need to find a value regardless of where it appears in the XML document.
  • PROPERTY Index: Built on the clustered index key of the base table, the Node ID, and the VALUE columns, a PROPERTY index is effective when queries require retrieval of nodes across multiple tuples of the column. It’s optimal for queries that need to access various parts of the XML data in relation to the overall database structure.

XML Shredding on SQL Server

XML shredding shreds XML documents into components that can be stored in a relational format on SQL Server. Unlike XML querying, XML shredding uses a schema on write approach. The schema on write paradigm requires a defined schema for the stored data, thereby ensuring consistency, efficient querying, and data integrity.

SQL Server supports XML shredding using OPENXML and XQuery. Let’s look at the advantages and disadvantages of each.

Option 1: OPENXML

The OPENXML function in SQL Server, a tool from SQL Server 2000, allows querying XML documents in a table-like manner and is used for shredding XML data. However, it’s more complex and less efficient compared to the XQuery nodes() method.

OPENXML is based on the COM model and uses the Microsoft XML Core Services library. It allocates a significant portion of SQL Server’s memory (1/8th) for XML processing, regardless of the XML document’s size. In contrast, the XQuery nodes() method, not based on COM, is more memory-efficient.

Usage

One of the limitations with the OPENXML() function is that it does not directly parse XML documents. Rather, the XML document must be pre-parsed prior to invoking this function. This preprocessing step can be accomplished by using the sp_xml_preparedocument system stored procedure. This procedure uses the MSXML parser to process the XML document, resulting in a handle that points to the parsed version of the document, which is then ready for use.

Once the parsing is done and a handle to the parsed XML is obtained, you can then proceed to use the OPENXML() function for further operations.

Example code of using OPENXML

Verdict

While understanding OPENXML is beneficial for dealing with legacy systems or codes, it’s advisable to use the XQuery nodes() method for new developments due to its advantages in simplicity and memory management. The XML features introduced in SQL Server 2005, have largely replaced OPENXML’s functionality.

Option 2: XQuery

To shred data directly from a column, the XQUERY nodes() function is particularly useful for this purpose. It helps in pinpointing the specific nodes in the XML that map to relational columns. You can pair this with the value() method, which is effective in extracting the actual data from these identified nodes. This combination allows for an efficient transformation of XML data into a relational format.

The primary advantage of using the nodes() method compared to OPENXML() lies in its simplicity when applied to a table. You can use the SQL CROSS APPLY operator to efficiently implement the nodes() method across multiple rows within a table. This makes it easy to flatten and denormalise a branch inside an XML.

An example

Create table in SQL Server with XML data type

A multi branched XML is loaded to the table using INSERT INTO…

Validate that the XML was inserted

We flatten the Department branch : The below query and result set shows the data flattened out by department and project

The below query and result set flattens ResearchGroup and Project

The below query and result set flattens out AdministrativeUnit and Service

A note of caution on flattening and denormalising XML data into a relational format.

In relational databases we aim to normalise data. Normalisation is a systematic approach of organising data to reduce redundancy and improve data integrity. The primary goal of normalisation is to separate data into different tables in a way that each table represents one entity type, and then define relationships between these tables.

Verdict

The nodes() method in XQuery is the preferred technique for shredding XML data. However, it’s important to exercise caution when fully flattening XML structures, as this can lead to redundancies and potential errors in your data. Additionally, be mindful when dealing with XML documents that have multiple branches. Fully flattening such documents may not be feasible, due to the complexity of their hierarchical structure. Our enterprise XML converter Flexter ships with optimisation algorithms to find the perfect balance between a normalised and denormalised target model. Try Flexter to convert XML to a relational format

For an extensive discussion of XQuery refer to the excellent book Pro SQL Server 2008 XML

Automating XML parsing on SQL Server

The life cycle for converting XML to a relational database such as SQL Server covers a well defined set of steps.

Once the functional requirements have been defined and specified the following sequence of steps happens:

Analysis to understand the structure of XML and XSD: A data analyst goes through the structure of the different types of XML documents. For more complex projects an XSD is often available. If the XMLs / XSDs are based on an industry data standard the standard will also include documentation. This documentation can often be several hundred pages.

Data modelling: A relational target model is created as DDL. Often data modelling tools such as SQLDBM can be used.

Mapping from XML to relational data model: The XML elements need to be mapped to the relational schema in the SQL database.

XML conversion: In this step you use the tools and approaches such as BCP and XQuery outlined in this blog post to convert the data to the target schema on SQL Server

Error handling and logging: You need to create code to check for invalid XML documents and implement alerting

Documentation: You need to document the code, mappings, target data model and more.

Let’s have a look at how these steps compare between a manual XML conversion approach using coding and an automated approach using an XML conversion tool. For this comparison we use Flexter to illustrate the hands on steps.

You also have the option to download a PDF that provides this side-by-side comparison of manual versus automated XML conversion.

The decision on manual versus automated conversion is a classic decision of buy versus build. Here is a summary of the benefits of an automated approach:

Complexity: An automated approach can handle any XML and XSD complexity quickly and instantly.

Faster go-live: Automated XML parsing automates all of the steps in the XML conversion life cycle, e.g. analysis, schema generation, mappings. Data ends up quickly in the hands of decision makers

Reduces risk: Automation of XML parsing reduces the risk of project failure or budget overruns in complex conversion projects based on industry data standards.

Consistency and accuracy: Automation ensures that XML parsing processes are carried out in a consistent and accurate manner, significantly reducing the risk of human error

Performance: An XML automation tool can scale up and out to handle any SLA and XML data volume.

Ease of Use: User-friendly interfaces in automated tools can make the process more accessible. This reduces dependency on niche XML skills such as XPath, XQuery, XSLT which can be hard to find and can be a point of vulnerability for a project.

The advantages of using an automated conversion process have to be weighed against the additional licence costs for a tool such as Flexter.

For simple requirements or ad-hoc queries an XML automation tool is not required. XML conversion software is a good fit if one or more of the following checks applies to you:

  • Is your XML very complex? Does it use an XSD? Is it based on an industry data standard such as HL7 or FpML?
  • Do you have conversion requirements for many different types of XML?
  • Are your XML files huge?
  • Are you parsing vast volumes of XML data with tight SLAs?
  • Does your team lack in-house expertise in XML technologies, including XQuery, XSLT, and XPath?
  • Do you have tight deadlines for your XML conversion project?

If you answered ‘Yes’ to one or more of these questions, why not give the free online version of our XML converter Flexter a try. See for yourself what XML conversion software can do for you. Alternatively, talk directly to one of our XML conversion experts to discuss your use case in detail.

Further reading

XML conversion

Deep dive on XML converters, XML conversion tools and XML conversion projects

Converting XML and JSON to a Data Lake

How to Insert XML Data into SQL Table?

The Ultimate Guide to XML Mapping in 2024

Optimisation algorithms for converting XML and JSON to a relational format

Re-use algorithms for converting XML and JSON to a relational format

Converting Financial Information eXchange XML (FIXML) to MS SQL Server

Should you Use Outsourced XML Conversion Services?

Flexter

Flexter data sheet

Free online version of Flexter

SQL Server and XML parsing

Examples of bulk import and export of XML documents

Examples of Using XPath Queries

Examples of OPENROWSET

Examples of OPENXML

SQL Server Advanced Data Types. JSON, XML and beyond

Pro SQL Server 2008 XML