Converting HL7 CDA XML to Snowflake

October 18, 2022

Overview

In this blog post we will automate the conversion of CDA XML to a relational database format with Flexter.

What is Flexter?

Flexter converts any XML/JSON to a readable format in seconds. Any type. Any size. Any volume. Any target.

Projects that can take weeks or months, or never get finished, can be completed in a day or two with Flexter. Flexter requires no coding skills and is a totally automated way to un-silo industry-standard XML data and convert it into a readable database.

What is CDA?

The CDA standard was created to unlock large amounts of data stored in free-text clinical notes, and to enable a comparison of clinical notes generated on different information systems.

There are several guiding principles driving the development of CDA design:

  • The CDA will define documents produced by providers seeing patients and will not define views or downstream uses of those documents.
  • The CDA will facilitate the standardisation of thousands of non-standardised clinical documents by allowing cost-effective implementation across as wide a spectrum of systems as possible.
  • The CDA aims to create a standard that is application- and platform-independent, and that can be viewed and edited by a number of tools.
  • The ability to exchange or store CDA documents will be application- and platform-independent.
  • The CDA will define a mechanism that allows local implementations to represent information that is not formally represented in the standard.

The CDA is a document markup standard that specifies the structure and meaning of “clinical documents”. A clinical document is a documentation of observations and services provided.

Converting CDA data with Flexter

Converting CDA XML data can be performed in three simple steps:

Step 1 – Define the source data, collect statistics (information such as data types, constraints, and relationships), and create a data flow (a mapping from the data points in the XML source elements to the data points in the relational target schema)

Step 2 – Define Source Schema (XSD)

Step 3 – Convert the XML documents

Step 1 – Define Source, collect statistics and create data flow

In this step we will read XML data, collect metadata, and create a data flow (a logical target schema and the mappings from source to target). CDA XML data can be found here.

Example of output
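To make the idea of "collecting statistics" more concrete, here is a minimal Python sketch. It is not how Flexter works internally; it only illustrates the kind of metadata that can be gathered from an XML sample: how often each element path occurs and which elements repeat under the same parent (a hint that they belong in a child table). The sample document, the function names `local_name` and `collect_stats`, and the output format are all assumptions made for illustration. The fragment is hand-written and far smaller than a real CDA document.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written CDA-like fragment (illustrative only, not a valid CDA document).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <id root="2.16.840.1.113883.19.5" extension="c266"/>
  <title>Consultation note</title>
  <recordTarget>
    <patientRole>
      <id extension="12345"/>
      <id extension="67890"/>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def collect_stats(root):
    """Count occurrences of each element path and note repeating children."""
    path_counts = Counter()
    repeating = set()

    def walk(elem, path):
        path_counts[path] += 1
        # How many children of each name does this element have?
        siblings = Counter(local_name(child.tag) for child in elem)
        for child in elem:
            name = local_name(child.tag)
            child_path = f"{path}/{name}"
            if siblings[name] > 1:
                repeating.add(child_path)
            walk(child, child_path)

    walk(root, "/" + local_name(root.tag))
    return path_counts, repeating


root = ET.fromstring(SAMPLE)
path_counts, repeating = collect_stats(root)
for path, count in sorted(path_counts.items()):
    marker = " (repeats: candidate child table)" if path in repeating else ""
    print(f"{path}: {count}{marker}")
```

In this sample only the `id` element under `patientRole` repeats, so a relational target would typically hold it in its own table linked back to the parent row.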

Step 2 – Define Source Schema (XSD)

In the next step we define the source schema. The CDA 2.0 XSD can be found here.

Example of output
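To show why a schema is useful in addition to sample data, the following Python sketch reads element declarations from an XSD and lists their types and occurrence constraints. This is an illustration under stated assumptions, not Flexter's implementation: the miniature schema `SAMPLE_XSD` is hypothetical and much simpler than the real CDA XSD, and `describe_elements` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Namespace of the XML Schema language itself.
XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical miniature schema, far simpler than the real CDA XSD.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="patientRole">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="birthTime" type="xs:date" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def describe_elements(xsd_text):
    """Return (name, type, minOccurs, maxOccurs) for every xs:element."""
    schema = ET.fromstring(xsd_text)
    rows = []
    for elem in schema.iter(f"{XS}element"):
        rows.append((
            elem.get("name"),
            elem.get("type", "(anonymous complex type)"),
            elem.get("minOccurs", "1"),  # XSD default for local elements is 1
            elem.get("maxOccurs", "1"),  # XSD default for local elements is 1
        ))
    return rows


elements = describe_elements(SAMPLE_XSD)
for name, xsd_type, min_occurs, max_occurs in elements:
    print(f"{name}: type={xsd_type}, minOccurs={min_occurs}, maxOccurs={max_occurs}")
```

A schema states facts that a sample may not reveal: here `maxOccurs="unbounded"` declares a one-to-many relationship and `minOccurs="0"` marks an optional (nullable) value, even if no sample document happens to exercise either case.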

Step 3 – Convert data

In the final step we convert the data to the output folder which we created. We also define the output format, e.g. TSV, ORC, Parquet, Avro files, or a relational database such as Snowflake.

Example of output
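For readers who want a feel for what converting XML into relational output involves, here is a small hand-rolled Python sketch that flattens a CDA-like fragment into two TSV tables linked by a generated key. It is only an illustration and not Flexter's method. The sample data, the table and column names (`pk_patient_role`, `fk_patient_role`), and the helper `to_tsv` are assumptions chosen for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # namespace used by CDA documents

# Hand-written CDA-like fragment (illustrative only, not a valid CDA document).
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget>
    <patientRole>
      <id extension="12345"/>
      <id extension="67890"/>
    </patientRole>
  </recordTarget>
  <recordTarget>
    <patientRole>
      <id extension="24680"/>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""


def to_tsv(header, rows):
    """Serialise a header and rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


root = ET.fromstring(SAMPLE)
parent_rows, child_rows = [], []
roles = root.findall("hl7:recordTarget/hl7:patientRole", NS)
for pk, role in enumerate(roles, start=1):
    parent_rows.append((pk,))  # one row per patientRole
    for id_elem in role.findall("hl7:id", NS):
        # Repeating <id> elements go to a child table with a foreign key.
        child_rows.append((pk, id_elem.get("extension")))

patient_role_tsv = to_tsv(["pk_patient_role"], parent_rows)
patient_role_id_tsv = to_tsv(["fk_patient_role", "extension"], child_rows)
print(patient_role_id_tsv)
```

Even this toy case needs generated keys, namespace handling, and a decision about which elements become separate tables. A real CDA document has many more levels of nesting, which is why hand-written converters tend to grow quickly.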

ER Diagram

The ER diagram can be found here.

Conclusion

We have converted CDA XML data to a relational format with ease. We did in a matter of minutes what would normally take a few days.

Ralph Kimball, the father of dimensional modelling and data warehousing, already knew:

“Because of such inherent complexity, never plan on writing your own XML processing interface to parse XML documents.

The structure of an XML document is quite involved, and the construction of an XML parser is a project in itself—not to be attempted by the data warehouse team.”

Our enterprise edition can be installed on a single node or, for very large volumes of XML, on a cluster of servers.

You can also request a demo of Flexter or reach out to us directly.
