Overview
In this blog post we automate the conversion of CDA XML to a relational database format with Flexter.
What is Flexter?
Flexter converts any XML/JSON to a readable format in seconds. Any type. Any size. Any volume. Any target.
Projects that can take weeks or months, or never get finished, can be completed in a day or two with Flexter. Flexter requires no coding skills and is a totally automated way to un-silo industry-standard XML data and convert it into a readable database.
What is CDA?
The CDA standard was created to unlock large amounts of data stored in free-text clinical notes, and to enable a comparison of clinical notes generated on different information systems.
Several guiding principles drive the design of CDA:
- The CDA will define documents produced by providers seeing patients and will not define views or downstream uses of those documents.
- The CDA will facilitate the standardisation of thousands of non-standardised clinical documents by allowing cost-effective implementation across as wide a spectrum of systems as possible.
- The CDA aims to create a standard that is application- and platform-independent, and that can be viewed and edited by a number of tools.
- The ability to exchange or store CDA documents will be application- and platform-independent.
- The CDA will define a mechanism that allows local implementations to represent information that is not formally represented in the standard.
The CDA is a document markup standard that specifies the structure and meaning of "clinical documents". A clinical document is a record of observations and services provided.
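To make the idea of a marked-up clinical document more concrete, here is a minimal sketch that reads a heavily trimmed CDA-style document with Python's standard library. CDA documents use a ClinicalDocument root element in the HL7 v3 namespace (urn:hl7-org:v3); the sample content and the summarize helper are made up for illustration and are not part of Flexter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CDA-style document (illustration only).
SAMPLE_CDA = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <title>Discharge Summary</title>
  <recordTarget>
    <patientRole>
      <patient>
        <name><given>Jane</given><family>Doe</family></name>
      </patient>
    </patientRole>
  </recordTarget>
</ClinicalDocument>"""

# CDA elements live in the HL7 v3 namespace.
NS = {"hl7": "urn:hl7-org:v3"}


def summarize(xml_text):
    """Return the document title and patient name from a CDA-style document."""
    root = ET.fromstring(xml_text)
    title = root.findtext("hl7:title", namespaces=NS)
    given = root.findtext(".//hl7:patient/hl7:name/hl7:given", namespaces=NS)
    family = root.findtext(".//hl7:patient/hl7:name/hl7:family", namespaces=NS)
    return {"title": title, "patient": f"{given} {family}"}


summary = summarize(SAMPLE_CDA)
print(summary)
```

Even in this tiny example the patient name sits four levels below the root, which hints at why hand-written parsing of real CDA documents gets tedious quickly.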
Converting CDA data with Flexter
Converting CDA XML data takes three simple steps:
Step 1 – Define the source data, collect statistics (information such as data types, constraints, and relationships), and create a data flow (the mapping of data points in the XML source elements to data points in the relational target schema)
Step 2 – Define Source Schema (XSD)
Step 3 – Convert the XML documents
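As a rough picture of what mapping XML source elements to a relational target schema means, the sketch below flattens a small nested document into a parent table and a child table linked by a key. This is only an illustration under simplifying assumptions: the element names (section, entry, drug, dose) and the flatten helper are hypothetical, and this is not how Flexter derives its mappings.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified input; real CDA sections are far more deeply nested.
SAMPLE = """<section>
  <title>Medications</title>
  <entry><drug>Aspirin</drug><dose>100 mg</dose></entry>
  <entry><drug>Ibuprofen</drug><dose>200 mg</dose></entry>
</section>"""


def flatten(xml_text):
    """Split one section into a parent row and child rows linked by section_id."""
    root = ET.fromstring(xml_text)
    section_row = {"section_id": 1, "title": root.findtext("title")}
    entry_rows = []
    for entry_id, entry in enumerate(root.findall("entry"), start=1):
        entry_rows.append({
            "entry_id": entry_id,
            "section_id": section_row["section_id"],  # foreign key to the parent row
            "drug": entry.findtext("drug"),
            "dose": entry.findtext("dose"),
        })
    return [section_row], entry_rows


sections, entries = flatten(SAMPLE)
for row in sections + entries:
    print(row)
```

The repeating entry element becomes its own table, and the section_id column preserves the parent-child relationship that the XML expressed through nesting.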
Step 1 – Define Source, collect statistics and create data flow
In this step we will read XML data, collect metadata, and create a data flow (a logical target schema and the mappings from source to target). CDA XML data can be found here.
xml2er -g1 /CDA.zip
Example of output:
# schema
  origin: 5964
  logical: 2405
  job: 22128

# statistics
  startup: 1895 ms
  load: 160 ms
  xpath stats: 9474 ms
  doc stats: 1609 ms
  parse: 123 ms
  write: 9232 ms
  xpaths: 35 | map:0%/0 new:100%:35
  documents: 2 | suc:100%/2 part:0%/0 fail:0%/0 size:4.2KB
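The xpaths figure in the output presumably refers to the distinct element and attribute paths found in the source documents. As a rough illustration of that idea, and not Flexter's actual implementation, the sketch below walks a small document and counts how often each path occurs. The sample document and the path_counts helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample document (illustration only).
SAMPLE = '<doc><section><entry code="a"/><entry code="b"/></section><section/></doc>'


def path_counts(xml_text):
    """Count occurrences of every element path and attribute path in a document."""
    counts = Counter()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        counts[path] += 1
        for attr in elem.attrib:
            counts[f"{path}/@{attr}"] += 1
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


counts = path_counts(SAMPLE)
for path, n in sorted(counts.items()):
    print(path, n)
```

A path that occurs more than once under the same parent, such as the entry element here, is a candidate for a separate child table in a relational target.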
Step 2 – Define Source Schema (XSD)
In the next step we define the source schema. The CDA 2.0 XSD can be found here.
xsd2er -a5964 -g1 /CDA_schema.zip
Example of output:
# schema
  origin: 5965
  logical: 2406
  job: 22129

# statistics
  load: 1608 ms
  stats: 40 ms
  parse: 368 ms
  build: 121 ms
  write: 47 ms
  map: 128 ms
  xpaths: 16
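Each step feeds an ID into the next one: the origin reported by Step 1 (5964) is passed to xsd2er as -a5964, and the logical ID reported by Step 2 (2406) is later passed to xml2er as -l2406. If you script the steps, a small helper can pull these IDs out of the captured output. The sketch below assumes the output looks like the examples in this post; the schema_ids helper is hypothetical and not part of Flexter.

```python
import re

# Output excerpts copied from the examples in this post.
STEP1_OUTPUT = "# schema origin: 5964 logical: 2405 job: 22128 # statistics startup: 1895 ms"
STEP2_OUTPUT = "# schema origin: 5965 logical: 2406 job: 22129 # statistics load: 1608 ms"


def schema_ids(output):
    """Extract the origin, logical and job IDs from captured command output."""
    ids = {}
    for key in ("origin", "logical", "job"):
        match = re.search(rf"\b{key}:\s*(\d+)", output)
        if match:
            ids[key] = int(match.group(1))
    return ids


step1_ids = schema_ids(STEP1_OUTPUT)
step2_ids = schema_ids(STEP2_OUTPUT)

# Build the flags used by the following steps in this post.
origin_flag = f"-a{step1_ids['origin']}"    # passed to xsd2er in Step 2
logical_flag = f"-l{step2_ids['logical']}"  # passed to xml2er in Step 3
print(origin_flag, logical_flag)
```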
Step 3 – Convert data
In the final step we convert the data and write it to the chosen target. The target can be files (e.g. TSV, ORC, Parquet, or Avro) or a relational database such as Snowflake, which we use here.
xml2er -l2406 /CDA.zip -S o "snowflake://https://eusyda1312.eu-central-1.snowflakecomputing.com/?warehouse=ware&db=db&schema=CDA" -U cda -P cda
Example of output:
18:21:13.316 INFO Registering success of job 62
18:21:13.331 INFO Finished successfully in 6894 milliseconds

# schema
  origin: 5966
  logical: 2407
  job: 22130

# statistics
  startup: 458 ms
  load: 154 ms
  xpath stats: 3489 ms
  doc stats: 2489 ms
  parse: 1897 ms
  write: 284 ms
  xpaths: 35 | map:100%/35 new:0%/0
  documents: 2 | suc:100%/2 part:0%/0 fail:0%/0 size:4.2KB
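When the conversion runs unattended, it is worth checking the documents line of the output before trusting the load. The sketch below parses that line, assuming that suc, part and fail stand for successful, partially processed and failed documents and that the format matches the example in this post. The document_summary helper is hypothetical and not part of Flexter.

```python
import re

# The documents line copied from the example output in this post.
SUMMARY_LINE = "documents: 2 | suc:100%/2 part:0%/0 fail:0%/0 size:4.2KB"

# Assumed format: "<label>:<percent>%/<count>" for suc, part and fail.
PATTERN = re.compile(
    r"documents:\s*(?P<total>\d+)\s*\|"
    r"\s*suc:\d+%/(?P<suc>\d+)"
    r"\s+part:\d+%/(?P<part>\d+)"
    r"\s+fail:\d+%/(?P<fail>\d+)"
)


def document_summary(output):
    """Return document counts from the summary line, or None if it is missing."""
    match = PATTERN.search(output)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


result = document_summary(SUMMARY_LINE)
all_converted = result is not None and result["suc"] == result["total"]
print(result, all_converted)
```

A script could stop the pipeline when all_converted is false instead of silently loading a partial result.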
ER Diagram
The ER diagram can be found here.
Conclusion
We have converted CDA XML data to a relational format with ease. What would normally take a few days took us a matter of minutes.
Ralph Kimball, the father of dimensional modelling and data warehousing, already knew:
“Because of such inherent complexity, never plan on writing your own XML processing interface to parse XML documents.
The structure of an XML document is quite involved, and the construction of an XML parser is a project in itself—not to be attempted by the data warehouse team.”
You can try our Flexter online.
Our enterprise edition can be installed on a single node or, for very large volumes of XML, on a cluster of servers.
You can also request a demo of Flexter or reach out to us directly.