Converting CDISC XML data to a database (Snowflake)

October 24, 2019

In this post we guide you through the challenging process of masking CDISC XML data and converting it to a relational format in Snowflake. We use two Sonra tools: Paranoid for masking, and Flexter for processing and parsing the XML.

CDISC

The Clinical Data Interchange Standards Consortium (CDISC) is a standards developing organization (SDO) dealing with medical research data linked with healthcare, to “enable information system interoperability to improve medical research and related areas of healthcare”.
CDISC standards are harmonized through a model that is also an HL7 standard and is in the process of becoming an ISO/CEN standard.

Masking CDISC XML

With the tools introduced, we can start masking our XML data.
As a first step we mask our CDISC XML files with Paranoid. You can find out how to install Paranoid in our Masking Sabre XML post (don’t worry, it only takes a couple of steps).


Running Paranoid on the file masks all of the values in the XML document. Optionally, Paranoid can mask only individual elements inside an XML document.
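To make the idea of value masking concrete, here is a minimal Python sketch. It is not how Paranoid works internally and it does not use Paranoid at all; it only illustrates the general technique of replacing every text and attribute value while leaving the element and attribute names intact. The function names, the hashing approach and the sample XML are our own assumptions for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def mask_value(value: str) -> str:
    """Replace a value with hex characters of the same length (illustrative only)."""
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
    # Repeat the digest so values longer than 64 characters are still covered.
    repeated = digest * (len(value) // len(digest) + 1)
    return repeated[: len(value)]


def mask_xml(xml_text: str, only_tags=None) -> str:
    """Mask text and attribute values; element and attribute names stay intact.

    only_tags: optional set of element names. When given, only those elements
    are masked. Namespaced documents use names of the form '{namespace}Tag'.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if only_tags is not None and elem.tag not in only_tags:
            continue
        if elem.text and elem.text.strip():
            elem.text = mask_value(elem.text)
        # Copy the items first so we never mutate the dict while iterating it.
        for name, value in list(elem.attrib.items()):
            elem.set(name, mask_value(value))
    return ET.tostring(root, encoding="unicode")


# Hypothetical, heavily simplified sample (not a valid CDISC document).
sample = '<ItemData ItemOID="IT.AGE" Value="42"><Note>patient note</Note></ItemData>'
masked_all = mask_xml(sample)                       # every value masked
masked_notes = mask_xml(sample, only_tags={"Note"})  # only <Note> elements masked
```

Passing `only_tags` mirrors the optional mode described above, where only selected elements are masked and everything else is left untouched.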
Let’s have a look at our file after masking

Now we can start going through a few more steps and convert CDISC XML data to a relational format in Snowflake.

Snowflake

Snowflake is an analytic data warehouse provided as SaaS. It runs entirely on public cloud infrastructure. The Snowflake data warehouse combines a SQL database engine with an architecture designed specifically for the cloud.
Because storage and compute are separated, Snowflake lets you scale up or down with ease, and it handles even heavy workloads quickly. Some of the strong points of Snowflake are:

  • Uncompromising Simplicity
  • Unlimited Concurrency
  • Breathtaking Performance

Processing masked XML with Flexter

Flexter exposes its functionality through a RESTful API. Converting XML/JSON to Snowflake can be done in a few simple steps.
Step 1 – Authenticate
Step 2 – Define Source Connection (Upload or S3) for Source Data (JSON/XML)
Step 3 – Optionally define Source Connection (Upload or S3) for Source Schema (XSD)
Step 4 – Define your Target Connection, e.g. Snowflake, Redshift, SQL Server, Oracle etc.
Step 5 – Convert your XML/JSON from Source to Target Connection

Step 1 – Authenticate

To get an access_token you need to make a call to /oauth/token with an Authorization header and three form parameters:

  • username=YOUR_EMAIL
  • password=YOUR_PASSWORD
  • grant_type=password

You will get your username and password from Sonra when you sign up for the service.
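The call described above can be sketched in Python with the standard library. This is a minimal sketch under assumptions: the base URL `https://flexter.example.com` is a placeholder, the value of the Authorization header is passed in as-is because this post does not spell out what it contains, and the helper names are ours. Only the `/oauth/token` path, the three form parameters and the `access_token` field come from the description above.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, authorization: str,
                        email: str, password: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for /oauth/token."""
    form = urllib.parse.urlencode({
        "username": email,
        "password": password,
        "grant_type": "password",
    }).encode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/oauth/token",
        data=form,
        headers={
            "Authorization": authorization,  # value supplied by the caller
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def extract_access_token(response_body: str) -> str:
    """Pull the access_token out of a JSON response body."""
    return json.loads(response_body)["access_token"]


# Placeholder values; replace them with the details you receive from Sonra.
request = build_token_request(
    "https://flexter.example.com",  # hypothetical base URL
    "Basic PLACEHOLDER",            # hypothetical header value
    "YOUR_EMAIL",
    "YOUR_PASSWORD",
)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     token = extract_access_token(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, headers and form body before any credentials leave your machine.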

Example of output

Step 2 – Define Source Connection (Upload) for Source Data (CDISC XML)

In this step we upload our CDISC XML Source data.

Example of output

Step 3 – Define Target Connection (Snowflake)

Since we don’t have a Source Schema (XSD), we skip the optional step of defining a Source Schema.
Next we define our Target Connection. We give the Target Connection a name and supply the connection parameters for the Snowflake database.
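As a rough guide to what those connection parameters usually look like, the sketch below groups typical Snowflake settings and derives a JDBC URL from them. The class and field names are our own and purely illustrative; the names that the Flexter API expects are not shown in this post. The URL follows Snowflake's standard JDBC connection string format, and the account identifier, database, schema and warehouse are made-up values.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class SnowflakeTarget:
    """Hypothetical container for a named Snowflake target connection."""
    name: str
    account: str      # Snowflake account identifier, e.g. 'xy12345.eu-west-1'
    user: str
    password: str
    database: str
    schema: str
    warehouse: str

    def jdbc_url(self) -> str:
        """Return a JDBC URL in Snowflake's standard format (credentials excluded)."""
        query = urlencode({
            "db": self.database,
            "schema": self.schema,
            "warehouse": self.warehouse,
        })
        return f"jdbc:snowflake://{self.account}.snowflakecomputing.com/?{query}"


# Made-up example values.
target = SnowflakeTarget(
    name="cdisc_target",
    account="xy12345.eu-west-1",
    user="FLEXTER_USER",
    password="YOUR_PASSWORD",
    database="CDISC",
    schema="PUBLIC",
    warehouse="COMPUTE_WH",
)
url = target.jdbc_url()
```

Keeping the user and password out of the URL means the connection string can be logged or shared without exposing credentials.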

Example of output

Step 4 – Convert XML data from Source Connection (Upload) to Target Connection (Snowflake)

In the last step we convert the XML data. The data is written directly to the Snowflake Target Connection.

Example of output

Example of ER Diagram

We can create and download an ER Diagram of the model that Flexter generated by making a GET call.

Example of output


You can download the ER Diagram of our CDISC XML file here.
Next we run a SQL query that selects subject-level information with the most frequently recorded type of item group (ITEMGROUPOID) for the first, the third, or the fifth measurement (ITEMGROUPREPEATKEY).
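The shape of such a query can be sketched as follows. This is one possible reading of the description, not the query from the original post: the table name `ITEMGROUPDATA`, the `SUBJECTKEY` column and the demo rows are assumptions, and the real table and column names depend on the model Flexter generates for your data. To keep the example self-contained it runs against an in-memory SQLite database from Python rather than against Snowflake.

```python
import sqlite3

# Hypothetical table and column names; adjust to the generated model.
QUERY = """
WITH filtered AS (
    SELECT SUBJECTKEY, ITEMGROUPOID, ITEMGROUPREPEATKEY
    FROM ITEMGROUPDATA
    WHERE ITEMGROUPREPEATKEY IN (1, 3, 5)      -- first, third or fifth measurement
),
top_group AS (
    SELECT ITEMGROUPOID
    FROM filtered
    GROUP BY ITEMGROUPOID
    ORDER BY COUNT(*) DESC, ITEMGROUPOID       -- most frequent item group first
    LIMIT 1
)
SELECT f.SUBJECTKEY, f.ITEMGROUPOID, f.ITEMGROUPREPEATKEY
FROM filtered AS f
JOIN top_group AS t ON f.ITEMGROUPOID = t.ITEMGROUPOID
ORDER BY f.SUBJECTKEY, f.ITEMGROUPREPEATKEY
"""

DEMO_ROWS = [
    ("S1", "IG.VITALS", 1),
    ("S1", "IG.VITALS", 2),
    ("S1", "IG.VITALS", 3),
    ("S2", "IG.VITALS", 1),
    ("S2", "IG.LABS", 1),
    ("S2", "IG.LABS", 4),
    ("S3", "IG.LABS", 5),
]


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding the made-up demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ITEMGROUPDATA ("
        "SUBJECTKEY TEXT, ITEMGROUPOID TEXT, ITEMGROUPREPEATKEY INTEGER)"
    )
    conn.executemany("INSERT INTO ITEMGROUPDATA VALUES (?, ?, ?)", DEMO_ROWS)
    return conn


conn = build_demo_db()
rows = conn.execute(QUERY).fetchall()
# rows == [("S1", "IG.VITALS", 1), ("S1", "IG.VITALS", 3), ("S2", "IG.VITALS", 1)]
```

In the demo data, IG.VITALS appears three times among the first, third and fifth measurements and IG.LABS only twice, so the query returns the IG.VITALS rows per subject.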

Conclusion

And we are finished with this “long and hard process” :-). We have managed to complete a couple of tasks in a few minutes that normally take hours or days.
Our enterprise edition can be installed on a single node or, for very large volumes of XML, on a cluster of servers.
If you have any questions, please refer to the Flexter FAQ section. You can also request a demo of Flexter or reach out to us directly.
Enjoyed this post? Have a look at the other posts on our blog.
Contact us for Snowflake professional services.
We created the content in partnership with Snowflake.