Converting Duck Creek XML to Snowflake
Overview
In this blog post we will automate the conversion process of Duck Creek XML to a relational database format with Flexter . In this example we convert to Snowflake but Flexter supports any relational database or data warehouse platform.
Table of Contents
What is Flexter?
Flexter converts any XML/JSON to a readable format in seconds. Any type. Any size. Any volume. Any target
Projects that can take weeks or months, or never get finished, can be completed in a day or two with Flexter. Flexter requires no coding skills and is a totally automated way to un-silo industry-standard XML data and convert it into a readable database.
What is Duck Creek?
Duck Creek is a P&C insurance platform that provides solutions to core day-to-day operations, like billing, claims, policy, insights and rating.
Converting Duck Creek data with Flexter
Converting Duck Creek XML data can be performed in a couple of simple steps
Step 1 – Define Source data, collect Statistics (information such as data types, constraints, and relationships) and create Data Flow (Mapping data points in the XML source elements to the data points in the relational target schema)
Step 2 – Convert the XML documents
Note: In this particular instance we did not use a schema (XSD) to convert the data. Flexter can work with an XSD, a sample of XML files or both (best of the two worlds).
Step 1 – Define Source, collect statistics and create data flow
In this step we will read XML data, collect metadata, and create a data flow (a logical target schema and the mappings from source to target). Duck Creek XML data can be found here.
1 |
xml2er -g1 /DuckCreek.zip |
Example of output
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
# schema origin: 8748 logical: 3254 job: 53642 # statistics startup: 1895 ms load: 160 ms xpath stats: 9474 ms doc stats: 1609 ms parse: 123 ms write: 9232 ms xpaths: 35 | map:0%/0 new:100%:35 documents: 2 | suc:100%/2 part:0%/0 fail:0%/0 size:4.2KB |
Step 2 – Convert data
In the final step we convert the data to the output folder which we created. We also define the output format, e.g. TSV, ORC, Parquet, Avro files, or a relational database such as Snowflake.
1 |
xml2er -l3254 /DuckCreek.zip -S o “snowflake://https://eusyda1312.eu-central-1.snow flakecomputing.com/?warehouse=ware&db=db&schema=DuckCreek” -U duck -P creek |
Example of output:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
18:21:13.316 INFO Registering success of job 62 18:21:13.331 INFO Finished successfully in 6894 milliseconds # schema origin: 8749 logical:3255 job: 53643 # statistics startup: 458 ms load: 154 ms xpath stats: 3489 ms doc stats: 2489 ms parse: 1897 ms write: 284 ms xpaths: 35 | map:100%/35 new:0%/0 documents: 2 | suc:100%/2 part:0%/0 fail:0%/0 size:4.2KB |
ER Diagram
Let’s zoom into the ER diagram
Er Diagram can be found here.
Conclusion
We have masked XML data and processed it with ease. We did in a matter of minutes what would normally take a few days.
Ralph Kimball the father of dimensional modelling and data warehousing already knew:
“Because of such inherent complexity, never plan on writing your own XML processing interface to parse XML documents.
The structure of an XML document is quite involved, and the construction of an XML parser is a project in itself—not to be attempted by the data warehouse team.”
Our enterprise edition can be installed on a single node or for very large volumes of XML on a cluster of servers.
You can also request a demo of Flexter or reach out to us directly.
Enjoyed this post? Have a look at the other posts on our blog.
Contact us for Snowflake professional services.