Overview

In this post we will show you how you can easily convert Workday XML or any other XML to a Snowflake database. We will use Sonra’s Paranoid to mask the XML data and Flexter to convert Workday XML automatically without any manual coding.

What is Flexter?

Flexter converts any XML/JSON to a readable format in seconds. Any type. Any size. Any volume. Any target

Projects that can take weeks or months, or never get finished, can be completed in a day or two with Flexter. Flexter requires no coding skills and is a totally automated way to un-silo industry-standard XML data and convert it into a readable database.

Flexter is unique in data management and data warehousing software in that it completely automates the process of data conversion.

Specialist skills are often required and need to be hired at costly rates.

That slows down the flow of information that data analysts need and, higher up the chain, it impacts important decision-making.

Flexter changes all that, freeing your time and budget up to focus on providing real value to the business and getting your data rapidly into a readable and analysable database format.

Without the need to write a single line of code.

What is Paranoid?

Paranoid is a powerful open source command line tool that masks and obfuscates XML and JSON files. It can target a specific xpath or mask entire files without any coding. It is best used in combination with Flexter.

What is Workday?

Workday is a cloud human resources management system. It combines human capital management, enterprise resource management, and financial management applications into one system. It is designed for medium to large companies thanks to its features:

  • human resource management (HRM)
  • talent management
  • payroll management
  • time tracking
  • data analysis
  • And more

What is Snowflake?

Snowflake is a cloud data platform for data warehousing, data lakes, data engineering, data science, data application development, and secure sharing and consumption of real-time / shared data. It is a fully managed solution provided as a SaaS, with support for ANSI SQL. It lets you focus on querying data, while reducing your time needed to invest into administration or DBA activities.

Masking Workday XML

We will mask Workday XML data with one quick command. If needed we can specify which xpath we can mask, but for this use case we will mask the entire XML file.

Converting Workday XML with Flexter

We will convert XML/JSON to Snowflake in a few simple steps.

Step 1 – Authenticate

Step 2 – Create a Data Flow from source file

Step 3 – Generate schema (target data model) / optional

Step 4 – Define your target connection, e.g. Snowflake

Step 5 – Process your XML/JSON data

Let’s go through these steps one by one and process our data.

Step 1 – Authenticate

To get an access_token you need to make a call to /oauth/token with Authorization header and 2 form parameters:

  • username=YOUR_EMAIL
  • password=YOUR_PASSWORD

Step 2 – Create a data flow

The Data Flow creates the logical target schema and the mappings from your source XML or JSON elements to the table columns in your target schema.

For XML you can create a data flow from

  • A source schema (XSD)
  • A sample of source data (a sample of XML documents)
  • A combination of source schema (XSD) and source data (a sample of XML documents).

For JSON you can create a data flow from a sample of source data only (a sample of JSON documents). We currently don’t support JSON schema.

Output

Step 4 – Create target connection

Before you create a Conversion you need to create a Target Connection. A Target Connection is a reusable location for your target schema.

Output

Step 5 – Process your XML/JSON data

Once you have created a Data Flow you can use it to convert XML or JSON documents in a Conversion.

Mandatory parameters

The name of the Data Flow and the name of the Source Connection for the Conversion Data are mandatory parameters.

Optional parameters

The name of a Target Connection is optional. If you don’t provide a Target Connection, FaaS will generate a download link for your data in the response.

If you select a Target Connection of type S3 or of type Download you can also optionally provide the type of file format, i.e. CSV, TSV, Parquet, ORC, Avro.

If you don’t specify a file format for Target Connection of type S3 or Download, FaaS will generate TSV files as the default type.

There are also some other optional parameters that you can specify when you create a Conversion.

  • Prefix. When you specify a prefix all of the target table names or file names will be prefixed with the value you specify.
  • Suffix. When you specify a suffix all of the target table names or file names will be suffixed with the value you specify.
  • By default a Conversion appends data to your Target Schema. You can optionally overwrite the data in your Target Schema when executing a Conversion.

The tables, columns, and relationships are generated in your Target Schema the first time you execute a Conversion.

You can re-use a Data Flow with different Target Connections when you create a Conversion, e.g. you can use the Data Flow to first convert data to Oracle and then to SQL Server.

Output

Conclusion

We have masked XML data and processed it with ease. We did in a matter of minutes what would normally take a few days.

Ralph Kimball the father of dimensional modelling and data warehousing already knew:

“Because of such inherent complexity, never plan on writing your own XML processing interface to parse XML documents.

The structure of an XML document is quite involved, and the construction of an XML parser is a project in itself—not to be attempted by the data warehouse team.”

You can try our Flexter online.

Our enterprise edition can be installed on a single node or for very large volumes of XML on a cluster of servers.

You can also request a demo of Flexter or reach out to us directly.

Enjoyed this post? Have a look at the other posts on our blog.

Contact us for Snowflake professional services.