Convert ESMA XML to Snowflake

Anvesh Gali XML

Introduction to this walkthrough

In this walkthrough, we will demonstrate the process of loading XML data into Snowflake – a cloud based data warehousing service. We have downloaded the XML files from ESMA and then converted them into TSV files using Flexter, a powerful XML parser from Sonra. We then load the TSV files into Snowflake and execute queries.

XML data from ESMA

We have sourced the data from ESMA. ESMA is an independent EU Authority that contributes to safeguarding the stability of the European Union's financial system by enhancing the protection of investors and promoting stable and orderly financial markets. The data includes the reporting specifications and transactions. The market related rules and regulations for transaction reporting and the transaction details are recorded as part of ESMA.

Processing the XML file

We will be using the free version of Flexter to convert the XML data to Snowflake. Since Snowflake cannot process XML files directly, we use Flexter to first convert our XML data to text (TSV). It will then be easy to load the data into Snowflake.

Convert ESMA XML to Text Using Flexter

Flexter is a tool that can transform complex XMLs into readable files that can be used for analysis. The step by step instructions for transforming XML files using Flexter are shown below.

Screen Clipping

  • Click on the ‘Terms and Conditions’ checkbox and click ‘Try Flexter for Free’

Screen Clipping

  • Upload the XML file that needs to be transformed and click ‘Continue’

Screen Clipping

  • On the next page, we can upload the XSD file (if available). For this implementation, the XSD file is not available and hence we will click on ‘Skip ‘and proceed to the next page.

Screen Clipping

  • Provide Email and click ‘Continue’
  • On the next page, click on ‘Finish’ and the output from Flexter will be delivered to the subscribed email address.

Screen Clipping

  • An email from Flexter with the link to the transformed data is obtained. The TSV files can be downloaded from this link which redirects to the Flexter page.

Screen Clipping

Screen Clipping

  • For the XML files from ESMA, a total of 28 TSV files were generated and downloaded.

Screen Clipping

About Snowflake.net

Snowflake is a data warehouse service that provides users with the interface and the tools required to create a data warehouse based upon the business requirements. Snowflake can also be termed as a cloud native data warehouse optimised for the cloud. In the following sections, we describe the step by step instructions to setup a snowflake user account and create a database for the ESMA data.

Setup a Snowflake account

The following steps highlight the setting up of a snowflake account.

Screen Clipping

  • Click on Try now to get started
  • Select the Edition as ‘Basic’ for the free trial and the region as required and click ‘Continue’

Screen Clipping

  • Provide the payment information and click on ‘Create account’

Screen Clipping

Screen Clipping

  • The Snowflake account is created. We can access this account after 24 hours.

Screen Clipping

Screen Clipping

You will receive an email from Snowflake within 24 hours with the account activation link.

Loading ESMA data to Snowflake

In the next step, we will be loading the TSV output from Flexter into Snowflake. In the following sections, we have explained the process of setting up the Snowflake console and loading data.

Snowflake provides the users with a friendly UI which make it easy to load data and query. The Flexter output obtained from the XML files contained a total of 28 TSV files that are to be loaded into the bucket. The following steps describe the data loading process into Snowflake:

Creating a warehouse

The warehouse contains the database and the file formats for a specific data set.

  • Go to the Snowflake console.
  • Click on Warehouses

Screen Clipping

  • Click on create warehouse. Specify the warehouse name as Flexter_Output.
  • Click on create.

Creating a database

In the next step, we will be creating a database for the warehouse.

  • Click on databases.

Screen Clipping

  • Click on create database
  • A new popup is displayed. Specify the database name as FLEXTER_OUTPUT_ESMA and click on Finish

Screen Clipping

  • The newly created database clan be seen under the list of databases.

Screen Clipping

 

Creating a File Format

The file format needs to be created for the specific type of file that is to be loaded into the database. In this case, the file format is being created for the TSV output from Flexter. We have described the process of creating a file format as follows.

  • Click on the database ‘FLEXTER_OUTPUT_ESMA’

Screen Clipping

  • Click on FileFormat

Screen Clipping

  • Click on Create. A new pop up is displayed.

  • Specify the name as ‘DATALOAD_TSV’, format type as CSV, column separator as Tab (since we are uploading a TSV file) and the header rows to skip as ‘1’

  • Click on Finish.

Screen Clipping

Creating a table

  • Click on the database ‘FLEXTER_OUTPUT_ESMA’

Screen Clipping

  • Click on Tables->Create
  • A new pop up for creating a table is displayed. Specify the table name as ‘AIFMFivePrincipalMarket’
  • The schema name is ‘Public’
  • Specify the schema (columns and data types as necessary)

Screen Clipping

  • Click on Finish. The new table can be seen under the database ‘FlexterOutput’

Screen Clipping

In the next steps, we will be loading the data into the newly created tables.

Loading the data

  • Select the database ‘FLEXTER_OUTPUT_SCHEMA’
  • Click on the table ‘AIFMFIVEPRINCIPALMARKET’

Screen Clipping

  • Click on load table
  • In the pop up, select the warehouse as ‘Flexter_Output’

Screen Clipping

  • Specify the location of the files to be loaded.

Screen Clipping

  • Select the file format as ‘Download_TSV’

Screen Clipping

  • Specify the load options to configure the steps to be followed in case of errors

Screen Clipping

  • Click on load

Screen Clipping

  • The load results after completion of data load is displayed.

Screen Clipping

  • In a similar manner, load the data for the other 28 tables.

Screen Clipping

Sample queries in Snowflake

Snowflake follows standard SQL dialect. Queries can be executed from the Worksheet tab on the console.

Query 1

OUTPUT:

Screen Clipping

Query 2

OUTPUT:

Screen Clipping

Query 3

OUTPUT:

Screen Clipping

Query 4

OUTPUT:

Screen Clipping

Query 5

OUTPUT:

Screen Clipping