Introduction to this walkthrough
In this walkthrough, we will demonstrate the process of loading XML data into Snowflake – a cloud based data warehousing service. We have downloaded the XML files from ESMA and then converted them into TSV files using Flexter, a powerful XML parser from Sonra. We then load the TSV files into Snowflake and execute queries.
XML data from ESMA
We have sourced the data from ESMA. ESMA is an independent EU Authority that contributes to safeguarding the stability of the European Union’s financial system by enhancing the protection of investors and promoting stable and orderly financial markets. The data includes the reporting specifications and transactions. The market related rules and regulations for transaction reporting and the transaction details are recorded as part of ESMA.
Processing the XML file
We will be using the free version of Flexter to convert the XML data to Snowflake. We use Flexter to first convert our XML data to text (TSV). It will then be easy to load the data into Snowflake.
Convert ESMA XML to Text Using Flexter
Flexter is a tool that can transform complex XMLs into readable files that can be used for analysis. The step by step instructions for transforming XML files using Flexter are shown below.
- Access Flexter from https://xmlconverter.sonra.io/
- Click on the ‘Terms and Conditions’ checkbox and click ‘Try Flexter for Free’
- Upload the XML file that needs to be transformed and click ‘Continue’
- On the next page, we can upload the XSD file (if available). For this implementation, the XSD file is not available and hence we will click on ‘Skip ‘and proceed to the next page.
- Provide Email and click ‘Continue’
- On the next page, click on ‘Finish’ and the output from Flexter will be delivered to the subscribed email address.
- An email from Flexter with the link to the transformed data is obtained. The TSV files can be downloaded from this link which redirects to the Flexter page.
- For the XML files from ESMA, a total of 28 TSV files were generated and downloaded.
About Snowflake.net
Snowflake is a data warehouse service that provides users with the interface and the tools required to create a data warehouse based upon the business requirements. Snowflake can also be termed as a cloud native data warehouse optimised for the cloud. In the following sections, we describe the step by step instructions to setup a snowflake user account and create a database for the ESMA data.
[cloud_book_banner]
Setup a Snowflake account
The following steps highlight the setting up of a snowflake account.
- Go to Snowflake free trial
- Click on Try now to get started
- Select the Edition as ‘Basic’ for the free trial and the region as required and click ‘Continue’
- Provide the payment information and click on ‘Create account’
- The Snowflake account is created. We can access this account after 24 hours.
You will receive an email from Snowflake within 24 hours with the account activation link.
Loading ESMA data to Snowflake
In the next step, we will be loading the TSV output from Flexter into Snowflake. In the following sections, we have explained the process of setting up the Snowflake console and loading data.
Snowflake provides the users with a friendly UI which make it easy to load data and query. The Flexter output obtained from the XML files contained a total of 28 TSV files that are to be loaded into the bucket. The following steps describe the data loading process into Snowflake:
Creating a warehouse
The warehouse contains the database and the file formats for a specific data set.
- Go to the Snowflake console.
- Click on Warehouses
- Click on create warehouse. Specify the warehouse name as Flexter_Output.
- Click on create.
Creating a database
In the next step, we will be creating a database for the warehouse.
- Click on databases.
- Click on create database
- A new popup is displayed. Specify the database name as FLEXTER_OUTPUT_ESMA and click on Finish
- The newly created database clan be seen under the list of databases.
Creating a File Format
The file format needs to be created for the specific type of file that is to be loaded into the database. In this case, the file format is being created for the TSV output from Flexter. We have described the process of creating a file format as follows.
- Click on the database ‘FLEXTER_OUTPUT_ESMA’
- Click on FileFormat
- Click on Create. A new pop up is displayed.
- Specify the name as ‘DATALOAD_TSV’, format type as CSV, column separator as Tab (since we are uploading a TSV file) and the header rows to skip as ‘1’
- Click on Finish.
Creating a table
- Click on the database ‘FLEXTER_OUTPUT_ESMA’
- Click on Tables->Create
- A new pop up for creating a table is displayed. Specify the table name as ‘AIFMFivePrincipalMarket’
- The schema name is ‘Public’
- Specify the schema (columns and data types as necessary)
- Click on Finish. The new table can be seen under the database ‘FlexterOutput’
In the next steps, we will be loading the data into the newly created tables.
Loading the data
- Select the database ‘FLEXTER_OUTPUT_SCHEMA’
- Click on the table ‘AIFMFIVEPRINCIPALMARKET’
- Click on load table
- In the pop up, select the warehouse as ‘Flexter_Output’
- Specify the location of the files to be loaded.
- Select the file format as ‘Download_TSV’
- Specify the load options to configure the steps to be followed in case of errors
- Click on load
- The load results after completion of data load is displayed.
- In a similar manner, load the data for the other 28 tables.
Sample queries in Snowflake
Snowflake follows standard SQL dialect. Queries can be executed from the Worksheet tab on the console.
Query 1
1 |
SELECT * FROM BORROWINGSOURCE limit 10; |
OUTPUT:
Query 2
1 2 3 4 |
SELECT ENTITYIDENTIFICATIONBIC, COUNT(ENTITYIDENTIFICATIONBIC) FROM COUNTERPARTYTOFUNDEXPOSURE GROUP BY ENTITYIDENTIFICATIONBIC ; |
OUTPUT:
Query 3
1 2 3 4 |
SELECT MARKETCODE, AVG(AGGREGATEDVALUEAMOUNT) AS AverageAGGREGATEDVALUEAMOUNT FROM AIFPRINCIPALMARKET GROUP BY MARKETCODE; |
OUTPUT:
Query 4
1 2 3 4 5 |
SELECT ENTITYIDENTIFICATIONBIC, MAX(RANKING) AS MaxRank, MIN(RANKING) AS MinRank FROM BORROWINGSOURCE GROUP BY ENTITYIDENTIFICATIONBIC; |
OUTPUT:
[sf_service_banner_l]
Query 5
1 2 3 4 5 |
SELECT t1.*, t2.* FROM AIFMFivePrincipalMarket t1 JOIN AIFReportingInfo t2 ON t1.FK_AIFMReportingInfo = t2.PK_AIFReportingInfo; |
OUTPUT:
[flexter_button]
Which data formats apart from XML also give you the heebie jeebies and need to be liberated? Please leave a comment below or reach out to us.
Enjoyed this post? Have a look at the other posts on our blog.
Contact us for Snowflake professional services.
We created the content in partnership with Snowflake.