Converting 3GPP Configuration Management XML (3G, UMTS, 5G) to Snowflake
XML in the telecom industry
XML standards have been widely adopted in the telecom industry. The standards themselves and their specifications are developed by 3GPP. The 3rd Generation Partnership Project (3GPP) unites seven telecommunications standard development organizations (ARIB, ATIS, CCSA, ETSI, TSDSI, TTA, TTC). It provides their members with a stable environment to produce the specifications that define 3GPP technologies.
Both Configuration Management and performance/event data has been specified as XML in various standards. Configuration data is typically very complex and comes in small volumes. However, the files themselves are quite large and frequently exceed 1 GB. Performance data is less complex but comes in large volumes. The schemas (XSD) for telco data are typically quite large and complex and comprise standards such as UTMS, 4G, 5G etc. Only a subset of the elements in these huge schemas is used by the individual XML files. The XML documents themselves will contain many recursive relationships and self-referencing elements, which adds more complexity.
Schemas change frequently when new versions are released and can be customised and modified by regions or individual telco vendors such as Ericsson or Huawei. For each release you will find hundreds of changes, which may require a significant amount of refactoring. Some of these XSD schemas have hundreds of complex types with thousands of XML elements.
Flexter, our data warehouse automation tool for XML, JSON, and industry data standards is a perfect fit for these requirements. Flexter loves complexity. You have large volumes of data? Even better.
3GPP Configuration Management XML
The 3GPP standards and specifications come with XML file format definitions. These are documented extensively, e.g. the specification for UMTS configuration management.
Configuration Management (CM) provides the telco operator with the ability to assure correct and effective operation of the network as it evolves. CM actions have the objective to control and monitor the actual configuration on the Network Elements (NEs) and Network Resources (NRs), and they may be initiated by the operator or by functions in the Operations Systems (OSs) or NEs.
Let’s have a look at a high level of the XSD for configuration management.
As you can see the XSD is split into three parts: 1) Header 2) Data 3) Footer
The Following XML namespaces are potentially used in Configuration data XML files:
- the default XML namespace is associated with the configuration data files base XML schema bulkCmConfigDataFile.xsd
- the XML namespace prefix xn is defined for the XML namespace associated with the NRM specific XML schema genericNrm.xsd for the Generic Network Resources IRP NRM
- the XML namespace prefix un is defined for the XML namespace associated with the NRM specific XML schema utranNrm.xsd for the UTRAN Network Resources IRP NRM
- the XML namespace prefix gn is defined for the XML namespace associate with the NRM specific XML schema geranNrm.xsd for the GERAN Network Resources IRP NRM
- XML namespaces prefixes starting with vs, e.g. vsRH011, are reserved for the XML namespaces associated with the vendor-specific XML schemas
Masking 3GPP Configuration Management XML
For the purpose of this blog post we will use a single Configuration Management XML file. It comes in at ~500 MB uncompressed. You will see later in the relational ER diagram that it is quite complex.
We will start by masking this XML file with Paranoid. To mask data we have to provide a path to our file and a path to an output location. Paranoid will create the folder automatically, no need to create it first.
Optionally Paranoid has the feature to mask individual elements inside of a document.
1 |
paranoid -i <file path> -o <output file path> |
Let’s check the file after obfuscation
After masking 3GPP Telco Standard XML file we can start with converting it.
Converting 3GPP Configuration XML
Flexter exposes its functionality through a RESTful API. Converting XML/JSON to Snowflake can be done in a few simple steps.
Step 1 – Authenticate
Step 2 – Define Source Connection (Upload or S3) for Source Data (JSON/XML)
Step 3 – Optionally define Source Connection (Upload or S3) for Source Schema (XSD)
Step 4 – Define your Target Connection, e.g. Snowflake, Redshift, SQL Server, Oracle etc.
Step 5 – Convert your XML/JSON from Source to Target Connection
Let’s go through these steps for 3GPP XML data and convert it to a relational format in Snowflake.
Step 1 – Authenticate
To get an access_token you need to make a call to /oauth/token with Authorization header and 3 form parameters:
- username=YOUR_EMAIL
- password=YOUR_PASSWORD
- grant_type=password
You will get your username and password from Sonra when you sign up for the service.
1 2 3 4 |
curl --location --request POST "https://api.sonra.io/oauth/token" \ --header "Content-Type: application/x-www-form-urlencoded" \ --header "Authorization: Basic NmdORDZ0MnRwMldmazVzSk5BWWZxdVdRZXRhdWtoYWI6ZzlROFdRYm5Ic3BWUVdCYzVtZ1ZHQ0JYWjhRS1c1dUg=" \ --data "username=XXXXXXXXX&password=XXXXXXXXX&grant_type=password" |
Example of output
1 2 3 4 5 6 7 8 |
{ "access_token": "eyJhbG........", "token_type": "bearer", "refresh_token": "..........", "expires_in": 43199, "scope": "read write", "jti": "9f75f5ad-ba38-4baf-843a-849918427954" } |
Step 2 – Define Source Connection (Upload) for Source Data (3GPP CM XML)
In a second step we upload the 3GPP Telco Standard XML source data
1 2 3 4 5 |
curl --location --request POST "https://api.sonra.io/data_sources/3gpp_standard" \ --header "Authorization: Bearer <access_token>" \ --form "source_type=uploaded_file" \ --form "file=@<file_path>" \ --form "data_type=xml" |
Example of output
1 2 3 4 5 6 7 8 |
{ "name" : "3gpp_standard", "type" : "xml", "source_type" : "uploaded_file", "path" : "file-06e2f2b9-84cb-4332-ae60-e10f99bf8ffd", "size" : 37500, "create_date" : "2019-08-07T13:41:02.423+0000" } |
Step 4 – Define Target Connection (Snowflake)
As we don’t have a Source Schema we skip the optional step of defining a Source Schema.
Instead we define our Target connection. In this example we convert our XML data to a relational format in Snowflake.
We give the Target Connection a name and supply various connection parameters to the Snowflake database.
1 2 3 4 5 6 7 8 9 |
curl --location --request POST "https://api.sonra.io/target_connections/3gpp_standard" \ --header "Authorization: Bearer <access_token>" \ --form "target_type=snowflake" \ --form "host=xxxxxxx.eu-central-1.snowflakecomputing.com" \ --form "username=sfuser" \ --form "password=sfpass" \ --form "database=sfdb" \ --form "schema=sfchema" \ --form "warehouse=sfwh" |
Example of output
1 2 3 4 5 6 |
{ "name" : "3gpp_standard", "target_type" : "snowflake", "path" : "xxxxxxx.eu-central-1.snowflakecomputing.com", "create_date" : "2019-08-02T16:05:18.079+0000" } |
Step 5 – Convert XML data from Source Connection (Upload) to Target Connection (Snowflake)
In the next step we will convert our XML data. Data will be written directly to the Snowflake Target Connection.
1 2 3 4 |
curl --location --request POST "https://api.sonra.io/conversions/3gpp_standard" \ --header "Authorization: Bearer <access_token>" \ --form "data_source=3gpp_standard" \ --form "target=3gpp_standard" |
Example of output
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
{ "name" : "DataFlow-05ee4940-c282-46d3-9838-eeb5ec9d6efe", "schema_source_connection" : null, "data_source_connection" : { "name" : "3gpp_standard", "type" : "xml", "source_type" : "uploaded_file", "path" : "file-faa9c1ab-2765-419e-953d-d18697ac01fe", "size" : 598384, "create_date" : "2019-08-01T15:29:47.279+0000" }, "target_connection" : { "name" : "3gpp_standard", "target_type" : "snowflake", "path" : "xxxxxxx.eu-central-1.snowflakecomputing.com", "create_date" : "2019-08-01T15:57:14.887+0000" }, "create_date" : "2019-08-02T12:31:34.392+0000", "status" : "I", "download_link" : null, "full_status" : "INITIALIZED" } |
Example of ER Diagram
There are many hierarchical levels in the relational output. All the little dots you see are tables in the ER-diagram. We can also see hundreds or even thousands of different data points. You can download the ER diagram from this location
Let’s run some SQL queries against the output.
1 2 3 4 |
SELECT * FROM "GPP_STANDARD"."PUBLIC"."UTRANCELL" utran JOIN "GPP_STANDARD"."PUBLIC"."GSMRELATION" gsm ON utran."PK_UTRANCELL" = gsm."FK_UTRANCELL" |
The query shows the obfuscated values for our Configuration Management data set.
Conclusion
We have shown you how easy it is to obfuscate and convert 3GPP Telco Standard XML with Paranoid and Flexter. If you are interested in Flexter you can book a demo.
Our enterprise edition can be installed on a single node or for very large volumes of XML on a cluster of servers.
If you have any questions please refer to the Flexter FAQ section. You can also request a demo of Flexter or reach out to us directly.
Enjoyed this post? Have a look at the other posts on our blog.
Contact us for Snowflake professional services.
We created the content in partnership with Snowflake.
In this video, we use Flexter to automatically convert very complex FpML XML to Snowflake tables. Book a demo to see the power of Flexter in action!