Converting DITA XML to a database (MySQL)

by Maciek

Maciek is the Co-founder of Sonra. He has a knack for turning messy semi-structured formats like XML, JSON, and XSD into readable data. With a brain wired for product and data architecture, Maciek is the magic ingredient to making sure your systems don’t just work—they shine.


Published on October 15, 2019
Updated on November 20, 2024

In this post we will convert DITA XML, an XML standard used in publishing, to a MySQL database. We will use Flexter, Sonra’s data warehouse automation tool for XML, JSON, and industry data standards.

XML Publishing

Traditional publisher workflows rely on print-first content. The print-first workflow makes editing harder, since the entire document needs to be opened for even minor changes. It also makes it harder for the content to be prepared for conversion into different digital formats.
Some advantages of an XML-first workflow:

  • Documents that adapt easily to any publishing medium.
  • Independent sections that can be modified by different editors at the same time.
  • The ability to make style changes globally with ease.

All those advantages reduce the cost and time you need to invest in preparing a document for digital publication. This makes an XML-first workflow and XML publishing the obvious choice.

DITA

The Darwin Information Typing Architecture or Document Information Typing Architecture (DITA) is an open XML standard for publishing.
It was developed to meet IBM’s requirements for technical documentation, but because the DITA architecture is general, it can be used for any kind of document (technical documentation, commercial publishing, pharmaceutical information, standards, training materials, and much more).
One big advantage of DITA is that it guarantees interoperability between different XML documents. You can simply extend or constrain the out-of-the-box DITA vocabulary as your understanding and requirements evolve.
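
To make the format concrete, here is a minimal sketch of what a DITA topic looks like and how its parts can be read with Python's standard library. The topic content and the function name `summarize_topic` are illustrative assumptions, not taken from the data set used in this post.

```python
import xml.etree.ElementTree as ET

# A minimal, hand-written DITA topic (illustrative content only).
DITA_TOPIC = """\
<topic id="install_overview">
  <title>Installing the product</title>
  <body>
    <p>Download the installer.</p>
    <p>Run it and follow the prompts.</p>
  </body>
</topic>
"""


def summarize_topic(xml_text: str) -> dict:
    """Return the id, title, and body paragraphs of a DITA topic."""
    root = ET.fromstring(xml_text)
    return {
        "id": root.get("id"),
        "title": root.findtext("title"),
        "paragraphs": [p.text for p in root.findall("./body/p")],
    }


summary = summarize_topic(DITA_TOPIC)
print(summary["title"])  # Installing the product
```

Even this small topic is nested (a body containing repeated paragraphs), which is why loading it into relational tables needs a mapping step.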

Processing DITA XML with Flexter

Flexter exposes its functionality through a RESTful API. Converting XML/JSON to a relational database such as MySQL can be done in a few simple steps.
Step 1 – Authenticate
Step 2 – Define Source Connection (Upload or S3) for Source Data (JSON/XML)
Step 3 – Optionally define Source Connection (Upload or S3) for Source Schema (XSD)
Step 4 – Define your Target Connection, e.g. Snowflake, Redshift, SQL Server, Oracle etc.
Step 5 – Convert your XML/JSON from Source to Target Connection

Step 1 – Authenticate

To get an access_token you need to make a call to /oauth/token with an Authorization header and three form parameters:

  • username=YOUR_EMAIL
  • password=YOUR_PASSWORD
  • grant_type=password

You will get your username and password from Sonra when you sign up for the service.
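
The call described above can be sketched in Python with the standard library only. This is a sketch under stated assumptions, not the official client: the host in `BASE_URL` is a placeholder, and the use of HTTP Basic with a client id and secret for the Authorization header is an assumption (the text above only says that an Authorization header is required). The code builds the request without sending it.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: placeholder host; use the base URL you receive from Sonra.
BASE_URL = "https://flexter.example.com"


def build_token_request(username, password, client_id, client_secret):
    """Build (but do not send) the POST request for /oauth/token."""
    # The three form parameters named in the text above.
    form = urllib.parse.urlencode({
        "username": username,
        "password": password,
        "grant_type": "password",
    }).encode("ascii")
    # Assumption: the Authorization header uses HTTP Basic client credentials.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        f"{BASE_URL}/oauth/token",
        data=form,
        headers={"Authorization": f"Basic {basic}"},
        method="POST",
    )


request = build_token_request(
    "YOUR_EMAIL", "YOUR_PASSWORD", "CLIENT_ID", "CLIENT_SECRET"
)
print(request.full_url)  # https://flexter.example.com/oauth/token
```

Passing the request to `urllib.request.urlopen` would send it. The response should contain the access_token, but its exact shape is not shown in this post.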

Example of output

Step 2 – Define Source Connection (Upload) for Source Data (DITA XML)

In the second step we upload our DITA XML source data.
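
Before uploading, it can help to confirm that the file is well-formed XML. The check below is a small optional sketch using Python's standard library. It is not part of Flexter, and the function name `check_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str):
    """Return (True, root tag) if xml_text parses, else (False, parser message)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, root.tag


# A well-formed topic parses and reports its root element.
print(check_well_formed("<topic id='t1'><title>Hello</title></topic>"))  # (True, 'topic')

# A missing closing tag is reported as not well-formed.
print(check_well_formed("<topic><title>Hello</topic>")[0])  # False
```

For a file on disk, read its text first and pass it to the function.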

Example of output

Step 3 – Define Target Connection (MySQL)

Since we don’t have a Source Schema we skip the optional step of defining a Source Schema.
We define our Target connection. We give the Target Connection a name and supply various connection parameters to the MySQL database.

Example of output

Step 4 – Convert XML data from Source Connection (Upload) to Target Connection (MySQL)

In the last step we convert the DITA XML. The data is written to the MySQL Target Connection.

Example of output

ER Diagram

We can create and download an ER Diagram of the model that Flexter generated by making a GET call.

Example of output

We can simply copy and paste the download link into the browser, and the ER Diagram will be downloaded.
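
The download can also be scripted instead of done in a browser. The following is a generic sketch using Python's standard library. The function name `download_file` is hypothetical, and it assumes the download link can be fetched without extra headers, as the browser approach suggests.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, destination: str) -> Path:
    """Stream the resource at url into destination and return the path."""
    target = Path(destination)
    # Copy the response body to disk in chunks rather than loading it in memory.
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Call it with the download link returned by the API and a local file name of your choice.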

If you want to view the ER Diagram you can find it here.
Next we will run an SQL query that returns the Question and Answer columns.
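
A query of this kind might look like the sketch below. The table name `faq_item`, the column names, and the sample row are hypothetical placeholders, because the names Flexter generates depend on the source XML. An in-memory SQLite database stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Assumption: hypothetical table and column names; check the ER Diagram
# for the names generated from your own data.
QUERY = "SELECT question, answer FROM faq_item ORDER BY question"


def fetch_questions_and_answers(connection):
    """Run QUERY on a DB-API connection and return all rows."""
    cursor = connection.cursor()
    cursor.execute(QUERY)
    return cursor.fetchall()


# Stand-in database with one placeholder row (not real output from Flexter).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE faq_item (question TEXT, answer TEXT)")
connection.execute(
    "INSERT INTO faq_item VALUES (?, ?)",
    ("What is DITA?", "An open XML standard for publishing."),
)
rows = fetch_questions_and_answers(connection)
print(rows)
```

The same function should work with a connection from a MySQL DB-API driver, since it only uses `cursor()`, `execute()`, and `fetchall()`.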

Conclusion

With DITA and Flexter you can reduce the cost and time that you need to invest in publishing and converting data. This post has shown you how easy and fast it is to convert your data to a MySQL database with Flexter.
Our enterprise edition can be installed on a single node or, for very large volumes of XML, on a cluster of servers.
If you have any questions please refer to the Flexter FAQ section. You can also request a demo of Flexter or reach out to us directly.
