Loading and querying XML documents in the Oracle Database

March 4, 2024

In Oracle databases, you have two methods for handling XML data: Schema on Read and Schema on Write.

Schema on Read allows direct storage of XML in a table without the need to convert it to a relational format. XML is converted on the fly by using Oracle’s support of SQL/XML functions. It’s straightforward and efficient for infrequent ad hoc querying of XML data in Oracle.

Schema on Write, in contrast, involves converting XML into a relational format and writing it to database tables. This approach also makes use of the SQL/XML functions but writes the output to relational tables. This approach makes the converted XML data queryable with standard SQL. It offers a more integrated SQL experience and performs much better. However, it requires initial data transformation and separate storage of the data in relational format.

Both the schema-on-write and schema-on-read approaches require initially loading your XML data into the Oracle database before conversion can take place. However, you can bypass this step of loading into Oracle and completely automate the schema-on-write process by using XML conversion software.

If you are seeking efficiency, Flexter’s Automated XML Conversion eliminates manual coding and simplifies complex XML data handling. Explore Flexter’s capabilities and advantages through its Free Online service for a hassle-free, immediate trial.

The Basics of Loading XML Documents in Oracle

Let’s first look at the XMLType data type that is typically used to store XML documents inside an Oracle table.

XMLType

Introduced in Oracle Database 9i, XMLType is a built-in datatype specifically designed to handle XML data within the Oracle database. It supports both structured and unstructured XML data and consists of a Large Object (LOB) component to store the original XML data. XMLType offers a range of XML-related functions and operators for querying and manipulating XML data stored in the database. XMLType data can be stored as a CLOB (Character Large Object), treating XML data as text, or as binary XML. Storing XML data in binary format reduces storage space and enhances performance compared to storing it as plain text.

Binary XML storage

Also known as post-parse persistence, is a compact and flexible XML schema-aware binary format specifically designed for XML data storage. It serves as the default XMLType storage model if no storage model is specified during the creation of an XMLType table or column. Binary XML is well-suited for scenarios where:

  • The primary XML operations involve inserting or selecting the entire document, with minimal filtered or conditional selections.
  • The XML data is very sparse i.e. has many missing/empty elements.
  • The XML data does have an associated XML schema.
  • The XML data does have an associated XML schema but it undergoes frequent evolution.
  • The XML data is associated with multiple XML schemas.

Object-relational storage

This storage method, also known as structured storage and object-based persistence, operates on the principle of entity-relationship (ER) decomposition of XML data. When an XML schema is registered with Oracle XML DB, SQL objects are automatically generated to store the conforming XML data. This approach ensures that query performance is aligned with relational data standards, while enabling seamless in-place updates. The following use cases are a good fit for Object-relational:

  • The primary XML operations involve inserting or selecting parts of the document. It’s common to filter or conditionally select portions of XML.
  • The XML data is not sparse, meaning it does not contain many missing/empty elements.
  • The XML data is associated with an XML schema that rigorously defines the data format.
  • The associated XML schema doesn’t undergo frequent changes.
  • DOM fidelity, rather than document fidelity, is sufficient. This entails preserving all information from the original XML document upon retrieval, without the necessity of whitespace preservation.

The Oracle documentation provides a good overview on when to use each storage type.

Loading XML with INSERT INTO

This method involves inserting XML data row by row into Oracle database tables, offering flexibility but potentially slower performance for large datasets. It allows for direct control over data manipulation and transformation during insertion, facilitating tailored handling of XML content within the database.

Step 1. Create a table with XML Datatype to store XML text.

Step 2. Insert XML text into the ‘xml_tab’ table

Loading XML with SQL*Loader

SQL*Loader is a utility provided by Oracle that enables high-speed data loading from external files into Oracle database tables. It allows for efficient loading of large volumes of data in various formats, including XML, by specifying a control file that defines the data format and mapping to database tables.

Step 1. Create a table with Clob Datatype to store XML text.

Step 2. Prepare a control file (.ctl) for the SQL*Loader

Step 3. Run below SQL*Loader command to load sample.xml file into the xml_tab1 table.

Loading XML with External Tables

External tables in Oracle offer a convenient way to access data stored in external files, treating them as if they were regular tables within the database. While DML operations aren’t supported on external tables, they work well for querying, joining, and sorting operations. XML files can integrate into a table as a CLOB column (XMLType data type is not supported for external tables). Once the external table has been created we can query the data, e.g. by creating Views.

This integration facilitates blending of XML data with relational data in Oracle databases. External tables are useful in data warehouse ETL processes, eliminating staging requirements and boosting performance through parallel querying.

Let’s guide you through the process step by step

Step 1: Create XXAA_XML directory in bin location and modify the permissions.

Step 2: Upload the XML files into the XXAA_XML  directory

Step 3: Run below command to create a data file (.dat) in the bin folder

Step 4. Create a directory for the bin folder

Step 5. Create External table with above directory ‘XXAA_XT_DIR’ and SQL Loader file ‘XXAA_XML_XT_DATA.dat’

XML conversion option 1: Schema on read

Now that we have loaded some XML files into our Oracle database we can start querying the data. Oracle offers two basic options

  • SQL/XML is most commonly used and should meet the vast majority of your requirements
  • XQuery is for niche scenarios with complex requirements. It is very powerful but has a steep learning curve.

SQL/XML

Oracle’s SQL/XML functions provide tools for handling XML data within the database environment. These functions, including XMLQuery, XMLTable, XMLExists, and XMLCast, enable developers to construct XML data using relational data, query relational data as if it were XML, and construct relational data from XML data. With Oracle’s robust support for SQL/XML functions, developers can effectively manage XML data alongside traditional relational data operations.Below, you’ll find some of the useful SQL/XML functions: –

XMLQuery : This function extracts data from XML documents using XPath expressions within SQL queries.

XMLTable : This function enables you to extract data from XML documents and present it in a relational format.

XMLExists : This function checks whether an XML document contains nodes that match a specified XPath expression. It returns TRUE if at least one node exists, otherwise FALSE.

XMLAgg : This function aggregates multiple elements into a single XML element. It is commonly used with a subquery to generate XML output.

XMLCast : This function converts an XML value into another data type, such as VARCHAR2, NUMBER, DATE, etc. It’s useful for converting XML data into a more conventional data type for further processing or querying.

 

XQuery

XQuery is a functional query language for finding and extracting elements and attributes from XML documents. XQuery is similar to SQL in many ways, but just as SQL is designed for querying structured, relational data, XQuery is designed especially for querying semi-structured, XML data from a variety of data sources.

When to use XQuery and when to use SQL/XML

XQuery and SQL/XML functions are essential tools for handling XML data. XQuery is perfect for intricate XML manipulation, providing precise control over elements and relationships, making it suitable for complex hierarchical structures. On the other hand, SQL/XML functions are optimal for integrating XML tasks into relational databases, particularly for simpler processing alongside traditional SQL queries. While XQuery shines in fine-grained XML tasks, SQL/XML functions excel in maintaining efficiency and compatibility within existing database systems. In summary, the choice between XQuery and SQL/XML depends on the complexity of XML tasks and their integration into relational database workflows.

XML Index

Oracle’s XMLIndex feature is designed to improve the performance of queries on XML documents that are stored in an XML column, making it beneficial when dealing with XML data. Unlike traditional relational indexes, which rely on specified table columns for index keys, XMLIndex employs a specific XML pattern expression to index paths and values within XML documents contained in a single column. By allowing users to create indexes on XMLType columns, XMLIndex enables quicker retrieval and processing of XML data through optimised query execution paths. However, XML indexes may not be necessary for small XML documents or situations where query performance is not a critical concern.

Using an XML Index is particularly useful for the schema on read scenario as it can speed up your queries.

Example-

Getting hands on with SQL/XML and XMLTABLE

In this hands-on example we use SQL/XML functions to parse and extract data from the XML on the fly. We then create database Views for reuse of the parse logic.

Since the sample XML file contains multiple branches, we’ll have to create multiple views to get each respective branch data.

View for ‘University/Faculty/Department’ branch

Query the view :

Output-


View for ‘University/Faculty/ResearchGroup’ branch

Query the view :

Output-

View for ‘University/AdministrativeUnit’ branch

Query the view –

Important note

The CREATE VIEW syntax shown above is applicable for ‘INSERT INTO’ and ‘SQL*Loader’ methods as they have a column of datatype XMLType that contains the XML text. However, External tables do not inherently support the XMLTYPE datatype. Therefore, when working with XML data in external tables, you would typically parse and extract the relevant information from the XML structure using SQL functions and then store it in appropriate data types such as VARCHAR2 or CLOB. Consequently, we need to typecast the CLOB column to XMLType when creating views from them.

Example:-

XML conversion option 2: Schema on Write

As an alternative to schema on read we can convert the XML to Oracle tables. It will make it much easier for users to query the tables in SQL.

We first create the DDL for the target tables and then use the Views we created in the previous section to load the data to these tables.

They are then ready to be queried using SQL.

Target table for ‘University/Faculty/Department’ branch

A screenshot of a computer program

Description automatically generated

Target table for ‘University/Faculty/ResearchGroup’ branch

Target table for ‘University/AdministrativeUnit’ branch

 Automatically loading and converting XML to Oracle

The latest XML conversion tools automate the entire process of converting XML to Oracle, akin to XML-as-schema-on-write, but without any manual intervention. These tools eliminate the need to load the XML data to Oracle first. They automate schema creation, mapping, and error handling, offering a fully automated solution compared to traditional methods using Oracle’s SQL/XML extensions.

Here is a summary comparison of Schema On Read vs Schema on Write vs Automated XML Conversion to Oracle.

You can download the comparison as a PDF

Manual vs. Automated Conversion with Flexter:

In contrasting manual XML conversion techniques with automated processes facilitated by tools like Flexter, the advantages of automation become clear:

  • Complexity Management: Automated processes adeptly handle any level of XML and XSD complexity efficiently.
  • Accelerated Deployment: By automating the XML conversion lifecycle—including analysis, schema generation, and mappings—data is swiftly prepared for decision-making.
  • Risk Reduction: The automation of XML parsing mitigates the risk of project delays or budget overruns, especially in complex projects adhering to industry standards.
  • Consistency and Accuracy: Automated parsing ensures consistent and precise execution, significantly lowering the potential for human error.
  • Scalability: XML automation tools are designed to meet any service level agreement (SLA) and handle large volumes of XML data.
  • User Accessibility: The user-friendly interfaces of automated tools reduce the dependency on specialized XML skills, which are scarce and represent a vulnerability for projects.

Considering Automated Conversion:

While automated conversion processes present clear benefits, they must be balanced against the costs of licensing tools like Flexter. For simple requirements or occasional queries, an XML automation tool may not be necessary. However, conversion software becomes invaluable under certain conditions, such as dealing with complex XML/XSD structures, processing large or numerous types of XML files, working under strict SLAs, or operating without in-house XML expertise.

Flexter – Your Solution for Complex XML Conversion:

If you’re facing challenges with complex XML structures, large file sizes, or tight project deadlines, consider exploring Flexter’s capabilities. Our free online version offers a glimpse into the efficiency and effectiveness of XML conversion software. For more complex inquiries, our experts are ready to discuss your XML conversion needs in detail, ensuring your XML to Oracle conversion project is both successful and efficient.