Loading and querying XML documents in the Oracle Database
In Oracle databases, you have two methods for handling XML data: Schema on Read and Schema on Write.
Table of Contents
Schema on Read allows direct storage of XML in a table without the need to convert it to a relational format. XML is converted on the fly by using Oracle’s support of SQL/XML functions. It’s straightforward and efficient for infrequent ad hoc querying of XML data in Oracle.
Schema on Write, in contrast, involves converting XML into a relational format and writing it to database tables. This approach also makes use of the SQL/XML functions but writes the output to relational tables. This approach makes the converted XML data queryable with standard SQL. It offers a more integrated SQL experience and performs much better. However, it requires initial data transformation and separate storage of the data in relational format.
Both the schema-on-write and schema-on-read approaches require initially loading your XML data into the Oracle database before conversion can take place. However, you can bypass this step of loading into Oracle and completely automate the schema-on-write process by using XML conversion software.
If you are seeking efficiency, Flexter’s Automated XML Conversion eliminates manual coding and simplifies complex XML data handling. Explore Flexter’s capabilities and advantages through its Free Online service for a hassle-free, immediate trial.
The Basics of Loading XML Documents in Oracle
Let’s first look at the XMLType data type that is typically used to store XML documents inside an Oracle table.
XMLType
Introduced in Oracle Database 9i, XMLType is a built-in datatype specifically designed to handle XML data within the Oracle database. It supports both structured and unstructured XML data and consists of a Large Object (LOB) component to store the original XML data. XMLType offers a range of XML-related functions and operators for querying and manipulating XML data stored in the database. XMLType data can be stored as a CLOB (Character Large Object), treating XML data as text, or as binary XML. Storing XML data in binary format reduces storage space and enhances performance compared to storing it as plain text.
Binary XML storage
Also known as post-parse persistence, is a compact and flexible XML schema-aware binary format specifically designed for XML data storage. It serves as the default XMLType storage model if no storage model is specified during the creation of an XMLType table or column. Binary XML is well-suited for scenarios where:
- The primary XML operations involve inserting or selecting the entire document, with minimal filtered or conditional selections.
- The XML data is very sparse i.e. has many missing/empty elements.
- The XML data does have an associated XML schema.
- The XML data does have an associated XML schema but it undergoes frequent evolution.
- The XML data is associated with multiple XML schemas.
Object-relational storage
This storage method, also known as structured storage and object-based persistence, operates on the principle of entity-relationship (ER) decomposition of XML data. When an XML schema is registered with Oracle XML DB, SQL objects are automatically generated to store the conforming XML data. This approach ensures that query performance is aligned with relational data standards, while enabling seamless in-place updates. The following use cases are a good fit for Object-relational:
- The primary XML operations involve inserting or selecting parts of the document. It’s common to filter or conditionally select portions of XML.
- The XML data is not sparse, meaning it does not contain many missing/empty elements.
- The XML data is associated with an XML schema that rigorously defines the data format.
- The associated XML schema doesn’t undergo frequent changes.
- DOM fidelity, rather than document fidelity, is sufficient. This entails preserving all information from the original XML document upon retrieval, without the necessity of whitespace preservation.
The Oracle documentation provides a good overview on when to use each storage type.
Loading XML with INSERT INTO
This method involves inserting XML data row by row into Oracle database tables, offering flexibility but potentially slower performance for large datasets. It allows for direct control over data manipulation and transformation during insertion, facilitating tailored handling of XML content within the database.
Step 1. Create a table with XML Datatype to store XML text.
1 2 3 4 |
CREATE TABLE xml_tab ( id NUMBER, xml_data XMLTYPE ); |
Step 2. Insert XML text into the ‘xml_tab’ table
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 |
INSERT INTO xml_tab (id, xml_data) VALUES (1, XMLType('<University id="TUD"> <Faculty id="Engineering"> <Department id="ComputerScience"> <Course>Introduction to Programming</Course> <Course>Algorithms and Data Structures</Course> </Department> <Department id="ElectricalEngineering"> <Course>Circuit Analysis</Course> <Course>Electromagnetics</Course> <Course>Control Systems</Course> </Department> <ResearchGroup id="ArtificialIntelligence"> <Project>Machine Learning Advances</Project> <Project>Neural Network Optimization</Project> <Project>AI Ethics and Society</Project> </ResearchGroup> <ResearchGroup id="Nanotechnology"> <Project>Nano-materials Engineering</Project> <Project>Quantum Computing</Project> </ResearchGroup> </Faculty> <AdministrativeUnit id="Library"> <Service>Book Lending</Service> <Service>Research Assistance</Service> </AdministrativeUnit> <AdministrativeUnit id="IT"> <Service>Campus Network</Service> <Service>Helpdesk</Service> </AdministrativeUnit> </University> ')); |
1 |
SELECT * FROM xml_tab |
Loading XML with SQL*Loader
SQL*Loader is a utility provided by Oracle that enables high-speed data loading from external files into Oracle database tables. It allows for efficient loading of large volumes of data in various formats, including XML, by specifying a control file that defines the data format and mapping to database tables.
Step 1. Create a table with Clob Datatype to store XML text.
1 2 3 4 |
CREATE TABLE xml_tab( FILENAME VARCHAR2(255), xml_data XMLTYPE ) |
Step 2. Prepare a control file (.ctl) for the SQL*Loader
1 2 3 4 5 6 7 8 9 10 11 |
LOAD DATA INFILE * REPLACE INTO TABLE xml_tab FIELDS TERMINATED BY ',' TRAILING NULLCOLS ( FILENAME, xml_data LOBFILE(FILENAME) TERMINATED BY EOF ) BEGINDATA /home/oracle/Downloads/DUMPS/sample.xml |
Step 3. Run below SQL*Loader command to load sample.xml file into the xml_tab1 table.
1 2 3 4 5 |
$ sqlldr system/oracle@orcl control=control_file.ctl log=loader_log.log SQL*Loader: Release 11.2.0.4.0 - Production on Fri Feb 2 10:06:46 2024 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. Commit point reached - logical record count 1 |
1 |
SELECT * FROM xml_tab |
Loading XML with External Tables
External tables in Oracle offer a convenient way to access data stored in external files, treating them as if they were regular tables within the database. While DML operations aren’t supported on external tables, they work well for querying, joining, and sorting operations. XML files can integrate into a table as a CLOB column (XMLType data type is not supported for external tables). Once the external table has been created we can query the data, e.g. by creating Views.
This integration facilitates blending of XML data with relational data in Oracle databases. External tables are useful in data warehouse ETL processes, eliminating staging requirements and boosting performance through parallel querying.
Let’s guide you through the process step by step
Step 1: Create XXAA_XML directory in bin location and modify the permissions.
1 2 3 |
$ mkdir XXAA_XML $ chmod 777 XXAA_XML $ cd 777 XXAA_XML |
Step 2: Upload the XML files into the XXAA_XML directory
1 2 3 |
XXAA_XML]$ ls -lrt total 8 -rw-rw-rw-. 1 oracle oracle 1208 Feb 1 07:57 sample.xml |
Step 3: Run below command to create a data file (.dat) in the bin folder
1 2 3 4 5 |
$ ls -l *.xml | awk '{print "\/XXAA_XML\/"$9}' > /u01/Middleware/ODI12c/oracle_common/bin/XXAA_XML_XT_DATA.dat $ cat XXAA_XML_XT_DATA.dat /XXAA_XML/sample.xml |
Step 4. Create a directory for the bin folder
1 2 3 4 5 6 |
CREATE OR REPLACE DIRECTORY XXAA_XT_DIR AS '/u01/Middleware/ODI12c/oracle_common/bin'; SELECT * FROM ALL_DIRECTORIES WHERE 1=1 AND directory_name = 'XXAA_XT_DIR' |
Step 5. Create External table with above directory ‘XXAA_XT_DIR’ and SQL Loader file ‘XXAA_XML_XT_DATA.dat’
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE xml_tab( FILENAME VARCHAR2(100), XML_DATA CLOB ) ORGANIZATION EXTERNAL( TYPE ORACLE_LOADER DEFAULT DIRECTORY XXAA_XT_DIR ACCESS PARAMETERS ( FIELDS (FILENAME CHAR(100)) COLUMN TRANSFORMS (XML_DATA FROM lobfile (FILENAME))) LOCATION ('XXAA_XML_XT_DATA.dat')) |
1 |
SELECT * FROM xml_tab; |
XML conversion option 1: Schema on read
Now that we have loaded some XML files into our Oracle database we can start querying the data. Oracle offers two basic options
- SQL/XML is most commonly used and should meet the vast majority of your requirements
- XQuery is for niche scenarios with complex requirements. It is very powerful but has a steep learning curve.
SQL/XML
Oracle’s SQL/XML functions provide tools for handling XML data within the database environment. These functions, including XMLQuery, XMLTable, XMLExists, and XMLCast, enable developers to construct XML data using relational data, query relational data as if it were XML, and construct relational data from XML data. With Oracle’s robust support for SQL/XML functions, developers can effectively manage XML data alongside traditional relational data operations.Below, you’ll find some of the useful SQL/XML functions: –
XMLQuery : This function extracts data from XML documents using XPath expressions within SQL queries.
XMLTable : This function enables you to extract data from XML documents and present it in a relational format.
XMLExists : This function checks whether an XML document contains nodes that match a specified XPath expression. It returns TRUE if at least one node exists, otherwise FALSE.
XMLAgg : This function aggregates multiple elements into a single XML element. It is commonly used with a subquery to generate XML output.
XMLCast : This function converts an XML value into another data type, such as VARCHAR2, NUMBER, DATE, etc. It’s useful for converting XML data into a more conventional data type for further processing or querying.
XQuery
XQuery is a functional query language for finding and extracting elements and attributes from XML documents. XQuery is similar to SQL in many ways, but just as SQL is designed for querying structured, relational data, XQuery is designed especially for querying semi-structured, XML data from a variety of data sources.
When to use XQuery and when to use SQL/XML
XQuery and SQL/XML functions are essential tools for handling XML data. XQuery is perfect for intricate XML manipulation, providing precise control over elements and relationships, making it suitable for complex hierarchical structures. On the other hand, SQL/XML functions are optimal for integrating XML tasks into relational databases, particularly for simpler processing alongside traditional SQL queries. While XQuery shines in fine-grained XML tasks, SQL/XML functions excel in maintaining efficiency and compatibility within existing database systems. In summary, the choice between XQuery and SQL/XML depends on the complexity of XML tasks and their integration into relational database workflows.
XML Index
Oracle’s XMLIndex feature is designed to improve the performance of queries on XML documents that are stored in an XML column, making it beneficial when dealing with XML data. Unlike traditional relational indexes, which rely on specified table columns for index keys, XMLIndex employs a specific XML pattern expression to index paths and values within XML documents contained in a single column. By allowing users to create indexes on XMLType columns, XMLIndex enables quicker retrieval and processing of XML data through optimised query execution paths. However, XML indexes may not be necessary for small XML documents or situations where query performance is not a critical concern.
Using an XML Index is particularly useful for the schema on read scenario as it can speed up your queries.
Example-
1 2 |
CREATE INDEX xml_data_index ON xml_tab(xml_data) INDEXTYPE IS XDB.XMLIndex; |
Getting hands on with SQL/XML and XMLTABLE
In this hands-on example we use SQL/XML functions to parse and extract data from the XML on the fly. We then create database Views for reuse of the parse logic.
Since the sample XML file contains multiple branches, we’ll have to create multiple views to get each respective branch data.
View for ‘University/Faculty/Department’ branch
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
CREATE OR REPLACE VIEW dept_courses AS SELECT uni."id" as university_id, faculty."id" as faculty_id, depts."id" as department_id, courses."Course" as course_name FROM xml_tab, XMLTABLE( '/University' PASSING XML_DATA COLUMNS "id" VARCHAR(50) PATH '@id', faculty XMLTYPE PATH 'Faculty' ) uni, XMLTABLE( '/Faculty' PASSING uni.faculty COLUMNS "id" VARCHAR(50) PATH '@id', depts XMLTYPE PATH 'Department' ) faculty, XMLTABLE( '/Department' PASSING faculty.depts COLUMNS "id" VARCHAR(50) PATH '@id', courses XMLTYPE PATH 'Course' ) depts, XMLTABLE( '/Course' PASSING depts.courses COLUMNS "Course" VARCHAR(50) PATH '.' ) courses; |
Query the view :
1 |
SELECT * FROM dept_course |
Output-
View for ‘University/Faculty/ResearchGroup’ branch
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 |
CREATE OR REPLACE VIEW researchgrp_projects AS SELECT uni."id" as university_id, faculty."id" as faculty_id, grp."id" as research_group_id, projects."Project" as Project_name FROM xml_tab, XMLTABLE( '/University' PASSING XML_DATA COLUMNS "id" VARCHAR(50) PATH '@id', faculty XMLTYPE PATH 'Faculty' ) uni, XMLTABLE( '/Faculty' PASSING uni.faculty COLUMNS "id" VARCHAR(50) PATH '@id', grp XMLTYPE PATH 'ResearchGroup' ) faculty, XMLTABLE( '/ResearchGroup' PASSING faculty.grp COLUMNS "id" VARCHAR(50) PATH '@id', projects XMLTYPE PATH 'Project' ) grp, XMLTABLE( '/Project' PASSING grp.projects COLUMNS "Project" VARCHAR(50) PATH '.' ) projects; |
Query the view :
1 |
SELECT * FROM researchgrp_projects; |
Output-
View for ‘University/AdministrativeUnit’ branch
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CREATE OR REPLACE VIEW admin_service AS SELECT uni."id" as university_id, adminunit."id" as administrative_unit_id, services."Service" as service FROM xml_tab, XMLTABLE( '/University' PASSING XML_DATA COLUMNS "id" VARCHAR(50) PATH '@id', adminunit XMLTYPE PATH 'AdministrativeUnit' ) uni, XMLTABLE( '/AdministrativeUnit' PASSING uni.adminunit COLUMNS "id" VARCHAR(50) PATH '@id', services XMLTYPE PATH 'Service' ) adminunit, XMLTABLE( '/Service' PASSING adminunit.services COLUMNS "Service" VARCHAR(50) PATH '.' ) services; |
Query the view –
1 |
SELECT * FROM admin_service; |
Important note
The CREATE VIEW syntax shown above is applicable for ‘INSERT INTO’ and ‘SQL*Loader’ methods as they have a column of datatype XMLType that contains the XML text. However, External tables do not inherently support the XMLTYPE datatype. Therefore, when working with XML data in external tables, you would typically parse and extract the relevant information from the XML structure using SQL functions and then store it in appropriate data types such as VARCHAR2 or CLOB. Consequently, we need to typecast the CLOB column to XMLType when creating views from them.
Example:-
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
CREATE OR REPLACE VIEW admin_service AS SELECT uni."id" as university_id, adminunit."id" as administrative_unit_id, services."Service" as service FROM xml_tab, XMLTABLE( '/University' PASSING XMLTYPE(XML_DATA) COLUMNS "id" VARCHAR(50) PATH '@id', adminunit XMLTYPE PATH 'AdministrativeUnit' ) uni, XMLTABLE( '/AdministrativeUnit' PASSING uni.adminunit COLUMNS "id" VARCHAR(50) PATH '@id', services XMLTYPE PATH 'Service' ) adminunit, XMLTABLE( '/Service' PASSING adminunit.services COLUMNS "Service" VARCHAR(50) PATH '.' ) services; |
XML conversion option 2: Schema on Write
As an alternative to schema on read we can convert the XML to Oracle tables. It will make it much easier for users to query the tables in SQL.
We first create the DDL for the target tables and then use the Views we created in the previous section to load the data to these tables.
They are then ready to be queried using SQL.
Target table for ‘University/Faculty/Department’ branch
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE UNIVERSITY.Dept_Table( university_id VARCHAR2(100), faculty_id VARCHAR2(100), department_name VARCHAR2(100), course_name VARCHAR2(100) ) INSERT INTO UNIVERSITY.Dept_Table SELECT * FROM dept_courses; SELECT * FROM UNIVERSITY.Dept_Table; |
Target table for ‘University/Faculty/ResearchGroup’ branch
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE TABLE UNIVERSITY.Research_Groups_Table( university_id VARCHAR2(100), faculty_id VARCHAR2(100), research_group_id VARCHAR2(100), project_name VARCHAR2(100) ) INSERT INTO UNIVERSITY.Research_Groups_Table SELECT * FROM researchgrp_projects; SELECT * FROM UNIVERSITY.Research_Groups_Table; |
Target table for ‘University/AdministrativeUnit’ branch
1 2 3 4 5 6 7 8 9 10 11 12 |
CREATE TABLE UNIVERSITY.Admin_Unit_Table( university_id VARCHAR2(100), Administrative_unit_id VARCHAR2(100), service VARCHAR2(100) ); INSERT INTO UNIVERSITY.Admin_Unit_Table SELECT * FROM admin_service; SELECT * FROM UNIVERSITY.Admin_Unit_Table; |
Automatically loading and converting XML to Oracle
The latest XML conversion tools automate the entire process of converting XML to Oracle, akin to XML-as-schema-on-write, but without any manual intervention. These tools eliminate the need to load the XML data to Oracle first. They automate schema creation, mapping, and error handling, offering a fully automated solution compared to traditional methods using Oracle’s SQL/XML extensions.
Here is a summary comparison of Schema On Read vs Schema on Write vs Automated XML Conversion to Oracle.
You can download the comparison as a PDF
Manual vs. Automated Conversion with Flexter:
In contrasting manual XML conversion techniques with automated processes facilitated by tools like Flexter, the advantages of automation become clear:
- Complexity Management: Automated processes adeptly handle any level of XML and XSD complexity efficiently.
- Accelerated Deployment: By automating the XML conversion lifecycle—including analysis, schema generation, and mappings—data is swiftly prepared for decision-making.
- Risk Reduction: The automation of XML parsing mitigates the risk of project delays or budget overruns, especially in complex projects adhering to industry standards.
- Consistency and Accuracy: Automated parsing ensures consistent and precise execution, significantly lowering the potential for human error.
- Scalability: XML automation tools are designed to meet any service level agreement (SLA) and handle large volumes of XML data.
- User Accessibility: The user-friendly interfaces of automated tools reduce the dependency on specialized XML skills, which are scarce and represent a vulnerability for projects.
Considering Automated Conversion:
While automated conversion processes present clear benefits, they must be balanced against the costs of licensing tools like Flexter. For simple requirements or occasional queries, an XML automation tool may not be necessary. However, conversion software becomes invaluable under certain conditions, such as dealing with complex XML/XSD structures, processing large or numerous types of XML files, working under strict SLAs, or operating without in-house XML expertise.
Flexter – Your Solution for Complex XML Conversion:
If you’re facing challenges with complex XML structures, large file sizes, or tight project deadlines, consider exploring Flexter’s capabilities. Our free online version offers a glimpse into the efficiency and effectiveness of XML conversion software. For more complex inquiries, our experts are ready to discuss your XML conversion needs in detail, ensuring your XML to Oracle conversion project is both successful and efficient.