ODI JDBC variable binding: 500% performance gains, Array Fetch Size, Batch Update Size, Row Prefetching, and the Array Interface; and an issue when running the agent in WebLogic.

March 22, 2011


This is a bit of a mouthful of a title, but there is a lot to cover here.

Have you ever wondered what the Array Fetch Size and Batch Update Size in the ODI Topology module are all about?

image001

Row prefetching

What ODI calls Array Fetch Size is commonly known as row prefetching:

“The concept of row prefetching is straightforward. Every time an application asks the driver to retrieve a row from the database, several rows are prefetched with it and stored in client-side memory. In this way, several subsequent requests do not have to execute database calls to fetch data. They can be served from the client-side memory”. Quote from the excellent (the best?) book on Oracle performance tuning by Christian Antognini, Troubleshooting Oracle Performance.

For ODI this means that the higher you set this value, the more rows are retrieved per request from the source and then stored in the memory of the agent.

The main advantage is that you don't have to re-read the same block from the source multiple times, which reduces the number of consistent gets. A database block typically holds more than one row: if you set your Array Fetch Size to a low number, e.g. 1, and you have 5 rows in a block, the database needs to read that block five times to retrieve all of its rows. Another advantage is that you need fewer round trips to the server.
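In plain JDBC, row prefetching corresponds to Statement.setFetchSize(). Here is a minimal sketch of what the agent effectively does when it reads from the source; the connection URL and credentials are placeholders, not part of any ODI setup:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FetchSizeDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; point these at your own source.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/orcl", "sh", "sh");
             Statement stmt = con.createStatement()) {
            // Equivalent of Array Fetch Size = 500 in the ODI Topology:
            // the driver retrieves 500 rows per round trip and buffers
            // them in the agent's memory.
            stmt.setFetchSize(500);
            try (ResultSet rs = stmt.executeQuery(
                    "select * from sh.products_big")) {
                while (rs.next()) {
                    // Most next() calls are served from client-side memory
                    // rather than triggering a database call.
                }
            }
        }
    }
}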

Array Interface

What ODI calls Batch Update size is commonly known as the Array Interface:

“The array interface allows you to bind arrays instead of scalar values. This is very useful when a specific DML statement needs to insert or modify numerous rows. Instead of executing the DML statement separately for each row, you can bind all necessary values as arrays and execute it only once, or if the number of rows is high, you can split the execution into smaller batches.”, Christian Antognini, Troubleshooting Oracle Performance.

The advantage is that you need fewer round trips to the database and generate less network traffic.

This is applicable for any of the Knowledge Modules that use binding between source and target, e.g. LKM SQL to Oracle.
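In plain JDBC the array interface corresponds to PreparedStatement.addBatch() and executeBatch(). Below is a minimal sketch, assuming a hypothetical two-column target table products_copy; the source rows and connection details are stand-ins for what such an LKM would bind:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BatchUpdateDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in source rows; in ODI these would come from the
        // prefetched source result set.
        String[][] rows = { {"1", "Widget"}, {"2", "Gadget"} };
        int batchSize = 500; // equivalent of Batch Update Size in ODI
        // Placeholder connection details; point these at your own target.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/orcl", "target", "target");
             PreparedStatement ps = con.prepareStatement(
                 "insert into products_copy (prod_id, prod_name) values (?, ?)")) {
            int count = 0;
            for (String[] row : rows) {
                ps.setInt(1, Integer.parseInt(row[0]));
                ps.setString(2, row[1]);
                ps.addBatch();             // bind this row into the batch
                if (++count % batchSize == 0) {
                    ps.executeBatch();     // one round trip per batch
                }
            }
            ps.executeBatch();             // flush any remaining rows
        }
    }
}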

Impact on performance

Let’s see for ourselves what impact changing these parameters has on execution times and agent memory consumption.

I will perform test runs with these values set to 1, 30, 100, 500, 1000, and 20000. We will use the products table in the SH sample schema (72 rows) and pump it up to 72K records by cross joining it with 1000 generated rows:

create table sh.products_big as
select *
from sh.products a
cross join (
  select level - 1 n
  from dual
  connect by level <= 1000
);

We will load this table into another schema on the same database using the LKM SQL to Oracle, with the Staging Area Different from Target option set to achieve this. Our interface looks as follows:

image003

The above setup is used for test and demonstration purposes only. In a real-world scenario you should use an LKM that uses database links to load from Oracle to Oracle, as demonstrated in my post Load Knowledge Module Oracle to Oracle using database links.

Test case 1: Array Fetch Size and Batch Update Size at 1

image005

This took a whopping 707 seconds.

image007

CPU usage of the agent is between 0 and 5%. Memory usage zigzags between 100 MB and 150 MB.

image009

Logical I/O is at a very high 224002.

Test case 2: Array Fetch Size and Batch Update Size at 30

image011

The step to load 72K records into the C$ table took 25 seconds. The step started at 16:28:22 and ended at 16:28:47.

image013

We have a peak in heap memory usage of the agent's JVM at 16:28:52 with 141.7 MB. I am using a WebLogic (Java EE) agent and Fusion Middleware Enterprise Manager to retrieve these values.

This roughly coincides with the end time of our step.

We also see some increase in CPU usage.

image015

Querying v$sql shows us that the insert was executed 72000/30 = 2400 times. Logical I/O stands at 94020.
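For reference, here is a sketch of how this check can be scripted over JDBC; the connection details and the sql_text filter pattern (including the C$ table name) are hypothetical placeholders you would adjust to your own repository:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class VSqlCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder connection; requires a user with access to v$sql.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//localhost:1521/orcl", "system", "oracle");
             PreparedStatement ps = con.prepareStatement(
                 "select executions, buffer_gets from v$sql "
               + "where sql_text like ?")) {
            // Hypothetical pattern; adjust to match your C$ insert statement.
            ps.setString(1, "insert into%C$_PRODUCTS_BIG%");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("executions=" + rs.getInt("executions")
                        + ", buffer_gets=" + rs.getLong("buffer_gets"));
                }
            }
        }
    }
}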

Test case 3: Array Fetch Size and Batch Update Size at 100

image017

The step started at 16:48:36 and finished at 16:48:54. A delta of 18 seconds.

image019

The memory peaked at 16:49:01 with 137.2 MB. CPU usage increased to about 5%.

image021

We have 720 executions with 91196 logical I/O.

Test case 4: Array Fetch Size and Batch Update Size at 500

image023

Loading the data took 21 seconds. This is not in line with expectations; an explanation follows further down.

image025

Heap usage of the agent JVM goes up to about 160 MB. CPU usage increases to about 12%.

image027

We have 144 executions with 90192 logical I/O.

Test case 5: Array Fetch Size and Batch Update Size at 1000

image029

It now took 34 seconds to load the data.

image031

Heap usage went above 200 MB and CPU touched 20%.

image033

We had 72 executions with 90017 buffer gets.

Why do the runs with an array size of 500 or 1000 take longer to execute than the one with 100?

I tried to find a good answer to this question but could not really come up with one. Eventually I decided to rerun test cases 4 and 5 using a standalone agent to get a second opinion.

Test case 4 rerun with standalone agent

image035

This now took just 8 seconds, compared to the 21 seconds we got previously with the WebLogic agent. This is more in line with expectations.

image037

CPU usage peaked at 13%, similar to what we got with the WebLogic agent.

Test case 5 rerun with standalone agent

image039

This took just 6 seconds now: a 500% performance improvement over the default settings.

image041

CPU usage peaked at 18.46%. Memory usage peaked at 124 MB.

Test case 6: Array Fetch Size and Batch Update Size at 20000

image043

The run initially failed due to insufficient agent memory.

I doubled the heap allocation for the agent to 512 MB and reran.

image045

Once again it took 6 seconds.

image047

Memory usage jumped up to 522.8 MB. CPU usage peaked at 28.03%.

Conclusion

– You cannot ignore prefetch and array size in a comprehensive ODI performance tuning strategy.
– There seems to be an issue, or rather some mysterious inefficiency, when running the agent in WebLogic. Both the standalone agent and the WebLogic agent were using the same JDK version, so that can't be the cause. If I find some time I will log an SR with Oracle on this.
– We get diminishing returns for increasing the array size: there comes a point when increasing the prefetch and array size becomes counter-productive, as resource usage jumps up with only a slightly improved response time.
– There is no such thing as a free lunch: increasing the prefetch and array size increases CPU and memory usage on the agent.
– Increasing the prefetch and array size reduces the number of logical I/Os.
– Finding the correct prefetch and array size is both art and science. The optimal value depends on your environment (available resources, concurrency, workload etc.) and on the size of the source table. As a general rule: the smaller and narrower your source table, the higher you can set the prefetch and array size.
– As the optimal array and prefetch size is partly determined by the size of the source table, it would make a lot more sense to be able to set the Array Fetch Size and Batch Update Size at the interface level rather than at the data server level.

Related links

David Allan from Oracle has pointed out that DataDirect has some next-generation JDBC drivers for bulk loading large data volumes. There is also an LKM in ODI 11g that makes use of them.

DataDirect website

Some performance tests for DataDirect drivers