CDC & Streams Performance Test
The objective of the Streams performance test was to identify risks and bottlenecks for CDC and Streams. The following items were of particular interest to us:
- Identify bottlenecks for batch jobs with a low commit frequency, i.e. transactions that modify large numbers of records without committing frequently. This was important to us because low commit frequency batch jobs run on the transactional source system during the day, and Oracle Metalink note 335516.1 recommends to “decrease transaction sizes to less than 1000 LCRs”. We wanted to know how long batch transactions of 10K, 100K and 1000K records would take to be applied to the change tables and where bottlenecks would materialise. We also wanted to know the influence of changes to the streams_pool_size, the apply server parallelism, and the TXN_THRESHOLD apply parameter.
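For reference, apply parallelism and the spill threshold can be changed per apply process via DBMS_APPLY_ADM. The apply process name below is a placeholder (query DBA_APPLY for yours), and mapping the TXN_THRESHOLD used in this test to the 10.2 txn_lcr_spill_threshold apply parameter is my assumption — a sketch, not our exact setup script:

```sql
BEGIN
  -- Number of apply servers that may work on transactions concurrently.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'CDC$A_MY_CHANGE_SET',   -- placeholder apply name
    parameter  => 'parallelism',
    value      => '4');

  -- Threshold (in LCRs) above which the apply reader spills a
  -- transaction to disk; assumed equivalent of TXN_THRESHOLD.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'CDC$A_MY_CHANGE_SET',
    parameter  => 'txn_lcr_spill_threshold',
    value      => '10000');
END;
/
```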
- We started off with a batch insert (one transaction and one commit) of 9500 records into a source table (random data in a copy of dba_objects). Parameters were apply parallelism 4, TXN_THRESHOLD 10000, streams pool = 0 (automatic). It took 66s end to end to load the records into the change table. There were no buffered queue spills by the capture process and no spills as part of the dequeue.
- In the second scenario we batch inserted 99000 records into our source table. Parameters were apply parallelism 2, TXN_THRESHOLD 100000, streams pool = 0 (automatic). This took 300s to complete end to end. The apply reader finished its work after 165s. The apply reader spilled 0 records to disk, but a massive 75K messages were spilled from the buffered queue (the step between capture enqueue and the start of the apply reader). So what must have happened is that while we did not allow the apply reader to spill messages to disk (by setting TXN_THRESHOLD to 100000), the capture process spilled the messages from the buffer to disk because the apply reader was too slow to dequeue and process them. The bottleneck here therefore seems to have been the amount of memory allocated to the streams pool: because of low streams pool memory the apply reader had to read messages from the spill table on disk rather than from memory.
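Buffered queue spilling of this kind can be observed while the test runs. A minimal monitoring query, assuming a 10.2 instance with the Streams buffered queues in place:

```sql
-- In-memory vs. spilled messages per buffered queue (capture side).
SELECT queue_schema,
       queue_name,
       num_msgs,     -- messages currently buffered in memory
       spill_msgs    -- messages spilled from the buffer to disk
FROM   v$buffered_queues;
```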
- As a result we changed the size of the streams pool from automatic to a fixed 384M, leaving the rest of the setup as per scenario two. This reduced the completion time to 194s. We still had 64K messages spilled from the buffered queue, a reduction of 11K.
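Fixing the pool size is a one-line change. The sketch below assumes an spfile (for SCOPE = BOTH) and enough sga_target headroom to accommodate the larger pool:

```sql
-- Pin the streams pool at 384M instead of automatic (ASMM) sizing.
ALTER SYSTEM SET streams_pool_size = 384M SCOPE = BOTH;
```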
- In scenario four we increased the streams_pool_size to 480M. This reduced the completion time by another 16s to a total of 178s. So increasing the streams_pool_size improved performance, and fewer messages were spilled to disk. As we did not have more memory available to grow the streams pool further, we had to stop our test here. A quick look at the v$streams_pool_advice view revealed, however, that even an allocation of 960M to the streams pool would still have resulted in messages spilling to disk.
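The advice view can be queried directly to see the estimated spill activity for a range of candidate pool sizes:

```sql
-- Estimated spill activity for candidate streams pool sizes.
SELECT streams_pool_size_for_estimate AS pool_mb,
       streams_pool_size_factor       AS size_factor,
       estd_spill_count,
       estd_spill_time
FROM   v$streams_pool_advice
ORDER  BY streams_pool_size_for_estimate;
```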
- In scenario five we reset the streams_pool_size to 0 (automatic allocation), changed apply server parallelism to 4, and changed the TXN_THRESHOLD back to 10000, thus allowing the apply reader to spill messages to disk. Again we batch inserted 99K records into our source table. This time the records were applied to our change tables in 49s: a huge performance boost. No messages were spilled by the capture process, and as per the TXN_THRESHOLD setting all messages were spilled by the apply reader. As a result the apply server had to process all messages from disk.
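Apply reader spilling is visible separately from buffered queue spilling. Two queries that distinguish the two, assuming 10.2 dictionary views:

```sql
-- Apply reader statistics, including messages it spilled to disk.
SELECT apply_name,
       state,
       total_messages_dequeued,
       total_messages_spilled
FROM   v$streams_apply_reader;

-- Transactions currently spilled by the apply reader, per transaction id.
SELECT apply_name, xidusn, xidslt, xidsqn, message_count
FROM   dba_apply_spill_txn;
```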
- Encouraged by this result we batch inserted 800K records into our source table, leaving the parameters as per scenario five. This, however, took 660s in total to complete. Again all messages were spilled to disk by the apply reader; the capture process and apply reader finished their work after about 120s. It then took the apply server over 8 minutes to apply the transaction to the change table. As we were using a low commit frequency for the insert at source, the apply server could not make use of parallelism, so one single apply server had to deal with the entire load.
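That only one apply server picks up a large single transaction can be confirmed while the apply is running: with a single large transaction, one server should show an active state while the others sit idle. A query of the per-server view:

```sql
-- One row per apply server; compare states and applied message counts
-- to see how many servers actually share the work.
SELECT apply_name,
       server_id,
       state,
       total_messages_applied
FROM   v$streams_apply_server
ORDER  BY apply_name, server_id;
```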
- In our final scenario we left all parameters as per scenario six, but performed a high commit frequency insert of 1M records into our source table, committing after each insert. It took 190s to complete. No messages were spilled to disk by the apply reader, few messages were spilled to disk by the capture process, and the apply process used all 4 parallel apply servers.
- Will a table that is heavily modified but not part of a change set affect capture time? To find out we ran the following tests:
- We batch inserted 100K records into this source table. The capture completed in 10s, so the insert had no impact on the capture process.
- We inserted 100K records into this source table with a high commit frequency (one commit per insert). Again this took 10s to complete.
- We did not insert any records into the source table and just switched logs. Again this took only 10s to complete.
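The 10s capture times above can be cross-checked against the capture process itself. A common way to estimate capture latency in 10.2 is to compare the capture timestamp with the redo creation time of the most recently captured change:

```sql
-- Approximate capture latency: gap between when the change was created
-- in the redo stream and when the capture process processed it.
-- DATE arithmetic yields days, so multiply by 86400 for seconds.
SELECT capture_name,
       state,
       (capture_time - capture_message_create_time) * 86400
         AS latency_seconds
FROM   v$streams_capture;
```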
- DML activity on a source table that is not part of a change set does not negatively affect performance.
- Performance for a batch job with a commit frequency of 10K was very acceptable.
- Even for a batch job with a commit frequency of 99K, performance was still acceptable.
- Performance for a batch job with a commit frequency of 800K was not acceptable.
- The bottlenecks were the capture process or the apply reader spilling messages to disk because of low streams pool memory.
- As per v$streams_pool_advice, even 980M of memory for the streams pool would not have been enough to avoid spilling for a batch insert of 99K records. So a fairly big memory allocation would be needed to avoid spilling for batch transactions of that size.
- Only one apply server can work on a single transaction. So for a large transaction, e.g. a batch job with a low commit frequency, the apply servers cannot divide and conquer.
- It would have been interesting to see performance of a 100K batch insert at source with an allocation of 2G streams pool. Anyone done any similar tests?
- I don’t have an answer (or a theory) for why spilling to disk by the capture process degrades performance by about a factor of four compared to spilling to disk by the apply reader. Maybe an LCR takes up more space than a transaction? Anyone any ideas?
Tests were performed on a server with 1.5G PGA, 1.5G SGA and 8 CPUs. Both the source and the downstream target server run Oracle 10.2.0.3.
- Oracle Data Warehousing Guide, Chapter 16
- Oracle Streams Concepts and Administration
- Streams Performance White Paper
- Note 273674.1 – Streams Configuration Report and Health Check Script
- Note 335516.1 – Streams Performance Recommendations
- Note 418755.1 – 10.2.0.x.x Streams Recommendations
- Note 437838.1 – Recommended Patches for Streams
- Note 290605.1 – Oracle Streams STRMMON Monitoring Utility
- Note 238455.1 – Streams Supported and Unsupported Datatypes
- Note 313748.1 – Using Automatic Statistics Collection In A Streams