CDC & Streams Performance Test
The objective of the Streams performance test was to identify risks and bottlenecks for CDC and Streams. The following items were of particular interest to us:
- Identify bottlenecks for batch jobs with a low commit frequency, i.e. transactions that modify large numbers of records without committing frequently. This was important to us because low commit frequency batch jobs run on the transactional source system during the day, and Oracle Metalink note 335516.1 recommends to “decrease transaction sizes to less than 1000 LCRs”. We wanted to know how long batch transactions of 10K, 100K and 1000K records would take to be applied to the change tables and where bottlenecks would materialise. We also wanted to know the influence of changes to the streams_pool_size, the apply server parallelism, and the TXN_THRESHOLD apply parameter.
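For reference, apply parallelism and the spill threshold can be changed per apply process via DBMS_APPLY_ADM. The apply process name below is a placeholder (query DBA_APPLY for yours), and mapping the TXN_THRESHOLD used in this test to the 10.2 txn_lcr_spill_threshold apply parameter is my assumption — a sketch, not our exact setup script:

```sql
BEGIN
  -- Number of apply servers that may work on transactions concurrently.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'CDC$A_MY_CHANGE_SET',   -- placeholder apply name
    parameter  => 'parallelism',
    value      => '4');

  -- Threshold (in LCRs) above which the apply reader spills a
  -- transaction to disk; assumed equivalent of TXN_THRESHOLD.
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'CDC$A_MY_CHANGE_SET',
    parameter  => 'txn_lcr_spill_threshold',
    value      => '10000');
END;
/
```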
- We started off with a batch insert (one transaction and one commit) of 9500 records into a source table (random data in a copy of dba_objects). Parameters were apply parallelism 4, TXN_THRESHOLD 10000, streams pool = 0 (automatic). It took 66s end to end to load the records into the change table. There were no buffered queue spills by the capture process and no spills as part of the dequeue.
- In the second scenario we batch inserted 99000 records into our source table. Parameters were apply parallelism 2, TXN_THRESHOLD 100000, streams pool = 0 (automatic). This took 300s to complete end to end. The apply reader finished its work after 165s. The apply reader spilled 0 records to disk, but a massive 75K messages were spilled from the buffered queue (the step between capture enqueue and the start of the apply reader). So what must have happened is that while we did not allow the apply reader to spill messages to disk (by setting TXN_THRESHOLD to 100000), the capture process spilled the messages from the buffer to disk because the apply reader was too slow to dequeue and process them. The bottleneck here therefore seems to have been the amount of memory allocated to the streams pool: because of low streams pool memory the apply reader had to read messages from the spill table on disk rather than from memory.
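Buffered queue spilling of this kind can be observed while the test runs. A minimal monitoring query, assuming a 10.2 instance with the Streams buffered queues in place:

```sql
-- In-memory vs. spilled messages per buffered queue (capture side).
SELECT queue_schema,
       queue_name,
       num_msgs,     -- messages currently buffered in memory
       spill_msgs    -- messages spilled from the buffer to disk
FROM   v$buffered_queues;
```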
- As a result we changed the size of the streams pool from automatic to a fixed 384M, leaving the rest of the setup as per scenario two. This reduced the completion time to 194s. We still had 64K messages spilled from the buffered queue, a reduction of 11K.
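Fixing the pool size is a one-line change. The sketch below assumes an spfile (for SCOPE = BOTH) and enough sga_target headroom to accommodate the larger pool:

```sql
-- Pin the streams pool at 384M instead of automatic (ASMM) sizing.
ALTER SYSTEM SET streams_pool_size = 384M SCOPE = BOTH;
```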
- In scenario four we increased the streams_pool_size to 480M. This reduced the completion time by another 16s to a total of 178s. So increasing the streams_pool_size improved performance, and fewer messages were spilled to disk. As we did not have more memory available to grow the streams pool further, we had to stop our test here. A quick look at the v$streams_pool_advice view revealed, however, that even an allocation of 960M to the streams pool would still have resulted in messages spilling to disk.
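The advice view can be queried directly to see the estimated spill activity for a range of candidate pool sizes:

```sql
-- Estimated spill activity for candidate streams pool sizes.
SELECT streams_pool_size_for_estimate AS pool_mb,
       streams_pool_size_factor       AS size_factor,
       estd_spill_count,
       estd_spill_time
FROM   v$streams_pool_advice
ORDER  BY streams_pool_size_for_estimate;
```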
- In scenario five we reset the streams_pool_size to 0 (automatic allocation), changed apply server parallelism to 4, and changed the TXN_THRESHOLD back to 10000, thus allowing the apply reader to spill messages to disk. Again we batch inserted 99K records into our source table. This time the records were applied to our change tables in 49s: a huge performance boost. No messages were spilled by the capture process, and as per the TXN_THRESHOLD setting all messages were spilled by the apply reader. As a result the apply server had to process all messages from disk.
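Apply reader spilling is visible separately from buffered queue spilling. Two queries that distinguish the two, assuming 10.2 dictionary views:

```sql
-- Apply reader statistics, including messages it spilled to disk.
SELECT apply_name,
       state,
       total_messages_dequeued,
       total_messages_spilled
FROM   v$streams_apply_reader;

-- Transactions currently spilled by the apply reader, per transaction id.
SELECT apply_name, xidusn, xidslt, xidsqn, message_count
FROM   dba_apply_spill_txn;
```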
- Encouraged by this result we batch inserted 800K records into our source table, leaving the parameters as per scenario five. This, however, took 660s in total to complete. Again all messages were spilled to disk by the apply reader; the capture process and apply reader finished their work after about 120s. It then took the apply server over 8 minutes to apply the transaction to the change table. As we were using a low commit frequency for the insert at source, the apply server could not make use of parallelism, so one single apply server had to deal with the entire load.
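That only one apply server picks up a large single transaction can be confirmed while the apply is running: with a single large transaction, one server should show an active state while the others sit idle. A query of the per-server view:

```sql
-- One row per apply server; compare states and applied message counts
-- to see how many servers actually share the work.
SELECT apply_name,
       server_id,
       state,
       total_messages_applied
FROM   v$streams_apply_server
ORDER  BY apply_name, server_id;
```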
- In our final scenario we left all parameters as per scenario six, but performed a high commit frequency insert of 1M records into our source table, committing after each insert. It took 190s to complete. No messages were spilled to disk by the apply reader, few messages were spilled to disk by the capture process, and the apply process used all 4 parallel apply servers.
- Will a table that is heavily modified but not part of a change set affect capture time? To find out we ran the following tests:
- We batch inserted 100K records into this source table. The capture completed in 10s, so the insert had no impact on the capture process.
- We inserted 100K records into this source table with a high commit frequency (one commit per insert). Again this took 10s to complete.
- We did not insert any records into the source table and just switched logs. Again this took only 10s to complete.
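The 10s capture times above can be cross-checked against the capture process itself. A common way to estimate capture latency in 10.2 is to compare the capture timestamp with the redo creation time of the most recently captured change:

```sql
-- Approximate capture latency: gap between when the change was created
-- in the redo stream and when the capture process processed it.
-- DATE arithmetic yields days, so multiply by 86400 for seconds.
SELECT capture_name,
       state,
       (capture_time - capture_message_create_time) * 86400
         AS latency_seconds
FROM   v$streams_capture;
```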
- DML activity on a source table that is not part of a change set does not negatively affect performance.
- Performance for a batch job with a commit frequency of 10K was very acceptable.
- Even for a batch job with a commit frequency of 99K, performance was still acceptable.
- Performance for a batch job with a commit frequency of 800K was not acceptable.
- The bottlenecks were the capture process or the apply reader spilling messages to disk because of low streams pool memory.
- As per v$streams_pool_advice, even 980M of memory for the streams pool would not have been enough to avoid spilling for a batch insert of 99K records. So a fairly big memory allocation would be needed to avoid spilling for batch transactions of that size.
- Only one apply server can work on a single transaction. So for a large transaction, e.g. a batch job with a low commit frequency, the apply servers cannot divide and conquer.
- It would have been interesting to see performance of a 100K batch insert at source with an allocation of 2G streams pool. Anyone done any similar tests?
- I don’t have an answer (or a theory) for why spilling to disk by the capture process degrades performance by about a factor of four compared to spilling to disk by the apply reader. Maybe an LCR takes up more space than a transaction? Anyone any ideas?
Tests were performed on a server with 1.5G PGA, 1.5G SGA and 8 CPUs. Both the source and the downstream target server run Oracle 10.2.0.3.
- Oracle Data Warehousing Guide, Chapter 16
- Oracle Streams Concepts and Administration
- Streams Performance White Paper
- Note 273674.1 – Streams Configuration Report and Health Check Script
- Note 335516.1 – Streams Performance Recommendations
- Note 418755.1 – 10.2.0.x.x Streams Recommendations
- Note 437838.1 – Recommended Patches for Streams
- Note 290605.1 – Oracle Streams STRMMON Monitoring Utility
- Note 238455.1 – Streams Supported and Unsupported Datatypes
- Note 313748.1 – Using Automatic Statistics Collection In A Streams