One pass SCD2 load: How to load a Slowly Changing Dimension Type 2 with one SQL Merge statement in Oracle

Uli Bethke DW Design, ETL, Oracle

The other day I came across a useful new feature of the Merge statement in SQL Server 2008: it can output the merged rows for further processing, something that (as of Oracle 11.1.0.7) is not possible in Oracle. This extra functionality can be used to load a Slowly Changing Dimension Type 2 in a single SQL statement. As a result you have only one pass over the data, less logical I/O, and improved performance. If you are interested in how exactly the Merge statement can be used in SQL Server 2008 to load an SCD2, have a look at this article. Recently, this has also featured in Kimball's design tip 107.

The whole thing was wrecking my head a bit: how can we achieve the same thing in Oracle? Finally, I came up with a solution. I have to admit it is not perfect, as it only works if you can identify a changed record at source as either an Insert or an Update. So you need to know whether the record is a new one or was just modified. In the vast majority of cases you should be able to have this information available at extract time. So if you extract your information from source via change data capture, timestamp created/timestamp modified, audit records, or Oracle Total Recall, you should be good to go. Still, SQL Server 2008 has more powerful functionality here. I can see an enhancement request coming ...


For this post we will rely on the good old SH schema (Oracle 11.1.0.7).

Let's first create a sequence that we will be using later on for the surrogate key of our SCD2.
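A minimal sketch of the sequence; the name and options are my own choosing:

```sql
-- Sequence supplying surrogate key values for the SCD2 dimension
CREATE SEQUENCE products_scd_seq
  START WITH 1
  INCREMENT BY 1
  NOCACHE;
```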

Next we will create an SCD2 based on the existing sh.products dimension. We will also add a valid_ind indicator and populate the effective-to date with 'end of time'. To refresh your memory, an SCD2 as defined by Monsieur Kimball is used to track history in a DW. Each time an attribute in the underlying source of the dimension is modified, a new record with the updated attribute is created in the dimension. An SCD2 typically has three helper columns: the effective-from and effective-to dates of the record, and a flag to indicate the most recent record.
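A sketch of the dimension, assuming a reduced set of columns from sh.products (the table, column, and sequence names are illustrative):

```sql
-- SCD2 dimension with the three helper columns:
-- effective_from, effective_to and valid_ind
CREATE TABLE products_scd (
  prod_sk         NUMBER PRIMARY KEY,   -- surrogate key from the sequence
  prod_id         NUMBER,
  prod_name       VARCHAR2(50),
  prod_min_price  NUMBER,
  effective_from  DATE,
  effective_to    DATE,
  valid_ind       VARCHAR2(1)           -- 'Y' = most recent record
);

-- Initial load from the existing sh.products dimension,
-- with the effective-to date set to 'end of time'
INSERT INTO products_scd
SELECT products_scd_seq.NEXTVAL,
       prod_id,
       prod_name,
       prod_min_price,
       SYSDATE,
       DATE '9999-12-31',
       'Y'
FROM   sh.products;
```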

Now we will prepare and populate our staging table for the SCD2 load. I have called it products_source. We will create six records in this staging table: five represent updates to existing records, and one is a brand new insert.
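The staging table carries the business attributes plus a dml_type flag; something along these lines, with five arbitrary products staged as updates (again, names are illustrative):

```sql
CREATE TABLE products_source (
  prod_id         NUMBER,
  prod_name       VARCHAR2(50),
  prod_min_price  NUMBER,
  dml_type        VARCHAR2(1)   -- 'I' = insert at source, 'U' = update
);

-- Stage five existing products as updates; skip product names starting
-- with '1', which would clash with the prefix trick used in the merge
INSERT INTO products_source
SELECT prod_id, prod_name, prod_min_price, 'U'
FROM   sh.products
WHERE  prod_name NOT LIKE '1%'
AND    ROWNUM <= 5;
```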

The 'U' attribute in the dml_type column marks these records as updates.

Next we update the prod_min_price to simulate a reduction in price and we also create a brand new record.
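For example (the new product's values are made up for the illustration):

```sql
-- Simulate a price reduction on the staged updates
UPDATE products_source
SET    prod_min_price = prod_min_price - 1
WHERE  dml_type = 'U';

-- And stage one brand new product as an insert at source
INSERT INTO products_source (prod_id, prod_name, prod_min_price, dml_type)
VALUES (9999, 'My New Product', 99.99, 'I');

COMMIT;
```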

We should now have the following records in our products_source staging table.
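A quick check of the staging table:

```sql
SELECT prod_id, prod_name, prod_min_price, dml_type
FROM   products_source
ORDER  BY dml_type, prod_id;
```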

Finally, we merge our staging table into our products_scd dimension table.

For the records updated at source we need to both create a new record for the updated attributes and update the existing record to reflect the changes in effective dates and the valid indicator. For the new records at source (in our case there is just one) we need to create new records in the SCD2.
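A sketch of the one-pass merge, under the assumption that no genuine product name starts with the character '1' (table and column names as assumed above; the effective dates are illustrative):

```sql
MERGE INTO products_scd t
USING (
       -- Updated records, copy 1: prod_name prefixed with '1' so they do
       -- not match an existing row and fall through to the insert branch
       SELECT prod_id, '1' || prod_name AS prod_name, prod_min_price
       FROM   products_source
       WHERE  dml_type = 'U'
       UNION ALL
       -- Updated records, copy 2: unprefixed, so they match the current
       -- dimension record, which then gets expired
       SELECT prod_id, prod_name, prod_min_price
       FROM   products_source
       WHERE  dml_type = 'U'
       UNION ALL
       -- Brand new records at source: no match, straight insert
       SELECT prod_id, prod_name, prod_min_price
       FROM   products_source
       WHERE  dml_type = 'I'
      ) s
ON    (t.prod_name = s.prod_name)
WHEN MATCHED THEN
  UPDATE SET t.valid_ind    = 'N',
             t.effective_to = SYSDATE
  WHERE      t.valid_ind    = 'Y'
WHEN NOT MATCHED THEN
  INSERT (prod_sk, prod_id, prod_name, prod_min_price,
          effective_from, effective_to, valid_ind)
  VALUES (products_scd_seq.NEXTVAL,
          s.prod_id,
          -- strip the '1' prefix off the duplicated update records
          CASE WHEN SUBSTR(s.prod_name, 1, 1) = '1'
               THEN SUBSTR(s.prod_name, 2)
               ELSE s.prod_name
          END,
          s.prod_min_price,
          SYSDATE,
          DATE '9999-12-31',
          'Y');
```

Note that the valid_ind filter sits in the WHERE clause of the update branch rather than in the ON condition: Oracle does not allow a column that is updated by WHEN MATCHED to be referenced in the ON clause (ORA-38104).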

In the subquery of the USING clause we effectively duplicate the records that were updated at source.

We prefix the prod_name of the first copy of the updated records with the number 1, while the second copy does not undergo this treatment. The resultset we will now merge therefore contains two copies of each updated record plus the brand new record.

In the Merge statement we merge the source and target resultsets on the prod_name column. This now matches for the unprefixed copies of the updated records. For these, the WHEN MATCHED clause updates the valid_ind and the effective-to date of the existing dimension record.

The other records are created as new records in our target dimension. As part of the insert, we strip the prefixed records of their prefix.

Each product updated at source should now have two records in our SCD2: the expired original with valid_ind set to 'N' and a new current record, while the brand new product has a single current record.
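A query along these lines (with the illustrative column names) shows the outcome:

```sql
SELECT prod_sk, prod_id, prod_name, prod_min_price,
       effective_from, effective_to, valid_ind
FROM   products_scd
WHERE  prod_id IN (SELECT prod_id FROM products_source)
ORDER  BY prod_id, effective_from;
```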


About the author

Uli Bethke LinkedIn Profile

Uli has 18 years' hands-on experience as a consultant, architect, and manager in the data industry. He frequently speaks at conferences. Uli has architected and delivered data warehouses in Europe, North America, and South East Asia. He is a traveler between the worlds of traditional data warehousing and big data technologies.

Uli is a regular contributor to blogs and books, holds an Oracle ACE award, and chairs the Hadoop User Group Ireland. He is also a co-founder and VP of the Irish chapter of DAMA, a not-for-profit global data management organization, and has co-founded the Irish Oracle Big Data User Group.