Data Warehousing Books: Design and architecture

October 31, 2009

In another post I have covered data warehousing books in the world of Oracle. We’ve also had a look at data warehousing and business intelligence books for project management and business analysis. Today we will look at data warehousing and business intelligence books that look at the technical design and architecture of a data warehouse solution.

Must Have

DW 2.0: The Architecture for the Next Generation of Data Warehousing: Bill Inmon revisits his data warehouse architecture. Addresses the following issues: Real-time BI, unstructured data, the enterprise data warehouse and change, the data life cycle, time variance of data. Very useful from a conceptual point of view, but not enough detail.

The Data Warehouse Toolkit- The Complete Guide to Dimensional Modelling. My first book on data warehousing. Still valuable today. Great for dimensional modelling data marts or small non-realtime Enterprise Data Warehouses based on Kimball’s conformed dimensions. It also has a good overview on industry specific data model patterns in a dimensional context. A must have.

The Data Model Resource Books Vol 1-3: The books describe fundamental data modeling patterns that can be applied and reused across the enterprise. If you are assigned the task of modelling an Enterprise Data Warehouse, these books give you great insight into best practices in data modelling. Volume 2 offers industry specific data model patterns and provides invaluable information to better understand the issues at hand in a particular industry. Personally I find it that you should actually start with volume 3 as this is the most generic of the three books. Also if you only get one of the books get volume 3.

If you have a requirement around near-real time data warehousing and operational business intelligence I recommend to look into Dan Linstedt’s data vault modelling techniques. The Business of Data Vault Modeling will get you started.

Some more recent additions to the data warehouse architecture league of books includes Building and Maintaining a Data Warehouse and Advanced Data Warehouse Design. The first of these walks us through all the technical areas of a data warehouse project: source system analysis, database design, bi reporting, data quality, metadata. In my opinion, the best chapter is on data integration and ETL. There are very few dedicated ETL books out there and this is one of the few that touches on the subject, albeit from a high level. In Advanced Data Warehouse Design the authors discuss the shortcomings of existing data warehouse implementations focusing mainly on spatial and temporal data, e.g. the shortcomings of slowly changing dimensions when capturing changes over time. They propose a truly temporal and spatial data warehouse. Examples are given in MS SQL Analysis Service (temporal) and Oracle OLAP (temporal and spatial).

To my knowledge the only book out there dedicated to the physical design of databases is Physical Database Design: the database professional’s guide to exploiting indexes, views, storage, and more. Most of the stuff covered here is for advanced users. It covers Oracle, DB2, SQL Server, and for some of the MPP stuff Teradata. Personally I found the chapter on physical design for a shared nothing architecture, and the chapter on hardware (CPU architecture, disks, server sizing etc.) the most useful.

Dr. Ronnie Abrahiem, Software Engineer at CIBER has recently published a book on combining SOA and data warehousing in a near-real time environment. This looks quite interesting but I haven’t read the book myself. It has the rather long title Data Warehousing with Service-oriented Architecture: Designing and Implementing Prototype Models For an Integration of Near-Real-Time Data Warehousing Architecture with Service-oriented Architecture. I am currently working on a project where we want to integrate a SOA based MDM solution with the data warehouse. The book may offer some interesting insights around this.

Should Have

If you have a lot of aggregate tables in your warehouse I recommend to have a look at Mastering Data Warehouse Aggregates for a formalised methodology and some really useful tips and tricks around an aggregate navigator.

Another recent addition to data warehouse design books is Data Warehouse Design: Modern Principles and Methodologies. Very useful chapter on ETL and quite affordable.

Could Have

Data Warehouse Design Solutions. This is useful as a second reference for industry specific dimensional models. However, it can not replace Kimball’s original book on the subject.

Clickstream Data Warehousing. If you are implementing a data warehouse for web analytics you should have a look here. However, in light of the explosion of data volumes and with Hadoop and MapReduce at hand this one is slightly obsolete.