Data warehousing for free! Terabyte-sized data warehouse and business intelligence without license costs

October 26, 2009

This is no joke. On 19 October, Greenplum announced a free single-node edition of its analytical database.
For those of you who haven’t heard about Greenplum, they are a provider of MPP database software that runs on commodity hardware (unlike some of its competitors such as Teradata, Netezza, or recently Oracle with Exadata). The database is based on the open source database PostgreSQL, but is itself closed source.
Features of the database include massively parallel processing, redundancy, compression, row-oriented or column-oriented data storage, partitioning, standard SQL including SQL 2003 OLAP (analytic functions etc.), MapReduce support, and ODBC and JDBC support.
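For readers who haven't come across the SQL 2003 analytic functions mentioned above, here is a minimal sketch of what one looks like. Note the assumptions: it uses Python's built-in sqlite3 module as a stand-in (with SQLite 3.25 or later, which supports window functions), not Greenplum itself, and the `sales` table and the `running_totals` helper are made up for illustration.

```python
import sqlite3


def running_totals(rows):
    """Return (region, month, amount, running_total) rows.

    The running total is computed with a SQL window function:
    SUM(...) OVER (PARTITION BY ... ORDER BY ...).
    """
    # In-memory SQLite database as a stand-in for an analytical database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT region, month, amount,
               SUM(amount) OVER (PARTITION BY region ORDER BY month) AS running_total
        FROM sales
        ORDER BY region, month
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data: (region, month, amount)
sample = [("north", 1, 100), ("north", 2, 150), ("south", 1, 80), ("south", 2, 40)]
for row in running_totals(sample):
    print(row)
```

The point of the analytic function is that the running total is calculated per region without collapsing the rows, which a plain GROUP BY could not do. Since Greenplum is based on PostgreSQL, a query of this shape is the kind of thing the OLAP support is aimed at, but check the Greenplum documentation for the exact syntax it accepts.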
So what restrictions are there for the single-node edition? Obviously, you are only allowed to run it on a single node. Below is an extract from the Greenplum datasheet:

  • Unlimited production usage on a single commodity x86 server using up to 2 CPU sockets (and unlimited CPU cores), or in a single virtual machine using up to 8 virtual CPU cores.
  • Fully parallel SQL and MapReduce processing leverages multi-core parallel-processing engine for every query.
  • No storage capacity cap: from GBs to 10s of TBs.
  • Hybrid row and column-oriented processing.
  • Free community support as well as a low-cost, paid support option.

Of course, the full power of Greenplum’s shared-nothing architecture only materialises with multiple nodes. But the company says that you can expand seamlessly from a single-node to a multi-node architecture.
Documentation is installed when you install the single-node edition. It is a couple of thousand pages long, but tiny compared to the beast you get with the Oracle database.
Use cases
I can see two immediate use cases for this:
(1) Greenplum themselves promote this offering as part of their Enterprise Data Cloud. They have a vision of self-service data marts: data analysts go to the Enterprise Data Warehouse and, via interfaces, create their own data marts for in-depth analysis outside the EDW. Have a look at Curt Monash’s excellent article on the future of data marts.
(2) I can see another use case for departmental solutions. You could set up your first couple of subject areas or data marts on a single-node machine and, if you reach the limits of this single node, add more nodes to scale out. Or, if you never reach those limits, just stay on this setup forever.
So why are they giving away data warehouses for free? In another article, Curt Monash gives the following reasons:

  • Adding value to its Enterprise Data Cloud story
  • Seeding the market for future enterprise sales
  • Depriving competitors of revenue, perhaps at enterprises too small to ever be paying Greenplum customers

Combine the Greenplum offering with MicroStrategy’s free Reporting Suite, and you have a best-of-breed departmental solution for zilch.
The following restrictions apply to the MicroStrategy BI tool:
– 100 named users for the frontend of the BI tool and the BI server
– Two named users for the semantic layer module
– Limited to one CPU. I presume it is limited to one CPU core, but this is not clear from the website
– Two named users for the other modules in their BI suite, e.g. OLAP reporting etc.
Have a look at their website for a full set of features and conditions.
For the right set of requirements, the above is an attractive and very cost-effective combination. On top of that, it is scalable: if you grow out of it, just scale out and add on.