Archives

Tagged ‘google’

BigQuery: Data Warehousing with Google?

Google has added two new products to their Labs. The first one is BigQuery, which according to Google allows users to query trillions of records in an SQL dialect via a RESTful web service. If they get their pricing right on this, I can see Google becoming a top player in the data-warehousing-as-a-service space. This one could be quite interesting because, unlike BigQuery, Hadoop and the other players in the NoSQL space do not support SQL. The only problem is that there are currently no query tools that support BigQuery.

The other product they have added is a Prediction API. This one is a machine learning algorithm implemented via a RESTful web service.

Greenplum, MapReduce, and Hadoop

If your job involves processing massive amounts of data, you should familiarize yourself with Greenplum, MapReduce, and Hadoop.

With 6.5 petabytes of data, eBay runs the world’s largest data warehouse on Greenplum. Facebook runs a 2 PB warehouse on Hadoop. Impressive.

Both Greenplum and Hadoop make use of the MapReduce framework pioneered by Google.
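
If you have not seen MapReduce before, the idea can be sketched in a few lines of plain Python. This is only a toy, single-machine word count, not how Greenplum or Hadoop implement it; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are made up for illustration. The map step emits key/value pairs, the shuffle step groups them by key, and the reduce step collapses each group into one result.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input; a real cluster would read from distributed storage.
    docs = ["big data big warehouse", "data warehouse on Hadoop"]
    print(word_count(docs))
```

In a real cluster the map and reduce calls run in parallel on many machines and the shuffle moves data across the network, which is where the scalability comes from.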

You can run Hadoop on Amazon Elastic MapReduce to play around with the technology.

Two Hadoop books have also been published recently. I have ordered both of them and can’t wait to hold them in my hands.

Hadoop: The Definitive Guide

Pro Hadoop

No books on Greenplum, but they have some good whitepapers on their website.