
Greenplum, MapReduce, and Hadoop

If your job involves processing massive amounts of data, you should familiarize yourself with Greenplum, MapReduce, and Hadoop.

With 6.5 petabytes of data, eBay runs the world’s largest data warehouse on Greenplum. Facebook runs a 2 PB warehouse on Hadoop. Impressive.

Both Greenplum and Hadoop use the MapReduce programming model pioneered by Google.

You can run Hadoop on Amazon Elastic MapReduce to experiment with the technology without building your own cluster.
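To make the MapReduce model concrete, here is a minimal in-memory sketch in Python of the classic word-count job: a map step emits (word, 1) pairs, a shuffle step groups them by key, and a reduce step sums each group. It only simulates on one machine what Hadoop distributes across a cluster, and the function names are hypothetical, not part of any Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group all emitted values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases: map every line, shuffle, then reduce each group.
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "Greenplum runs MapReduce too"]
    print(word_count(docs))
```

On a real Hadoop cluster (including Elastic MapReduce) the mapper and reducer logic would typically run as separate programs, for example through Hadoop Streaming, while the framework handles the shuffle.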

Two Hadoop books have also been published recently. I have ordered both of them and can’t wait to hold them in my hands.

Hadoop: The Definitive Guide

Pro Hadoop

There are no books on Greenplum, but the company publishes some good whitepapers on its website.

Oracle on Windows in the EC2 cloud: persisting the computer name across instance shutdown

When you run Oracle on Windows in EC2, your computer name must persist across instance shutdown, because several components depend on it (the listener, tnsnames.ora, the loopback adapter, etc.).

To achieve this, perform the following steps:

1. Connect to your instance via Remote Desktop

2. Change the computer name via the Control Panel

3. Open Windows Explorer and browse to the EC2 config tool. By default it is located at C:\Program Files\Amazon\Ec2ConfigSetup. Double-click Ec2ConfigServiceSettings.exe.

4. On the General tab, deselect Set Computer Name.

5. Go to the Bundle tab and deselect Sysprep.

6. Log on to the AWS Management Console

7. Right-click your instance and select Bundle Instance. This will take a while to complete.

8. Once the bundle task has completed, go to Bundle Tasks and select Register as an AMI.
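Steps 4 and 5 could also be scripted instead of clicked through. The Python sketch below rests on an assumption this post does not confirm: that the EC2 config tool stores these options in XML files where "Set Computer Name" is a `<Plugin>` named `Ec2SetComputerName` with a `<State>` element, and "Sysprep" is a `<Property>` named `AutoSysprep` with a `<Value>` element. Check the actual files in your EC2 config directory before relying on it. The sketch only transforms XML text; reading and writing the files (and keeping a backup) is left to you.

```python
import xml.etree.ElementTree as ET


def _set_named_child(xml_text: str, item_tag: str, name: str,
                     child_tag: str, value: str) -> str:
    """Set <child_tag> inside every <item_tag> whose <Name> equals `name`."""
    root = ET.fromstring(xml_text)
    found = False
    for item in root.iter(item_tag):
        if item.findtext("Name") == name:
            child = item.find(child_tag)
            if child is None:
                # Create the element if the entry lacks it.
                child = ET.SubElement(item, child_tag)
            child.text = value
            found = True
    if not found:
        raise KeyError(f"no <{item_tag}> named {name!r}")
    # Note: ET.tostring does not reproduce the original XML declaration.
    return ET.tostring(root, encoding="unicode")


def disable_set_computer_name(config_xml: str) -> str:
    # Step 4: equivalent of deselecting "Set Computer Name" (assumed plugin name).
    return _set_named_child(config_xml, "Plugin", "Ec2SetComputerName",
                            "State", "Disabled")


def disable_auto_sysprep(bundle_xml: str) -> str:
    # Step 5: equivalent of deselecting "Sysprep" (assumed property name).
    return _set_named_child(bundle_xml, "Property", "AutoSysprep",
                            "Value", "No")
```

Raising `KeyError` when the entry is missing is deliberate: if your EC2 config version uses a different layout, the script fails loudly instead of silently changing nothing.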

The next time you launch an instance from your AMI, you will see that the computer name you entered in step 2 has persisted across instance termination.
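With a stable name, configuration that embeds it, such as tnsnames.ora, keeps working on every launch. The Python sketch below renders a typical tnsnames.ora entry, defaulting the HOST value to the machine's current name as reported by `platform.node()`. The alias `ORCL` and service name `orcl` are hypothetical, port 1521 is Oracle's default listener port, and your own connect data may differ.

```python
import platform
from typing import Optional


def tnsnames_entry(alias: str, service_name: str,
                   host: Optional[str] = None, port: int = 1521) -> str:
    """Render a tnsnames.ora connect descriptor pointing at `host`.

    If `host` is omitted, the current machine name (platform.node()) is used,
    i.e. the computer name that now persists across launches.
    """
    host = host or platform.node()
    return (
        f"{alias} =\n"
        "  (DESCRIPTION =\n"
        f"    (ADDRESS = (PROTOCOL = TCP)(HOST = {host})(PORT = {port}))\n"
        "    (CONNECT_DATA =\n"
        "      (SERVER = DEDICATED)\n"
        f"      (SERVICE_NAME = {service_name})\n"
        "    )\n"
        "  )\n"
    )


if __name__ == "__main__":
    # Hypothetical alias and service name; adjust to your database.
    print(tnsnames_entry("ORCL", "orcl"))
```

If the computer name changed on every launch, an entry like this (and the listener configuration, which also names the host) would point at a host that no longer exists.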

Note:

Before bundling your instance, stop the Oracle Windows service. I typically set the Oracle service to start manually: because the bundle task reboots the machine before taking the image, a manual start type guarantees that the Oracle service is not running when the image is taken.
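The note above can also be scripted. This is a minimal Python sketch, assuming a hypothetical database SID of `ORCL` and Oracle's usual Windows service naming of `OracleService<SID>`. It builds the `net stop` and `sc config ... start= demand` commands and executes them only when run on Windows. The listener runs as a separate service whose name depends on the Oracle home, and it is not covered here.

```python
import subprocess
import sys
from typing import List


def oracle_service_name(sid: str) -> str:
    # Oracle on Windows conventionally names the database service OracleService<SID>.
    return f"OracleService{sid}"


def manual_start_commands(sid: str) -> List[List[str]]:
    """Commands that stop the Oracle service and set it to manual start."""
    service = oracle_service_name(sid)
    return [
        ["net", "stop", service],
        # sc.exe expects "start=" and its value as two separate arguments.
        ["sc", "config", service, "start=", "demand"],
    ]


if __name__ == "__main__" and sys.platform == "win32":
    for cmd in manual_start_commands("ORCL"):  # hypothetical SID
        # net stop exits non-zero if the service is already stopped,
        # so report the return code instead of aborting.
        result = subprocess.run(cmd)
        print(" ".join(cmd), "->", result.returncode)
```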