A Greenplum 4.1 installation handbook

One of the main advantages of Greenplum is that it gains power when it runs on multiple nodes.
Horizontal scalability is one of its main features.
Here is a compact handbook for installing a multi-node Data Warehouse environment with Greenplum.

## Preparation steps
This little guide covers Greenplum 4.1 installation.
This is not intended to be a replacement for the official Install Guide, just a little handbook to keep on your desk.
You have to tune your Operating System a little bit before installing Greenplum.
That’s a very well documented procedure; I advise you to read it in the Install Guide, page 18.
## Installing Greenplum
First of all, you have to run the Greenplum installer script on the master host, as root.
The installer script can be downloaded from the Greenplum community site: http://www.greenplum.com/community/downloads/database-ce
Make sure to download the correct version!
The installer script asks a few questions and displays the license; simply follow the on-screen instructions.
Now comes the important part: you have to run a special script, gpseginstall, that sets up Greenplum on a list of hosts for you. Awesome!
It simply copies the Greenplum installation from the current host to a list of specified hosts (it also takes care of exchanging ssh keys and creating the gpadmin user).
*Specified where?*
The important file here is hostfile_exkeys; it must contain the hostname of each host in your Greenplum system. For example:
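As a sketch, assuming a hypothetical four-host cluster made of a master (`mdw`), a master segment (`smdw`) and two segment hosts (`sdw1`, `sdw2`), the file could be created as shown here, one hostname per line. All four names are placeholders, not taken from a real setup:

```shell
# Placeholder hostnames: mdw (master), smdw (master segment),
# sdw1 and sdw2 (segment hosts). Replace them with your own.
cat > hostfile_exkeys <<'EOF'
mdw
smdw
sdw1
sdw2
EOF
```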


This is enough to run gpseginstall. Run it this way:

# gpseginstall -f hostfile_exkeys -u gpadmin -p yourpassword

## Creating directories
It’s time to create the master directory on the master host.
Remember that the actual data lives on the segments, so not much space is needed here.
For example:

# mkdir /data/master
# chown gpadmin /data/master

You have to create the same directory on your master segment as well.
Greenplum provides a useful script for the job, called gpssh:

# gpssh -h master-segment-hostname -e 'mkdir /data/master'
# gpssh -h master-segment-hostname -e 'chown gpadmin /data/mast

Finally, you have to create the data directories on all segment hosts, and you can do that
all at once, thanks to gpssh.
Remember that the actual data goes there, so a lot of space is needed.
Create a file called hostfile_gpssh_segonly (the name used in the commands below) and place *only* the segment hostnames in it. For example:
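One way to build it, sketched under the assumption that hostfile_exkeys already lists every host and that the master and the master segment are named `mdw` and `smdw` (placeholder names), is to filter those two lines out:

```shell
# Reuse hostfile_exkeys if it exists; otherwise create a sample one
# with placeholder hostnames (mdw, smdw, sdw1, sdw2 are hypothetical).
[ -f hostfile_exkeys ] || printf '%s\n' mdw smdw sdw1 sdw2 > hostfile_exkeys

# Keep every line except the two master hosts (whole-line match only).
grep -vx -e 'mdw' -e 'smdw' hostfile_exkeys > hostfile_gpssh_segonly

# Show the result so it can be checked by eye before using it.
cat hostfile_gpssh_segonly
```

Matching whole lines with `-x` avoids accidentally dropping a segment host whose name merely contains a master's name.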


Now, run the commands on all segments at once like this:

# gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/mirror'
# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/primary'
# gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/mirror'
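The four commands above follow a single pattern, so they can also be generated in a loop. Here is a minimal sketch; the function name `create_segment_dirs` and the `RUN` variable are illustrative and not part of Greenplum. By default it only prints the gpssh commands (a dry run), so nothing is touched until you ask for it:

```shell
create_segment_dirs() {
    # RUN defaults to "echo" (dry run); set RUN to an empty string
    # to execute the gpssh commands for real (as root).
    run="${RUN-echo}"
    for dir in /data/primary /data/mirror; do
        $run gpssh -f hostfile_gpssh_segonly -e "mkdir $dir"
        $run gpssh -f hostfile_gpssh_segonly -e "chown gpadmin $dir"
    done
}

create_segment_dirs
```

Once the printed commands look right, calling it as `RUN= create_segment_dirs` runs them.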

## Conclusions
Here’s a list of steps to keep on your desk. I hope you will find it useful:
* Configure your Operating System for Greenplum (as written in the Install Guide)
* Install Greenplum on the master host
* Run gpseginstall to install Greenplum on the other hosts
* Create the master directory on the master
* Create the same directory on the master segment (gpssh can help here)
* Create data directories on segments (gpssh can help here)

In the next article, we will see how to initialize and start the Greenplum Database we have just installed.
Stay tuned.
