One of the main advantages of Greenplum is that it gains power as you add nodes.
Horizontal scalability is one of its key features.
Here is a compact handbook for installing a multi-node Data Warehouse environment with Greenplum.
## Preparation steps
This little guide covers Greenplum 4.1 installation.
This is not intended to be a replacement for the official Install Guide, just a little handbook to keep on your desk.
You have to tune your Operating System a little bit before installing Greenplum.
That’s a very well documented procedure; I advise you to read it in the Install Guide, page 18.
## Installing Greenplum
First of all, you have to run the Greenplum installer script on the master host, as root.
The installer script can be downloaded from the Greenplum community site: http://www.greenplum.com/community/downloads/database-ce
Make sure to download the correct version!
The installer script asks a few questions and shows the license; simply follow the on-screen instructions.
Now comes the important part: you have to run a special script that sets up Greenplum on a list of hosts for you. Awesome!
It simply copies the Greenplum installation from the current host to a list of specified hosts (it also takes care of exchanging ssh keys and creating the gpadmin user).
The important file here is `hostfile_exkeys`; it must contain the hostname of each host in your Greenplum system, one per line. For example:

    master-hostname
    master-segment-hostname
    segment-hostname-1
    segment-hostname-2
    ...
This is enough to run `gpseginstall`, like this:

    # gpseginstall -f hostfile_exkeys -u gpadmin -p yourpassword
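Before handing the file to `gpseginstall`, a quick sanity check can save a failed run. The following is only a minimal sketch: it recreates the example host file with the placeholder hostnames used above (replace them with your own) and checks that no hostname is listed twice. The check itself is my own addition, not something the Install Guide asks for.

```shell
#!/bin/sh
# Placeholder hostnames taken from the example above; replace with real ones.
cat > hostfile_exkeys <<'EOF'
master-hostname
master-segment-hostname
segment-hostname-1
segment-hostname-2
EOF

# Count the non-empty lines and the distinct non-empty lines; they should match.
total=$(grep -c . hostfile_exkeys)
unique=$(grep . hostfile_exkeys | sort -u | wc -l | tr -d ' ')

if [ "$total" -ne "$unique" ]; then
    echo "hostfile_exkeys contains duplicate hostnames" >&2
else
    echo "hostfile_exkeys lists $total hosts"
fi

# The install command from this guide would follow (run as root on the master):
# gpseginstall -f hostfile_exkeys -u gpadmin -p yourpassword
```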
## Creating directories
It’s time to create the master directory on the master host.
Remember that the real data lives on the segments, so not much space is needed here.

    # mkdir /data/master
    # chown gpadmin /data/master
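If you script this step, keep in mind that a plain `mkdir` fails when the directory already exists. Here is a small sketch of a re-runnable variant based on `mkdir -p`. Note that `ensure_dir` is a hypothetical helper of mine, not a Greenplum tool, and that the demonstration works on a throwaway directory so it can be tried without root.

```shell
#!/bin/sh
# ensure_dir DIR [OWNER]: hypothetical helper, not part of Greenplum.
# Creates DIR (and its parents) if missing; if OWNER is non-empty, chowns DIR to it.
ensure_dir() {
    dir=$1
    owner=$2
    mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$dir" || return 1
    fi
}

# Demonstration on a scratch directory, with no owner change, so root is not needed.
demo_root=${TMPDIR:-/tmp}/gp_master_dir_demo
ensure_dir "$demo_root/data/master" ""
ensure_dir "$demo_root/data/master" ""   # running it again is harmless
ls -ld "$demo_root/data/master"

# On the real master host, as root, the call would be:
# ensure_dir /data/master gpadmin
```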
You have to create that directory on your master segment as well.
Greenplum provides a useful script to do the job; it is called `gpssh`:

    # gpssh -h master-segment-hostname -e 'mkdir /data/master'
    # gpssh -h master-segment-hostname -e 'chown gpadmin /data/mast
Finally, you have to create the data directories on all segment hosts, and you can do that all at once, thanks to `gpssh`.
Remember that the real data goes there, so a lot of space is needed.
Create a file called `hostfile_gpssh_segonly` and place *only* segment hostnames in it. For example:

    segment-hostname-1
    segment-hostname-2
    ...
Now run the commands on all segments at once, like this:

    # gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/primary'
    # gpssh -f hostfile_gpssh_segonly -e 'mkdir /data/mirror'
    # gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/primary'
    # gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin /data/mirror'
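The segment-only host file does not have to be typed by hand: it can be derived from `hostfile_exkeys` by dropping the two master entries. The sketch below assumes the placeholder hostnames used in this guide and only prints the `gpssh` commands instead of running them, so you can check them first.

```shell
#!/bin/sh
# Assumption: these are the placeholder names used in this guide.
MASTER=master-hostname
MASTER_SEGMENT=master-segment-hostname

# Recreate the example hostfile_exkeys so the sketch runs on its own.
printf '%s\n' "$MASTER" "$MASTER_SEGMENT" segment-hostname-1 segment-hostname-2 > hostfile_exkeys

# Keep every line that is not exactly one of the two master names
# (-x: whole-line match, -F: fixed strings, -v: invert the match).
grep -v -x -F -e "$MASTER" -e "$MASTER_SEGMENT" hostfile_exkeys > hostfile_gpssh_segonly

# Print (do not run) the gpssh commands for each data directory.
for dir in /data/primary /data/mirror; do
    echo "gpssh -f hostfile_gpssh_segonly -e 'mkdir $dir'"
    echo "gpssh -f hostfile_gpssh_segonly -e 'chown gpadmin $dir'"
done
```

Once the printed commands look right, run them as root on the master host.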
Here’s a list of steps to keep on your desk; I hope you will find it useful:
* Configure your Operating System for Greenplum (as written in the Install Guide)
* Install Greenplum on the master host
* Run `gpseginstall` to install Greenplum on the other hosts
* Create the master directory on the master
* Create the same directory on the master segment (`gpssh` can help here)
* Create the data directories on the segments (`gpssh` can help here)
In the next article, we will see how to init and start the Greenplum Database we have just installed.