Gabriele Bartolini

Installing Greenplum Single Node Edition on Amazon’s EC2

March 23, 2010 / in Gabriele's PlanetPostgreSQL, Greenplum / by Gabriele Bartolini

For a while now I have been thinking about adding Greenplum support to htMiner, an open-source web analytics application that I wrote a few years ago and that uses PostgreSQL.

To do this, I need a multi-CPU environment. While waiting for our new servers to be installed in our data centre in Italy, I decided to look at Amazon’s Elastic Compute Cloud (EC2) infrastructure. My intention is to do some benchmarking and spot the main performance differences between Greenplum Single Node Edition and PostgreSQL 8.4, my favourite DBMS.

If you wish to follow this article, you need to have an Amazon AWS account with a valid credit card. Do not worry, this test will only cost you a couple of dollars!

Greenplum SNE is a free version of the Greenplum database, one of the most advanced solutions for data warehousing and analytics. Greenplum is based on a shared-nothing architecture and allows for data distribution and parallel processing across several nodes (servers).

The Single Node Edition is a freely distributed version of Greenplum that can be installed on a single node. On a multi-processor architecture, it allows you to create multiple segments (usually one per core) and hence to take advantage of parallel processing. It can be downloaded for free from the main website.

My intention is to install it on a Large Instance running CentOS Linux 5.4 on Amazon. EC2’s large instance has the following characteristics:

  • 7.5 GB of memory
  • 4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
  • 850 GB of local instance storage
  • 64-bit platform

I also decided to get a 10GB volume of Elastic Block Store (1 dollar a month), which I will format using the XFS file system. This volume will contain the Greenplum data directories (this time I will use a single volume; next time I will try one volume per segment).

The first step is to log into your Amazon AWS management console. Create your 10GB EBS volume and then launch a large instance using the ami-ebe4cf9f AMI (AMI stands for Amazon Machine Image), a 64-bit CentOS 5.4 image distributed by RightScale. The AMI ID may be different for you, as I use a Europe-based server.

I then attach the created volume to the instance I just started. The management console informs me that the volume has been attached on /dev/sdf. I grab the public DNS information and connect to the server via ssh as root, using my EC2 identity.

I install the YUM packages for XFS support by running:

yum install kmod-xfs.x86_64 xfsprogs xfsdump

I create a primary partition on /dev/sdf using fdisk and format it:

mkfs -t xfs /dev/sdf1 

I then add the entry to /etc/fstab:

/dev/sdf1 /greenplum xfs noatime 0 0

and mount the partition on the /greenplum mount point:

mkdir /greenplum
mount /greenplum
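
The /etc/fstab step above can be made safe to repeat. The following is only a minimal sketch, not part of the procedure I actually ran: add_fstab_entry is a hypothetical helper name, and it assumes that a mount point counts as already configured when it appears as the second field of any non-comment line.

```shell
# Hypothetical helper: append an fstab entry only if the mount point
# is not already listed in the given file.
# Usage: add_fstab_entry FSTAB_FILE DEVICE MOUNT_POINT FS_TYPE OPTIONS
add_fstab_entry() {
    fstab=$1 device=$2 mount_point=$3 fs_type=$4 options=$5

    # Look for a non-comment line whose second field is the mount point.
    if awk -v mp="$mount_point" \
        '$1 !~ /^#/ && $2 == mp { found = 1 } END { exit !found }' "$fstab"
    then
        echo "$mount_point already present in $fstab" >&2
        return 0
    fi

    # Same layout as the entry shown above: no dump, no fsck pass.
    printf '%s %s %s %s 0 0\n' \
        "$device" "$mount_point" "$fs_type" "$options" >> "$fstab"
}
```

Run as root, add_fstab_entry /etc/fstab /dev/sdf1 /greenplum xfs noatime would add the same line as above on the first call and do nothing on later calls.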

Download Greenplum’s Quickstart guide from the download area. Grab the URL of the 64-bit Red Hat installer for Greenplum and download it on the EC2 server using wget (or upload it from your computer using scp).

Follow the instructions in the Quickstart guide for preparing your system for Greenplum (in particular kernel settings and limits).

Unzip the Greenplum zip file and execute the .bin file. Answer yes to all the questions; at the end of the process, Greenplum is installed in the /usr/local/greenplum-db directory.

Create the gpadmin user and set the password:

useradd gpadmin
passwd gpadmin

Prepare the data directories for the master and the segments:

mkdir -p /greenplum/master
mkdir -p /greenplum/segment1
mkdir -p /greenplum/segment2
chown -R gpadmin:gpadmin /greenplum
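
With more cores you would want more segment directories, so the mkdir commands above can be generalised. This is just a sketch under my own assumptions: make_gp_dirs is a hypothetical helper name, and the master/segmentN layout simply mirrors the directories used in this article.

```shell
# Hypothetical helper: create one master directory plus N segment
# directories under a base directory.
# Usage: make_gp_dirs BASE_DIR SEGMENT_COUNT
make_gp_dirs() {
    base=$1 count=$2

    mkdir -p "$base/master" || return 1

    # Create segment1 .. segmentN.
    i=1
    while [ "$i" -le "$count" ]; do
        mkdir -p "$base/segment$i" || return 1
        i=$((i + 1))
    done
}
```

For the two-core large instance, make_gp_dirs /greenplum 2 followed by the chown command above gives the same result as the manual steps. On other machines, getconf _NPROCESSORS_ONLN is one way to obtain the number of online processors to use as the segment count.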

Become gpadmin using the su command and add the line source /usr/local/greenplum-db/greenplum_path.sh to gpadmin’s ~/.bashrc file. Load these settings. Edit the ~/single_host_file file, add localhost to it, and launch:

gpssh-exkeys -f ~/single_host_file
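
The two file edits that precede gpssh-exkeys can also be scripted. The sketch below is an assumption about how one might do it, not a Greenplum tool: prepare_gpadmin_home is a hypothetical helper name, and it assumes the host file should contain only localhost, as in this single-node setup.

```shell
# Hypothetical helper: prepare gpadmin's shell environment and host file.
# Usage: prepare_gpadmin_home HOME_DIR
prepare_gpadmin_home() {
    home_dir=$1
    path_line='source /usr/local/greenplum-db/greenplum_path.sh'

    # Append the source line only if .bashrc does not already contain it.
    if ! grep -qxF "$path_line" "$home_dir/.bashrc" 2>/dev/null; then
        printf '%s\n' "$path_line" >> "$home_dir/.bashrc"
    fi

    # Single-node setup: the host file lists only localhost.
    printf 'localhost\n' > "$home_dir/single_host_file"
}
```

Called as gpadmin with prepare_gpadmin_home "$HOME", it leaves ~/.bashrc and ~/single_host_file in the state described above; you still need to load the settings in the current shell and run gpssh-exkeys yourself.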

Create the ~/gp_init_config file with the following content:

ARRAY_NAME="Greenplum"
MACHINE_LIST_FILE=/home/gpadmin/single_host_file
SEG_PREFIX=gp
PORT_BASE=50000
declare -a DATA_DIRECTORY=(/greenplum/segment1 /greenplum/segment2)
MASTER_HOSTNAME=localhost
MASTER_DIRECTORY=/greenplum/master
MASTER_PORT=5432
ENCODING=UNICODE
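
Since the configuration file uses shell syntax, it can be read by bash to check that the directories it names really exist before initialising the system. The following is a minimal sketch and not a Greenplum utility: check_gp_dirs is a hypothetical helper name, and it assumes that only MASTER_DIRECTORY and DATA_DIRECTORY need checking.

```shell
# Hypothetical pre-flight check: confirm that the directories named in
# the configuration file exist.
# Usage: check_gp_dirs CONFIG_FILE
check_gp_dirs() {
    # The file uses bash syntax (declare -a), so read it with bash in a
    # child process; the calling shell's variables stay untouched.
    bash -c '
        . "$1" || exit 1
        for dir in "$MASTER_DIRECTORY" "${DATA_DIRECTORY[@]}"; do
            if [ ! -d "$dir" ]; then
                echo "missing directory: $dir" >&2
                exit 1
            fi
        done
    ' check_gp_dirs "$1"
}
```

For example, check_gp_dirs /home/gpadmin/gp_init_config returns a non-zero status and names the first missing directory if /greenplum/master or one of the segment directories has not been created.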

Finally launch:

gpinitsystem -c ~/gp_init_config

At the end of the process, Greenplum SNE is installed on your Amazon EC2 server running CentOS 5.4. On this server you can test the solution at quite a reasonable price (I was on the server for 7 hours today and I spent only 3 dollars).

I will post a few more articles on this topic in the next few days, and hopefully I will be able to post the first benchmarks too. Enjoy!

Tags: amazon ec2
2 replies
  1. gambitg
    gambitg says:
    June 10, 2010 at 7:43 pm

    awesome article !!
    Few things you might want to clarify for the newbie:
    If you are accessing AWS from windows, do the following:
    Install Putty: required to make ssh connection
    Install PuttyGen: to convert the AWS private key to Putty private key (.ppk key)
    Install pscp: required to transfer greenplum zip file from your windows machine to linux instance created on AWS.
    Install link: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html
    Article to convert AWS key to putty key (required before you can fire ssh):
    http://docs.amazonwebservices.com/AWSEC2/latest/DeveloperGuide/index.html?generating-a-keypair.html
    Also remember to have Source(IP or Group) of 0.0.0.0/0 in ‘Security Groups’ in AWS console. (Connection method: SSH, protocol TCP, from and to port both are 22)
    Command for secure copy of greenplum .zip file from your windows to linux is:
    pscp -i green* [email protected]:/greenplum

  2. sujeet
    sujeet says:
    September 3, 2012 at 4:49 am

    I have to install the GreenPlum DB on to my personal laptop. So can you please suggest me Hardware requirement for this.
