MapReduce in Greenplum 4.1

MapReduce is a very popular software framework, introduced by Google in 2004. It is a large topic, and a single blog article cannot cover all of its aspects. This is a simple introduction to using _MapReduce_ in Greenplum 4.1.

ETL with Kettle and Greenplum – Part Two: importing data

In the first part of this article we created a job and a database connection, and defined the flow in Kettle. In this second part we’ll see how Kettle manages the data import from the CSV files.

Using gpmigrator in Greenplum 4.1.1

In this article, I am going to upgrade a Greenplum cluster from version 4.0 to 4.1 using `gpmigrator`, a utility shipped with Greenplum Community Edition whose purpose is to perform a live upgrade of an existing database.

ETL with Kettle and Greenplum – Part one: setting up your job.

I recently showed you how to import data from a CSV file into a Greenplum database using Talend Community Edition. In this article I’m going to perform the same task using another ETL tool, Kettle.

Using dblink in Greenplum

I’m going to demonstrate how to use dblink in Greenplum.

ETL with Talend and Greenplum – Part two: data import

In the first part of this tutorial, we set up all the connections required for creating the job. Now we can proceed with the data import. Let’s drag and drop an object named tMap into the visual editor. You can find it on the left, in the instruments palette, inside the “elaboration” folder.

Using PL/Java in Greenplum

In this article we are going to show you how to write PL/Java functions in Greenplum. I assume that you have a working Greenplum (or Greenplum Community Edition) at your disposal. In this example we will use version **4.0.4**, installed in /usr/local/greenplum-db- (which is the default location).

ETL with Talend and Greenplum – Part one: connections

When working with databases, one of the most common tasks is to load data from one or more CSV files. Several tools are available for this task. Some are executed via the command line, like COPY (using psql); some are more complex, like ETL systems. We will start today with Talend but, in the next weeks, […]

Association rules with MADlib in Greenplum

*MADlib* is an open-source library for scalable in-database analytics which targets the PostgreSQL and the Greenplum databases. MADlib version 0.2beta needs to be installed properly to follow this article, so we encourage you to read the official documentation to install it in a Greenplum database. I’m going to show you how to perform Association Rules […]

How to test Greenplum Community Edition on VirtualBox

Greenplum Community Edition is available in different flavours, including a VMware virtual machine based on CentOS with all the fancy tools and the documentation already installed. This allows you to easily try and evaluate this powerful platform for data warehousing. Greg Smith from our 2ndQuadrant team recently explained how to install this image on Linux […]