MapReduce is a very trendy software framework, introduced by Google (TM) in 2004. It is a large topic, and it is not possible to cover all of its aspects in a single blog article. This is a simple introduction to using _mapreduce_ in Greenplum 4.1.
In the first part of this article we created a job and a database connection, and defined the flow in Kettle. In the second part we’ll see how Kettle manages the data import from the CSV files.
In this article, I am going to upgrade a Greenplum cluster from version 4.0 to 4.1 using `gpmigrator`. `gpmigrator` is a utility shipped with Greenplum Community Edition whose purpose is to perform a live upgrade of an existing database.
Recently I have shown you how to perform a data import from a CSV file into a Greenplum database, using Talend Community Edition. In this article I’m going to perform the same task using another ETL tool, Kettle.
I’m going to demonstrate how it is possible to use dblink in Greenplum 220.127.116.11
In the first part of this tutorial, we set up all the connections required for creating the job. Now we can proceed with the data import. Let’s drag and drop an object named tMap into the visual editor. You can find it on the left, in the instruments palette, inside the “elaboration” folder.
In this article we are going to show you how to write PL/Java functions in Greenplum. I assume that you have a working Greenplum (or Greenplum Community Edition) at your disposal. In this example we will use version **4.0.4**, installed in /usr/local/greenplum-db-18.104.22.168 (which is the default location).
When working with databases, one of the most common tasks is to load data from one or more CSV files. Several tools are available to achieve this task. Some are executed via the command line, like COPY (using psql); some are more complex, like ETL systems. We will start today with Talend but, in the next weeks, […]
[*MADlib*](http://madlib.net) is an open-source library for scalable in-database analytics which targets the PostgreSQL and the Greenplum databases. MADlib version 0.2beta needs to be installed properly to follow this article, so we encourage you to read the [official documentation](http://github.com/madlib/madlib/wiki/Installation-Guide-%28v0.2beta%29) to install it in a Greenplum database. I’m going to show you how to perform Association Rules […]
Greenplum Community Edition is available in different flavours, including a VMWare virtual machine based on CentOS with all the fancy tools and the documentation already installed. This allows you to easily try and evaluate this powerful platform for data warehousing. [Greg Smith, from our 2ndQuadrant team, recently explained how to install this image on Linux](http://www.greenplum.com/community/forums/showthread.php?486-Getting-Started-with-VMWare-on-Linux). […]