2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Greenplum3 / ETL with Talend and Greenplum – Part one: connections
Giulio Calacoci

ETL with Talend and Greenplum – Part one: connections

September 16, 2011/2 Comments/in Greenplum /by Giulio Calacoci
hen working with databases, one of the most common task is to load data from one or more CSV files.
Several tools are available to achieve this task. Some are executed via command line, like COPY (using psql), some are more complex, like ETL systems.
We will start today with Talend but, in the next weeks, we will proceed with Kettle (Pentaho Data Integration).

In this tutorial we will learn how to import data on a Greenplum database from two CSV files and how to execute a lookup operation, using Talend. For the sake of simplicity, we assume that:

  • you have a very simple test database in Greenplum with two tables and a logical 1:N relationship between them: we chose states and users (a user can live in just one state)
  • you have two CSV files with the same data structure, one for the states and one for the users

The objective of this article is to demonstrate the concepts using a simple example, so that you can then apply the same techniques to more complex scenarios.

The first step is to download Talend Open Studio ( you can find it on the Talend Site ) and to install it.

Then we need to create a connection to Greenplum. From the Metadata section of Talend’s interface, add a Greenplum connection and insert the credentials. When done, a new object of type DbConnection will be created.

Right-click on it then select “Retrieve Schema“, so that Talend can gather all the needed information about the structure of the users and states tables.

Now it’s time to add the files that contain the data. Immediately below the “Db Connections” icon, you can find the group of delimited files. Click on “Create file delimited“.

In the first screenshot of the wizard, you will be asked to insert a name and a description for the resource being created. In step two, you need to locate the file and set the encoding.
In the third step, depicted in the picture below, you need to define row parsing methods.
You can select the encoding (1), field and row separators (2) and, if necessary, header rows or rows to be skipped (3) – either at the beginning or the end of the file.
It is also possible to ask Talend to auto-detect the structure of the file, by using the first row as header (4).

Click on “Next” then check that everything is fine and finish the process. You will have your file correctly associated to the project. Perform the above operation for both the users and states files. Once you have done, you can proceed with the creation of a job. Drag both files on the job, then select tFileInputDelimited as file type.

From the previously created Greenplum, drag the two tables (in our case states and users) in the editor, then select tGreenplumOutput as type.

That’s it for now. We have been able to create a connection, add two files and two tables in our project. We will continue with the loading phase in the second part of this article.
Tags: etl, greenplum, talend open studio
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
2 replies
  1. Lalit
    Lalit says:
    June 3, 2018 at 4:03 pm

    Hi Team,

    Please help me to set up talend job to read data from Greenplum tables and load it to file(FTP_ server).

    Thanks,
    Lalit

    Reply
    • craig.ringer
      craig.ringer says:
      November 13, 2018 at 2:28 am

      The blog isn’t really a general Q&A service. If you’re looking for support please reach out to [email protected] or see the website for support services and plans. You may also find Stack Overflow a useful resource.

      Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
Association rules with MADlib in Greenplum Using PL/Java in Greenplum
Scroll to top
×