Using dblink in Greenplum

I’m going to demonstrate how it is possible to use dblink in Greenplum 4.0.4.0


What’s dblink?
——————
dblink is a PostgreSQL contrib module that allows to execute queries on another database.
Current PostgreSQL’s architeture requires users to connect to a specific database on a server. Therefore, it is not possible to natively perform an SQL query on a different database.
Normally we discourage using dblink to query remote databases.
Databases are self-contained objects in PostgreSQL, and they should be designed with that in mind.
We always advice users to work with schemas. But sometimes, this might not be enough.
Anyway, I’m writing this article to show you how PostgreSQL and Greenplum are related.
Requisities
————–
You need a working copy of Greenplum 4.0.4 and a directory containing the PostgreSQL 8.2 source code, I’ve tested it on a Linux operating system (CentOS 5 to be exact).
For this example, I assume that Greenplum is installed in /usr/local/greenplum-db-4.0.4.0 and PostgreSQL source directory is /home/gpadmin/postgresql-8.2.22.
I’m using PostgreSQL 8.2 because Greenplum is based on fork on that version.
You can obtain PostgreSQL 8.2 source from http://www.postgresql.org/ftp/source/v8.2.22/ or directly from git, please refer to http://wiki.postgresql.org/wiki/Working\_with\_git for instruction on how to use git with PostgreSQL.
Installing dblink
——————-
Installing dblink is as easy as installing a contrib module in PostgreSQL.
* as gpadmin user, source the greenplum_path.sh file:


$ source /usr/local/greenplum-db-4.0.4.0/greenplum_path.sh

You should have this line in your .bashrc file after the installation process.
* Check that pg_config --pgxs is the Greenplum one:


$ pg_config --pgxs
/usr/local/greenplum-db-4.0.4.0/lib/postgresql/pgxs/src/makefiles/pgxs.mk

* Enter the dblink directory:


$ cd /home/gpadmin/postgresql-8.2.22/contrib/dblink/

* Edit the Makefile and add -w to PG_CPPFLAGS, so that line number four looks like this:


PG_CPPFLAGS = -I$(libpq_srcdir) -w

This option tells gcc to ignore all warnings (that’s a _dirty_ hack but mandatory).
* Install the module directly in a database:


$ createdb my_db
$ make USE_PGXS=1 install
$ psql -f dblink.sql my_database

At this time, my_database contains dblink, so it is able to execute queries in remote databases.
Using dblink
—————
It is now time to test it, executing some queries.
* Create a database to query, with an example table:


$ createdb my_other_db
$ psql -c "CREATE TABLE t AS SELECT generate_series(1,1000) AS v" my_other_db

* Create the dblink connection to that database:


$ psql my_db
my_db=# SELECT dblink_connect('myconn', 'dbname=my_other_db');
dblink_connect
-------------------
OK
(1 row)

* Execute queries using that connection:


$ psql my_db
my_db=# SELECT * FROM dblink( 'myconn', 'SELECT v FROM t')
AS t1 ( v INTEGER ) LIMIT 5;

Please note that you must specify the expected set of columns in the calling query, because dblink returns a set of record (to be generic).
Conclusions
—————
Even though dblink would not be my preferred choice (also because it is not directly supported by Greenplum – or for Greenplum), this article shows you a simple method to use it in those cases where you can’t really do without it.

2 replies
  1. Tamas
    Tamas says:

    As there is no AUTONOMOUS TRANSACTIONS in greenplum, as there are several exceptions what greenplum cannot catch, the only way to have logging inside gp functions are the dblink or the output file + external table methods 🙁

    Reply

Trackbacks & Pingbacks

  1. […] Carlo Ascani said in his blog,we can use dblink for postgresql in Greenplum(reference). In his env,the greenplum was likely installed in only ONE host, but in mine, Greenplum is […]

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *