Easier PostgreSQL 9.0 clusters with repmgr



When PostgreSQL 9.0 shipped a few
months ago, it included several new replication features. It’s
obvious that you can use these features to build clusters of servers
for both high availability and read query scaling purposes. What
hasn’t been so obvious is how to manage that cluster easily. Getting
a number of nodes installed and synchronized with their master isn’t
that difficult. But while the basic functions necessary to monitor
multiple nodes and help make decisions like “which node do I
promote if the master fails?” were included in 9.0, they expose
this information in internal server units (write-ahead log, or WAL,
positions) rather than in time. A few common complaints always seem
to show up once you actually consider putting one of these clusters
into a production environment:

  • How do I easily add new nodes
    to expand capacity later?
  • Can I monitor replication lag in time
    units?
  • When the master fails, how do I find
    the right node to promote, then switch all of the other standby
    servers to follow it?
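For reference, the “internal server units” are WAL locations: on 9.0, pg_current_xlog_location() on the master and pg_last_xlog_receive_location() or pg_last_xlog_replay_location() on a standby return text such as 0/3000020, and there is no built-in function to subtract two of them. The Python sketch below, which is not part of repmgr, shows one way a monitoring script might turn two such locations into a byte lag. It assumes the default 16MB WAL segment size and the pre-9.3 addressing scheme, in which each logical xlog file spans 255 segments; the helper names are hypothetical.

```python
# Hypothetical helpers, not repmgr code. Assumes PostgreSQL 9.0-era WAL
# addressing and the default 16MB segment size.

# Before 9.3, each logical xlog file ("xlogid") held only 255 segments of
# 16MB; the last segment slot (FF) was never used.
XLOG_SEG_SIZE = 16 * 1024 * 1024
XLOG_FILE_SIZE = 255 * XLOG_SEG_SIZE  # 0xFF000000 bytes per xlogid


def xlog_location_to_bytes(location: str) -> int:
    """Convert an 'xlogid/xrecoff' hex string such as '0/3000020'
    into an absolute byte position in the WAL stream."""
    xlogid_hex, xrecoff_hex = location.split("/")
    return int(xlogid_hex, 16) * XLOG_FILE_SIZE + int(xrecoff_hex, 16)


def replication_lag_bytes(master_location: str, standby_location: str) -> int:
    """Bytes of WAL the standby has not yet received or replayed."""
    return (xlog_location_to_bytes(master_location)
            - xlog_location_to_bytes(standby_location))


# A standby at 0/FEFFFFF0 is 32 bytes behind a master at 1/10, because the
# gap between xlogid 0 and xlogid 1 is skipped.
print(replication_lag_bytes("1/10", "0/FEFFFFF0"))
```

A byte count is still not the time-based lag the questions above ask for; it only says how much WAL remains to be received or replayed.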

Solving these problems inside the
database itself isn’t necessarily the right way to proceed.
PostgreSQL draws a clear line between what belongs in the core
server code and what doesn’t, and node failover is where handling
things inside the database really starts to break down. Handling
failover properly requires coordinating actions on multiple nodes at
once. If you look at this problem long enough, you’ll realize that
what works best here is a background daemon that aids in monitoring
and node state changes. It can stay in the background all of the
time, and respond to requests from other nodes to coordinate
multi-node actions. Since part of that daemon’s job by definition
requires operating when there is no working database on the server
yet, integrating it into the PostgreSQL core would be difficult.
Also, people deploying databases tend to be risk-averse about the
database code itself, preferring to deploy older, tested releases
rather than the latest one. Given the roughly yearly release cycle
of the core PostgreSQL code, an external program can evolve much
more quickly. Since such a program operates through the standard
user APIs to the database, changes to it shouldn’t put your data at
risk the way touching the core server code does.

With all this in mind, before
PostgreSQL 9.0 was even released, 2ndQuadrant started an internal
project to handle all of these tasks, which we’ve now released as a
program named repmgr. It provides simpler user interfaces for the
basic setup of a multi-node cluster. It records the data needed to
monitor lag in business-appropriate ways. And even complicated node
transitions can be handled from one system, with its always-running
repmgrd daemon process handling communication with the others.
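Recording data over time is what makes a time-based lag figure possible. As a hypothetical illustration (this is not repmgr’s actual schema or algorithm), suppose a monitoring daemon periodically stores (timestamp, master WAL position) samples, with positions already converted to byte offsets. The Python sketch below estimates how far behind a standby is in time by finding the newest sample whose master position the standby has already replayed. Everything the master had written by that moment is on the standby, so the result is an upper bound on the lag; the function name is an assumption for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

# Hypothetical sample format: (time sampled, master WAL position in bytes).
Sample = Tuple[datetime, int]


def estimate_time_lag(samples: List[Sample], standby_position: int,
                      now: datetime) -> Optional[timedelta]:
    """Upper-bound estimate of how far behind, in time, a standby is.

    `samples` must be sorted by time. Master WAL positions never go
    backwards, so the positions are sorted as well and can be bisected.
    Returns None if the standby is behind even the oldest sample.
    """
    positions = [pos for _, pos in samples]
    # Index of the first sample the standby has NOT yet reached.
    idx = bisect_right(positions, standby_position)
    if idx == 0:
        return None  # older than our recorded history: lag unknown
    sample_time, _ = samples[idx - 1]
    # The standby holds everything the master had written at sample_time.
    return now - sample_time


t0 = datetime(2011, 1, 1, 12, 0, 0)
history = [(t0, 1000),
           (t0 + timedelta(seconds=10), 5000),
           (t0 + timedelta(seconds=20), 9000)]
print(estimate_time_lag(history, 7000, t0 + timedelta(seconds=25)))
```

Sampling more often tightens the bound, at the cost of storing more rows.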

repmgr has been in testing internally
and at customer beta sites for months within 2ndQuadrant, and the
first external release of the code came out in December. We’ve
been waiting for some broader early testing before doing the sort of
promotion you’re reading right now. That has gone well so far, and
work is moving toward a 1.01 release later this month that clears up
the main issues found by our early adopters. The primary project page
is http://projects.2ndquadrant.com/repmgr and you can find
documentation inside the source code repository where active
development is happening, which is currently my GitHub repository at
http://github.com/greg2ndQuadrant/repmgr.
We also have a Google Groups area you can use as a support forum or
a mailing list, as you prefer, for discussing the software.

Besides repmgr being external code,
the other controversial aspect of its release has been that it’s
licensed under the GPL v3, rather than the BSD license used for
PostgreSQL itself. We’ve been criticized for supposedly trying to
emulate the mixed commercial/open license scheme seen in other
databases such as MySQL, a model reviled by much of the PostgreSQL
community. That gets the reality completely backwards.

The terms under which repmgr was
developed required that we release it as free software, and that it
remain free. There is no special proprietary version we charge for.
The code that’s shared on GitHub is a snapshot of our whole
development repo going back to the first commit, warts and all; the
only thing we’re not doing is releasing some internal development
branches until they work. There are no commercial restrictions on
using the program. The only restriction being enforced by the use of
the GPL here is that we expect code changes made to the program to be
shared with the world. We want this to be free software in the
spirit that phrase is used by the Free Software Foundation: if you
find the program useful, and decide to enhance it, you should share
those enhancements with the world. In fact, we carefully modeled how
we handle copyright on the code after the FSF’s requirements for
contributions to GNU software. The main purpose of the copyright
assignment we’ve asked contributors to make is not so we can have our
own special private build. It’s to make sure that no one else has
added non-free software to our project. We don’t want the
contributions we merge to end up limiting others’ ability to rely
upon repmgr for their projects, by making it less free for having
accepted them.

repmgr is being actively developed by
many members of 2ndQuadrant, and represents a major community project
we intend to keep advancing. It seems appropriate that a project
already being used to replicate databases across national boundaries
was developed that way, too. The initial design concepts and
architecture came from Simon Riggs in the UK. The bulk of the coding
so far was done by Jaime Casanova in Ecuador. Robert J. Noles and I,
here in the US, did the initial documentation and testing. Gabriele
Bartolini in Italy, the second of us to deploy repmgr for a production
customer, has been sending back a steady stream of bug fixes and
feature improvements due to arrive in the next release. If you work
with PostgreSQL, you should
recognize some of the names on that list. Ask yourself which sounds
more likely: that all of us who have staked our careers on the
success of a free PostgreSQL have simultaneously turned away from
that philosophy because of our devious corporate interests, or that
we’re releasing a free software tool we intend to build an open
community around.

You can get repmgr in the morning and
be building multi-node clusters with it by the end of the day. We
hope you do just that, and consider joining the user and development
community we’re building around it. Not everyone will agree with
every decision we’ve made about repmgr, but don’t forget the most
important thing: everyone who successfully deploys replicated
PostgreSQL is another person we’ve helped save from wasting money on
Oracle RAC. And isn’t that what’s really important?
