Upgrading to PostgreSQL 11 with Logical Replication

It’s time.

About a year ago, we published PostgreSQL 10 with support for native logical replication. One of the uses of logical replication is to allow low- or no-downtime upgrading between PostgreSQL major versions. Until now, PostgreSQL 10 was the only PostgreSQL release with native logical replication, so there weren’t many opportunities for upgrading in this way. (Logical replication can also be used for moving data between instances on different operating systems or CPU architectures or with different low-level configuration settings such as block size or locale — sidegrading if you will.) Now that PostgreSQL 11 is near, there will be more reasons to make use of this functionality.

Let’s first compare the three main ways to upgrade a PostgreSQL installation:

  • pg_dump and restore
  • pg_upgrade
  • logical replication

We can compare these methods in terms of robustness, speed, required downtime, and restrictions (and more, but we have to stop somewhere for this article).

pg_dump and restore is arguably the most robust method, since it’s the most tested and has been in use for decades. It also has very few restrictions in terms of what it can handle. It is possible to construct databases that cannot be dumped and restored, mostly involving particular object dependency relationships, but those are rare and usually involve discouraged practices.

The problem with the dump and restore method is of course that it effectively requires downtime for the whole time the dump and restore operations run. While the source database is still readable and writable while the process runs, any updates to the source database after the start of the dump will be lost.

pg_upgrade improves on the pg_dump process by moving over the data files directly without having to dump them out into a logical textual form. Note that pg_upgrade still uses pg_dump internally to copy the schema, but not the data. When pg_upgrade was new, its robustness was questioned, and it did upgrade some databases incorrectly. But pg_upgrade is now quite mature and well tested, so one does not need to hesitate about using it for that reason anymore. While pg_upgrade runs, the database system is down. But one can make a choice about how long pg_upgrade runs. In the default copy mode, the total run time is composed of the time to dump and restore the schema (which is usually very fast, unless one has thousands of tables or other objects) plus the time to copy the data files, which depends on how big the database is (and the I/O system, file system, etc.).

In the optional link mode, the data files are instead hard-linked to the new data directory, so that the time is merely the time to perform a short kernel operation per file instead of copying every byte. The drawback is that if anything goes wrong with the upgrade or you need to fall back to the old installation, this operation will have destroyed your old database. (I’m working on a best-of-both-worlds solution for PostgreSQL 12 using reflinks or file clone operations on supported file systems.)

Logical replication is the newest of the bunch here, so it will probably take some time to work out the kinks. If you don’t have time to explore and investigate, this might not be the way to go right now. (Of course, people have been using other non-core logical replication solutions such as Slony, Londiste, and pglogical for upgrading PostgreSQL for many years, so there is a lot of experience with the principles, if not with the particulars.)

The advantage of using logical replication to upgrade is that the application can continue to run against the old instance while the data synchronization happens. There only needs to be a small outage while the client connections are switched over. So while an upgrade using logical replication is probably slower start to end than using pg_upgrade in copy mode (and definitely slower than using hardlink mode), it doesn’t matter very much since the actual downtime can be much shorter.

Note that logical replication currently doesn’t replicate schema changes. In this proposed upgrade procedure, the schema is still copied over via pg_dump, but subsequent schema changes are not carried over. Upgrading with logical replication also has a few other restrictions. Certain operations are not captured by logical replication: large objects, TRUNCATE, sequence changes. We will discuss workarounds for these issues later.

If you have any physical standbys (and if not, why don’t you?), there are also some differences to consider between the methods. With either method, you need to build new physical standbys for the upgraded instance. With dump and restore as well as with logical replication, they can be put in place before the upgrade starts so that the standby will be mostly ready once the restore or logical replication initial sync is complete, subject to replication delay.

With pg_upgrade, the new standbys have to be created after the upgrade of the primary is complete. (The pg_upgrade documentation describes this in further detail.) If you rely on physical standbys for high-availability, the standbys ought to be in place before you switch to the new instance, so the setup of the standbys could affect your overall timing calculations.

But back to logical replication. Here is how upgrading with logical replication can be done:

0. The old instance must be prepared for logical replication. This requires some configurations settings as described under http://www.postgresql.org/docs/10/static/logical-replication-config.html (mainly wal_level = logical. If it turns out you need to make those changes, they will require a server restart. So check this well ahead of time. Also check that pg_hba.conf on the old instance is set up to accept connections from the new instance. (Changing that only requires a reload.)

1. Install the new PostgreSQL version. You need at least the server package and the client package that contains pg_dump. Many packagings now allow installing multiple versions side by side. If you are running virtual machines or cloud instances, it’s worth considering installing the new instance on a new host.

2. Set up a new instance, that is, run initdb. The new instance can have different settings than the old one, for example locale, WAL segment size, or checksumming. (Why not use this opportunity to turn on data checksums?)

3. Before you start the new instance, you might need to change some configuration settings. If the instance runs on the same host as the old instance, you need to set a different port number. Also, carry over any custom changes you have made in postgresql.conf on your old instance, such as memory settings, max_connections, etc. Similarly, make pg_hba.conf settings appropriate to your environment. You can usually start by copying over the pg_hba.conf file from the old instance. If you want to use SSL, set that up now.

4. Start the new (empty) instance and check that it works to your satisfaction. If you set up the new instance on a new host, check at this point that you can make a database connection (using psql) from the new host to the old database instance. We will need that in the subsequent steps.

5. Copy over the schema definitions with pg_dumpall. (Or you can do it with pg_dump for each database separately, but then don’t forget global objects such as roles.)

pg_dumpall -s >schemadump.sql
psql -d postgres -f schemadump.sql

Any schema changes after this point will not be migrated. You would have to manage those yourself. In many cases, you can just apply the changing DDL on both hosts, but running commands that change the table structure during an upgrade is probably a challenge too far.

6. In each database in the source instance, create a publication that captures all tables:

CREATE PUBLICATION p_upgrade FOR ALL TABLES;

Logical replication works separately in each database, so this needs to be repeated in each database. On the other hand, you don’t have to upgrade all databases at once, so you can do this one database at a time or even not upgrade some databases.

7. In each database in the target instance, create a subscription that subscribes to the just-created publication. Be sure to match the source and target databases correctly.

CREATE SUBSCRIPTION s_upgrade CONNECTION 'host=oldhost port=oldport dbname=dbname ...' PUBLICATION p_upgrade;

Set the connection parameters as appropriate.

8. Now you wait until the subscriptions have copied over the initial data and have fully caught up with the publisher. You can check the initial sync status of each table in a subscription in the system catalog pg_subscription_rel (look for r = ready in column srsubstate). The overall status of the replication can be checked in pg_stat_replication on the sending side and pg_stat_subscription on the receiving side.

9. As mentioned above, sequence changes are not replicated. One possible workaround for this is to copy over the sequence values using pg_dump. You can get a dump of the current sequent values using something like this:

pg_dump -d dbname --data-only -t '*_seq' >seq-data.sql

(This assumes that the sequence names all match *_seq and no tables match that name. In more complicated cases you could also go the route of creating a full dump and extracing the sequence data from the dump’s table of contents.)

Since the sequences might advance as you do this, perhaps munge the seq-data.sql file to add a bit of slack to the numbers.

Then restore that file to the new database using psql.

10. Showtime: Switch the applications to the new instances. This requires some thinking ahead of time. In the simplest scenario, you stop your application programs, change the connection settings, restart. If you use a connection proxy, you can switch over the connection there. You can also switch client applications one by one, perhaps to test things out a bit or ease the load on the new system. This will work as long as the applications still pointing to the old server and those pointing to the new server don’t make conflicting writes. (In that case you would be running a multimaster system, at least for a short time, and that is another order of complexity.)

11. When the upgrade is complete, you can tear down the replication setup. In each database on the new instance, run

DROP SUBSCRIPTION s_upgrade;

If you have already shut down the old instance, this will fail because it won’t be able to reach the remote server to drop the replication slot. See the DROP SUBSCRIPTION man page for how to proceed in this situation.

You can also drop the publications on the source instance, but that is not necessary since a publication does not retain an resources.

12. Finally, remove the old instances if you don’t need them any longer.

Some additional comments on workarounds for things that logical replication does not support. If you are using large objects, you can move them over using pg_dump, of course as long as they don’t change during the upgrade process. This is a significant limitation, so if you are a heavy user of large objects, then this method might not be for you. If your application issues TRUNCATE during the upgrade process, those actions will not be replicated. Perhaps you can tweak your application to prevent it from doing that for the time of the upgrade, or you can substitute a DELETE instead. PostgreSQL 11 will support replicating TRUNCATE, but that will only work if both the source and the destination instance are PostgreSQL 11 or newer.

Some closing comments that really apply to all upgrade undertakings:

  • Applications and all database client programs should be tested against a new major PostgreSQL version before being put into production.
  • To that end, you should also test the upgrade procedure before executing it in the production environment.
  • Write things down or better script and automate as much as possible.
  • Make sure your backup setup, monitoring systems, and any maintenance tools and scripts are adjusted appropriately during the upgrade procedure. Ideally, these should be in place and verified before the switchover is done.

With that in mind, good luck and please share your experiences.

5 replies
  1. Bruce Momjian
    Bruce Momjian says:

    Just to clarify, pg_upgrade modifies the old cluster only if link mode is used _and_ you start the new server. If pg_upgrade fails while it is running, it does not modify the old server, even in link mode.

    Reply
  2. Sergei Lavrov
    Sergei Lavrov says:

    “With pg_upgrade, the new standbys have to be created after the upgrade of the primary is complete.” – you can convert streaming replication standys with pg_upgrade too. It’s not necessary to recreate standbys from scratch. So downtime will be similar as you pg_update master.

    Reply
    • Peter Eisentraut
      Peter Eisentraut says:

      The documentation for this states that the standbys are to be upgraded after the main part of the upgrade of the primary is done. So it will take a bit longer than just doing the primary. It depends on the individual circumstances, of course.

      Reply
  3. Telford Tendys
    Telford Tendys says:

    I tested an upgrade with 8.1G of data in /var/lib/pgsql/10/data/base/ and two versions of PostgreSQL running on the same server, different ports (postgresql10-server-10.6-1PGDG.rhel7.x86_64 upgrading to postgresql11-server-11.1-1PGDG.rhel7.x86_64). Using the simplistic command “pg_dumpall | psql –port=5444” gets the job done in less than 30 minutes. As you already mention above, this requires application downtime since changes made during the dump cannot reliably be synchronized afterwards.

    I also tested using the logical replication PUBLICATION / SUBSCRIPTION method, getting only the schema from “pg_dumpall -s” and found this method was working, but very slow. After 24 hours running (on exactly the same server as previous test) with intense I/O activity, some of the data was visible on the receiver but a lot was still missing and I gave up and shut down the test. What’s more the size of /var/lib/pgsql/11/data/base/ had been growing, it had reached more than 21G in size. Although the source server was still operational, trying to use our application was very sluggish during replication.

    Note that pg_dumpall is smart enough to move all the CREATE INDEX commands to after row data has been copied, while your logical replication method described above attempts to first install the full schema (including CREATE INDEX) and then afterwards push through the row data. I’m not sure that’s the only source of inefficiency but most people avoid doing it that way. It would be possible to hand edit the schema as a work around, or if all else fails use perl.

    The best solution I can imagine would be if the pg_dumpall established a checkpoint and simultaneously created a named slot aligned with that checkpoint. After that, it would dump everything up to the checkpoint and then walk away, leaving the slot sitting there ready to use for logical replication. On the new database you would use CREATE SUBSCRIPTION and grab the existing slot, bringing the replication up to date. Look at the way pg_basebackup works when doing a physical level replication onto a secondary server and then provide equivalent features in the pg_dumpall to get the equivalent results happening at the logical level replication. That’s my suggestion.

    Reply
  4. Steven Winfield
    Steven Winfield says:

    Thanks for this post. I’ve just been through an upgrade of our main production servers using this method and thought I’d share some findings. The cluster that was upgraded is around 1TB.

    1. In the initial loading phase there’s no real harm in setting “fsync” and “synchronous_commit” to false for a bit of a speed up, so long as you’re willing to run the risk of having to start from scratch should you suffer a power outage.

    2. If you want to change the locale of your cluster then you will need to edit the “CREATE DATABASE” statements in the sql dumps made in step #5 above, even if you are specifying “–locale” to initdb in step 2.

    3. For another (potentially huge) speed up: don’t create any unnecessary constraints or indexes before firing up logical replication – all you need are the correct columns (i.e. CREATE TABLE statements) and primary keys to be in place. Luckily this can be done with pg_dump(all) and grep:

    a. Use “pg_dumpall –globals-only > globals.sql” to dump global objects.
    b. Use “pg_dump –create –schema-only –section=pre-data -d dbname > dbname_predata.sql” for each database to dump just the table (and custom type) definitions.
    c. Use “pg_dump –schema-only –section=post-data -d dbname > dbname_postdata.sql” for each database to dump the constraints, triggers, and indexes.
    d. Use “grep -B 1 ‘PRIMARY KEY’ dbname_postdata.sql > dbname_pkeys.sql” to extract the primary key definitions from the *_postdata.sql files.
    e. In the new cluster execute globals.sql, all dbname_predata.sql, and all dbname_pkeys.sql – the latter will require switching to the correct database.

    For reference, only the table definitions are needed for the “initialization” phase for each table (“i” in pg_subscription_rel, when the backend is executing “COPY FROM table TO STDOUT”), but without the primary key you will get replication errors about the new incoming rows having no identity when the COPY finishes and real-time replication begins.

    4. After all data has been copied you can then run the *_postdata.sql scripts via psql. Don’t worry about the duplicate primary key definitions – psql will print an error message but will happily carry on with the rest of the DDL. You can run the scripts for all dbs at the same time, but for a given db the indexes, constraints, and triggers will be built sequentially – thankfully, CREATE INDEX has been parallelized in v11 so you may want to tweak your parallel worker settings to make best use of this.

    5. If your original cluster is doing binary replication to a slave – especially one in a different geographical location – it’s a good idea to set up binary replication from your new v11 instance to one in the other location too, and make sure they are connected before starting the logical replication from you v10 instance to your v11.

    6. As mentioned in the post, sequences are not replicated. Be aware that the command – as given – will only dump the contents of sequences that match *_id_seq **in the public schema**. Using ” -t ‘*.*_seq’ ” will match all schemas. Alternatively, this command should copy across the values of all sequences, no matter their name:

    For each database execute:
    psql -h old_host -p old_port -d old_db -t -c “select ‘SELECT pg_catalog.setval(”’ || quote_ident(schemaname) || ‘.’ || quote_ident(sequencename) || ”’, ‘ || coalesce(last_value, 1) || ‘, ‘ || case when last_value is null then ‘false);’ else ‘true);’ end from pg_sequences” | psql -h new_host -p new_port -d new_db -a

    If you want to add some slack to the sequence values then change this to “coalesce(last_value + N, N)” where N is, say, 1000.

    And that’s it. The only issues we had afterwards was a custom logical replication plugin that failed to initialize when I began adding replication slots to the new db (complaints of a missing symbol “AllocSetContextCreate”), which wasn’t caught during testing.

    Hope this helps someone!

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *