Near-Zero Downtime Automated Upgrades of PostgreSQL Clusters in Cloud (Part I)

March 23, 2018 / by Gulcin Yildirim / in 2ndQuadrant, DevOps, Featured, Gulcin's PlanetPostgreSQL, pglogical, PostgreSQL

Last week, I was at Nordic PGDay 2018, where I had quite a few conversations about the tool I wrote, pglupgrade, which automates PostgreSQL major version upgrades in a replication cluster setup. I was happy to hear that word is spreading, and that other people in different communities are giving talks at meetups and conferences about near-zero downtime upgrades using logical replication. Since I gave this talk at PGDAY’17 Russia, at PGConf.EU 2017 in Warsaw, and most recently at FOSDEM PGDay 2018 in Brussels, I thought it would be better to write a blog post to keep the presentation available to those who could not make it to any of the conferences mentioned above. If you would like to skip this post and go directly to the talk, here is your link: Near-Zero Downtime Automated Upgrades of PostgreSQL Clusters in Cloud


The main motivation behind developing this tool was to propose a solution that minimizes downtime during major version upgrades, which unfortunately affect almost everyone who uses PostgreSQL. Currently, we don’t have a tool that allows PostgreSQL users to upgrade their databases without downtime, and this is clearly a pain point for many users, especially businesses. If we are to solve the upgrade problem, we should think of more than one server (referred to as a cluster from now on), simply because not many people use only one database server anymore. The most common scenario is a physical streaming replication setup, either for high availability or to scale read queries.

Database Upgrades

Before diving into the solution, let’s briefly discuss how database upgrades work in general. There are four main possible approaches:

  1. The first approach would be for databases to keep their storage format the same, or at least compatible, across versions. However, this is hard to guarantee in the long term: new features might require changes in how the data is stored or add more metadata, and performance is often improved by optimizing the data structures.
  2. The second approach is to make a logical copy (dump) of the old server and load it into the new server. This is the most traditional approach; it requires the old server to receive no updates during the process, and it results in prolonged downtimes of hours or even days on large databases.
  3. The third option is to convert the data from the old format to the new one. This can be done on the fly while the new system is running, which incurs a performance penalty that is hard to predict because it depends on data access patterns, or it can be done offline while the servers are down, again incurring prolonged downtimes (although often shorter than with the second method).
  4. The fourth method is to use a logical dump for saving and restoring the database, while capturing the changes that happen in the meantime and logically replicating them to the new database once the initial restore has finished. This method requires orchestrating several components (see the sketch after this list), but it also decreases the amount of time during which the database cannot respond to queries.
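
To make the fourth method concrete, here is a rough sketch of the orchestration in shell form; the hosts and database name are hypothetical, and the change-capture steps are stubbed out as comments because they depend on the replication tool used (concrete PostgreSQL examples follow in the next section):

    # 1. Copy the schema to the new server while the old one keeps serving traffic
    pg_dump --schema-only -h oldhost mydb | psql -h newhost mydb

    # 2. Start capturing changes on the old server (tool-specific; the capture
    #    must begin at a snapshot consistent with the data copy below)
    # 3. Copy the existing data
    pg_dump --data-only -h oldhost mydb | psql -h newhost mydb

    # 4. Replay the captured changes on the new server until it catches up
    # 5. Switch the application over to the new server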

Upgrade Methods in PostgreSQL

PostgreSQL upgrades can be matched with the four methods listed above. For example, PostgreSQL minor releases, which do not contain new features but only fixes, do not change the existing data format and fit the first approach. For the second approach, PostgreSQL provides the pg_dump and pg_restore tools, which perform logical backup and restore. There is also the pg_upgrade tool (it was a contrib module and was moved into the main tree in PostgreSQL 9.5), which performs an offline conversion of the old data directory to the new one (the servers are not running); this fits the third option. For the fourth approach, there are third-party trigger-based solutions for upgrading, such as Slony, but they have some caveats which we will cover later.
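
For illustration, here is a minimal sketch of the second and third methods using the standard tools; all host names and paths below are hypothetical:

    # Method 2: logical dump and restore (the old server must receive no
    # updates while this runs)
    pg_dump -Fc -h oldhost mydb > mydb.dump
    createdb -h newhost mydb
    pg_restore -h newhost -d mydb mydb.dump

    # Method 3: offline conversion of the data directory with pg_upgrade
    # (both servers stopped; --link avoids copying files but makes the old
    # cluster unusable once the new one has been started)
    pg_upgrade \
      -b /usr/lib/postgresql/9.6/bin -B /usr/lib/postgresql/10/bin \
      -d /var/lib/postgresql/9.6/main -D /var/lib/postgresql/10/main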

Traditional upgrade methods can only upgrade the primary server, while the replica servers have to be rebuilt afterward (offline conversion). This leads to additional problems with both cluster availability and capacity, effectively increasing the perceived downtime of the database from the point of view of both applications and users. Logical replication allows replicating while the system is up and running, and the testing effort can be handled in the meantime (online conversion).

There are different upgrade methods applicable to different environments. For example, pg_dump allows upgrades from very old versions (e.g., 7.0) to recent releases, so if you are running an antique version of PostgreSQL, the best method is to use pg_dump/pg_restore (and better to do it now!). If your PostgreSQL version is 8.4 or later, you can use pg_upgrade. If you happen to be using PostgreSQL 9.4 or later, you have in-core “Logical Decoding” and can use the pglogical extension as a means of upgrade. Finally, if you are using PostgreSQL 10, you will have a chance to upgrade your database to PostgreSQL 11 and later (once they are available) by using in-core “Logical Replication” (yay!).
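
As a sketch of what the in-core approach looks like when upgrading from PostgreSQL 10 onwards (the connection string and object names here are made up, and the schema must be copied to the new server beforehand, e.g. with pg_dump --schema-only, since logical replication does not replicate DDL):

    -- On the old server (the publisher):
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- On the new server (the subscriber); the initial data copy
    -- starts automatically once the subscription is created:
    CREATE SUBSCRIPTION upgrade_sub
      CONNECTION 'host=oldhost dbname=mydb user=replicator'
      PUBLICATION upgrade_pub;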

Logical Replication for Upgrades

Where does the idea of upgrading PostgreSQL by using logical replication come from? Clearly, pglogical does not claim the upgrade purpose openly the way pg_upgrade does (this was one argument against pglogical at the conference party), but this does not mean that the tool cannot be used for upgrades. In fact, when pglogical was designed, near-zero downtime upgrades were considered one of the expected use cases by the developers of the extension.

Unlike physical replication, which captures changes to the raw data on disk, logical replication captures the logical changes to individual records in the database and replicates those. Logical records work across major releases, hence logical replication can be used to upgrade from one release to another. The idea of using logical replication as a means of version upgrade is far from new; trigger-based replication tools have been doing upgrades in a logical way by capturing and sending the changes via triggers (because triggers work regardless of database versions, too!).
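
As a minimal sketch of an upgrade-oriented pglogical setup, assuming the extension is installed on both servers (the node names and DSNs below are hypothetical):

    -- On the old server (the provider), e.g. PostgreSQL 9.4 or later:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
      node_name := 'provider',
      dsn := 'host=oldhost dbname=mydb'
    );
    -- Add all tables in the public schema to the default replication set
    SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);

    -- On the new server (the subscriber), running the newer major version;
    -- the initial data copy is performed when the subscription is created:
    CREATE EXTENSION pglogical;
    SELECT pglogical.create_node(
      node_name := 'subscriber',
      dsn := 'host=newhost dbname=mydb'
    );
    SELECT pglogical.create_subscription(
      subscription_name := 'upgrade_sub',
      provider_dsn := 'host=oldhost dbname=mydb'
    );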

Note: You can check my Evolution of Fault Tolerance in PostgreSQL presentation to learn more about how replication works in PostgreSQL.

If you can use trigger-based replication solutions for minimal-downtime upgrades, why should you prefer logical replication instead? In a trigger-based mechanism, all the changes are captured by triggers and written into queue tables. This procedure doubles the write operations, doubles the log files, and slows down the system, since all the changes have to be written twice; the result is more disk I/O and load on the source server. In contrast, in-core logical replication (the same goes for the pglogical extension) is built on top of logical decoding. Logical decoding is a mechanism that extracts information from the WAL files into logical changes (INSERT/UPDATE/DELETE). Since the data is obtained by decoding the transaction logs, there is no write amplification as in the case of trigger-based solutions, and hence this method performs better. As a result, logical-decoding-based solutions simply have a lower impact on the system than trigger-based solutions.
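
You can observe logical decoding directly with the test_decoding plugin that ships with PostgreSQL; this is a toy example (it assumes wal_level = logical, an available replication slot, and a made-up table name):

    -- Requires wal_level = logical in postgresql.conf
    CREATE TABLE t (id integer);
    SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    INSERT INTO t (id) VALUES (1);

    -- Peek at the logical changes decoded from the WAL (peek does not
    -- consume them; pg_logical_slot_get_changes would)
    SELECT * FROM pg_logical_slot_peek_changes('demo_slot', NULL, NULL);
    -- sample output (abridged):
    --   BEGIN
    --   table public.t: INSERT: id[integer]:1
    --   COMMIT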

Conclusion

In this introductory post, I wanted to explain why we should consider using logical replication to achieve minimal downtime during major version upgrades. We will continue with the details of the architecture and implementation of the tool in the upcoming post.

Tags: Ansible, AWS, cloud, cloud computing, database upgrade, in-core-logical-replication, logical decoding, logical replication, pglogical, pglupgrade, postgresql replication, postgresql upgrade, PostgreSQL10, software automation, upgrade, version upgrade
5 replies
  1. Bruce Momjian says:
     March 23, 2018 at 3:55 pm

     You mentioned pg_upgrade as a contrib module, but it was moved from /contrib to the main tree in Postgres 9.5.
     • Gulcin Yildirim says:
       April 6, 2018 at 11:47 am

       Thanks, Bruce. I’ve updated the post accordingly.
  2. Eugene Klimov says:
     April 28, 2018 at 7:43 pm

     Would you share the pglupgrade Ansible roles on GitHub or somewhere else?

     Why does pglogical lose its state after running pg_upgrade and installing postgresql-10 over postgresql-9.5?
     • Gulcin Yildirim says:
       April 30, 2018 at 11:34 am

       Hi Eugene,
       We are using the tool at 2ndQuadrant and, unfortunately, I can’t make the code public yet. That’s why I am writing about how it works, so that you can build your own tooling. You can find more details in the second part of the series, which will be published this week.
       As for the second question, I did not quite get it; would you explain the problem in more detail, please?
       Thanks.
     • craig.ringer says:
       May 2, 2018 at 4:04 am

       pg_upgrade does not preserve replication slots or replication origins, which are the position tracking facilities used to maintain replication state in pglogical and other logical replication tools. It wouldn’t make sense to preserve them because they track positions based on log sequence number (LSN), which is the replication log position. When pg_upgrade creates a new cluster it makes a new replication log that (re)starts at 0, rendering these positions meaningless.
