2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Craig's PlanetPostgreSQL3 / Failover slots for PostgreSQL
Failover slots for PostgreSQL
craig.ringer

Failover slots for PostgreSQL

June 8, 2020/8 Comments/in Craig's PlanetPostgreSQL /by craig.ringer

Logical decoding and logical replication is getting more attention in the PostgreSQL world. This means we need it working well alongside production HA systems – and it turns out there’s a problem there. Replication slots are not themselves synced to physical replicas so you can’t continue to use a slot after a master failure results in promotion of a standby.

The failover slots patch changes that, syncing slot creation and updates to physical replica servers such as those maintained with WAL archives or streaming replication. That lets logical decoding clients seamlessly follow a failover promotion and continue replay without losing consistency.

Logical decoding and slots

Introduced in 9.4, logical decoding lets a client stream changes row-by-row in consistent transaction commit order to some receiver – which can be another PostgreSQL instance or your choice of applications like message queues and search engines.

To stream the row data the client connects to a replication slot on the server. The slot makes sure that the server retains the WAL needed for decoding and (for logical slots) also prevents the removal of old versions of system catalog rows that might be needed to understand that WAL.

The failover problem

Most production PostgreSQL deployments rely on streaming replication and/or WAL archive based replication ("physical replication") as part of their high availability and failover capabilities. Unlike most server state, replication slots are not replicated from a master server to its replicas. When the master fails and a replica is promoted any replication slots from the master are missing. Logical replication clients cannot continue because they have no slot to connect to.

Easy, right? Just create a slot with the same name on the replica.

It’s not that simple. A logical replication slot can only be created with a view of the database history somewhere in the future relative to when it’s created. So if the client wasn’t totally up to date replaying from the slot on the old master, or if it doesn’t create a replacement slot on the new master as the very first thing done on the master, the client will miss out on changes. That’s what slots are meant to prevent and can be a critical problem for some applications.

The two key reasons a slot can’t be created "back in time" are WAL retention and – for logical slots – vacuuming of the system catalogs. We also can’t get a snapshot of the past to export, but that’s only a problem for new client setup. WAL retention is the simplest: the standby might throw away WAL segments corresponding to database changes that a logical decoding client on the master hasn’t yet replayed. If there’s a failover then there’s no way to ever replay those changes to that client. The other issue is catalogs – logical decoding needs the historical definition of tables, types, etc to interpret data in WAL, and to do that it does a sort of MVCC-based time travel using deleted rows. If VACUUM on the master removes those deleted rows and marks the space free for re-use then the standby will replay that change and we can’t make sense of what’s in WAL anymore.

Failover slots

Failover slots address these issues by synchronizing slot creation, deletion and position updates to replica servers. This is done through the WAL stream like everything else. If a slot is created as a failover slot using the new failover boolean option to pg_create_logical_replication_slot then its creation is logged in WAL. So are subsequent position updates when the client tells the server it can free no-longer-needed resources.

If the master fails and a standby is promoted, logical decoding clients can just reconnect to the standby and carry on as if nothing had happened. If a DNS update or IP swap is done along with the promotion the logical decoding client might not even notice it’s anything more than a master restart.

What about physical slots?

PostgreSQL’s block-level ("physical") replication also has replication slots. They can be used to pin WAL retention, providing a finer-grained mechanism than wal_keep_segments at the cost of also being unbounded.

A physical failover slot can be created, just like a logical failover slot.

Does this mean pglogical can be used to fail over to a logical replica?

Failover slots do not aid logical replication solutions in supporting failover to a logical replica. They exist to allow logical replication to follow a physical failover.

Supporting failover to a logical replica is a completely unrelated matter. There are a number of limitations in PostgreSQL core that are relevant to it, like the currently missing support for logical decoding of sequence position advances. Failover slots will neither help nor hinder there. What they do is provide a way to integrate logical replication into HA solutions now, into existing mature and established infrastructure patterns.

Tags: failover slots, high availability, logical decoding, logical replication, pglogical, replication, replication slots, slots
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
8 replies
  1. Alex
    Alex says:
    June 8, 2020 at 9:51 pm

    Which PG version supports failover slots?

    Reply
  2. Egor
    Egor says:
    June 9, 2020 at 3:36 pm

    Hi Craig,
    how the patch is going on? Looks like there was nothing since 2016’s https://commitfest.postgresql.org/9/488/?

    Reply
  3. Avi
    Avi says:
    June 15, 2020 at 10:26 am

    Any idea as to what release (upcoming releases ?) is this patch expected to be applied to ?
    It is really great to see you working on it.

    Reply
    • craig.ringer
      craig.ringer says:
      May 7, 2021 at 3:29 am

      See https://wiki.postgresql.org/wiki/Failover_slots and https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover

      This feature will not be arriving in PostgreSQL in its current form.

      Reply
  4. Purav
    Purav says:
    June 23, 2020 at 6:31 pm

    Hi Craig,

    Thanks and appreciate all your efforts to educate tons of users around the world.

    Can you please share the version of pglogical in which this feature (failover slot) was introduced?

    Regards

    Reply
    • craig.ringer
      craig.ringer says:
      May 7, 2021 at 3:28 am

      Failover slots was not merged into PostgreSQL.

      See https://wiki.postgresql.org/wiki/Failover_slots where I updated the relevant info.

      See also https://wiki.postgresql.org/wiki/Logical_replication_and_physical_standby_failover

      Apologies it took me so long to reply. I’ve been very busy and didn’t get notified of blog comments.

      Reply
  5. Max Vialon
    Max Vialon says:
    June 26, 2020 at 8:19 am

    I am currently looking into how to restart a replication based on the logical decoding feature after a fallover in a cluster.
    Those Failover slots sound like they would solve it but I can’t find any documentation on the failover slot patch you mentioned in the first paragraph.

    Would you have any link for me on where to find those patchnotes so I can create such a failover replication slot and test with it?

    Best regards
    Max

    Reply
  6. pcpg
    pcpg says:
    July 8, 2020 at 8:02 am

    Hi Craig,

    I had posted a query a few days back, I dont see that yet. Also, although this article is dated June 2020, I see this article was originally published sometime in 2016. Is that correct?

    Basically, I am looking for info on whether failover slots are supported or no.

    Thanks

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up]
  • Full-text search since PostgreSQL 8.3
  • Random numbers
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up]

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
How to deploy a multi-master BDR cluster in Kubernetes How to deploy a multi-master BDR cluster in Kubernetes RESTful CRUD API using PostgreSQL and Spring Boot – Part one RESTful CRUD API using PostgreSQL and Spring Boot – Part one
Scroll to top
×