Speed up getting WAL files from Barman

July 19, 2016/0 Comments/in Featured, Gabriele's PlanetPostgreSQL /by Gabriele Bartolini

Starting from Barman 1.6.1, PostgreSQL standby servers can rely on an “infinite” basin of WAL files and finally pre-fetch batches of WAL files in parallel from Barman, speeding up the restoration process as well as making the disaster recovery solution more resilient as a whole.

The master, the backup and the standby

Before we start, let’s define our playground. We have three servers: our PostgreSQL primary server, called angus; a server with Barman, called barman; and a third server with a reliable PostgreSQL standby, called chris – for different reasons, I had to rule out the following names: bon, brian, malcolm, phil, cliff and obviously axl. 😉

angus is a high-workload server and is continuously backed up on barman, while chris is a hot standby server with streaming replication from angus enabled. This is a very simple, robust and cheap business continuity cluster that you can easily create with pure open source PostgreSQL, yet it is capable of reaching over 99.99% uptime in a year (according to our experience with several customers at 2ndQuadrant).

What we are going to do is instruct chris (the standby) to fetch WAL files from barman whenever streaming replication with angus is not working, as a fallback method, making the entire system more resilient and robust. The most typical examples of such problems are:

  1. temporary network failure between chris and angus;
  2. prolonged downtime for chris which causes the standby to go out of sync with angus.

For further information, please refer to the Getting WAL files from Barman with ‘get-wal’ blog article that I wrote some time ago.

Technically, we will be configuring the standby server chris to remotely fetch WAL files from barman as part of the restore_command option in the recovery.conf file. Since the release of Barman 1.6.1 we can take advantage of parallel pre-fetching of WAL files, which exploits network bandwidth and reduces recovery time of the standby.

Requirements

This scenario has been tested on Linux systems only, and requires:

  • Barman >= 1.6.1 on the barman server
  • Python with argparse module installed (available as a package for most Linux distributions) on chris
  • Public SSH key of the postgres@chris user in the ~/.ssh/authorized_keys file of the barman@barman user (a procedure known as SSH public key exchange)

Installation

As the postgres user on chris, download the script from our GitHub repository into your favourite directory (e.g. ~postgres/bin, or /var/lib/pgsql/bin directly) with:

cd ~postgres/bin
wget http://raw.githubusercontent.com/2ndquadrant-it/barman/master/scripts/barman-wal-restore
chmod 700 barman-wal-restore

Then verify it is working:

./barman-wal-restore -h

You will get this output message:

usage: barman-wal-restore [-h] [-V] [-U USER] [-s SECONDS] [-p JOBS] [-z]
                             [-j]
                             BARMAN_HOST SERVER_NAME WAL_NAME WAL_DEST

This script will be used as a 'restore_command' based on the get-wal feature
of Barman. A ssh connection will be opened to the Barman host.

positional arguments:
  BARMAN_HOST           The host of the Barman server.
  SERVER_NAME           The server name configured in Barman from which WALs
                        are taken.
  WAL_NAME              this parameter has to be the value of the '%f' keyword
                        (according to 'restore_command').
  WAL_DEST              this parameter has to be the value of the '%p' keyword
                        (according to 'restore_command').

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit
  -U USER, --user USER  The user used for the ssh connection to the Barman
                        server. Defaults to 'barman'.
  -s SECONDS, --sleep SECONDS
                        sleep for SECONDS after a failure of get-wal request.
                        Defaults to 0 (nowait).
  -p JOBS, --parallel JOBS
                        Specifies the number of files to peek and transfer in
                        parallel. Defaults to 0 (disabled).
  -z, --gzip            Transfer the WAL files compressed with gzip
  -j, --bzip2           Transfer the WAL files compressed with bzip2

If you get this output, the script has been installed correctly. Otherwise, you are most likely missing the argparse module in your system.
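If the help page does not show up, a quick way to confirm the diagnosis is to ask the Python interpreter on chris whether it can import argparse. The following is only a minimal sketch; the helper name has_argparse is made up for illustration and is not part of Barman:

```python
def has_argparse():
    """Return True if the argparse module can be imported by this interpreter."""
    try:
        import argparse  # noqa: F401 (imported only to test availability)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if has_argparse():
        print("argparse is available")
    else:
        print("argparse is missing: install it from your distribution's packages")
```

Run it with the same Python interpreter that executes barman-wal-restore, otherwise the result may not reflect what the script sees.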

Configuration and setup

Locate the recovery.conf in chris and properly set the restore_command option:

restore_command = '/var/lib/pgsql/bin/barman-wal-restore -p 8 -s 10 barman angus %f %p'

The above example will connect to barman as the barman user via SSH and execute the get-wal command on the angus PostgreSQL server backed up in Barman. The script will pre-fetch up to 8 WAL files at a time and, by default, store them in a temporary folder (currently fixed: /var/tmp/barman-wal-restore).

In case of error, it will sleep for 10 seconds. The help page describes the available options, which you can tune to best fit your environment.
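To make the moving parts more concrete, here is a minimal sketch of the kind of SSH invocation that sits behind a single WAL request: a remote barman get-wal call for one server and one WAL file. It is an illustration under stated assumptions, not the code of barman-wal-restore itself; the function name build_get_wal_command is hypothetical, and the sketch covers neither the parallel pre-fetch nor the sleep on failure.

```python
import shlex


def build_get_wal_command(barman_host, server_name, wal_name, user="barman"):
    """Build the argument list for a remote 'barman get-wal' call over ssh.

    Hypothetical helper for illustration only: the real script may assemble
    its command differently.
    """
    remote_parts = ["barman", "get-wal", server_name, wal_name]
    # Quote every part, since ssh hands the remote command to a shell.
    remote_command = " ".join(shlex.quote(part) for part in remote_parts)
    return ["ssh", "%s@%s" % (user, barman_host), remote_command]


if __name__ == "__main__":
    # Values taken from the example above: Barman host 'barman', server 'angus'.
    print(build_get_wal_command("barman", "angus", "00000001000019EA0000008A"))
```

The positional values map directly onto BARMAN_HOST, SERVER_NAME and WAL_NAME from the help page, while the user defaults to barman, as the -U option does.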

Verification

All you have to do now is restart the standby server on chris and check from the PostgreSQL log that WALs are being fetched from Barman and restored:

Jul 15 15:57:21 chris postgres[30058]: [23-1] LOG:  restored log file "00000001000019EA0000008A" from archive
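If you want to automate this check, a short script can pull the WAL file name out of such log lines and split it into its three 8-digit hexadecimal parts (timeline, log and segment). This is a minimal sketch; the function names are hypothetical, and it assumes the log message format shown above:

```python
import re

# Matches the message emitted when a WAL segment is restored from the archive.
RESTORED = re.compile(r'restored log file "([0-9A-F]{24})" from archive')


def restored_wal(line):
    """Return the WAL file name found in a log line, or None if absent."""
    match = RESTORED.search(line)
    return match.group(1) if match else None


def split_wal_name(name):
    """Split a 24-character WAL name into (timeline, log, segment) integers."""
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


if __name__ == "__main__":
    line = ('Jul 15 15:57:21 chris postgres[30058]: [23-1] LOG:  '
            'restored log file "00000001000019EA0000008A" from archive')
    wal = restored_wal(line)
    print(wal, split_wal_name(wal))
```

Watching the segment number grow across consecutive log lines is a simple way to see that the standby is catching up.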

You can also peek in the /var/tmp/barman-wal-restore directory and verify that the script has been executed.

The Barman logs also contain traces of this activity.

Conclusions

This very simple Python script, which we have written and made available under GNU GPL 3, makes the PostgreSQL cluster more resilient, thanks to the tight cooperation with Barman.

It not only provides a stable fallback method for WAL fetching, but also protects PostgreSQL standby servers from the infamous 255 error returned by SSH in the case of network problems. Because that status is not a SIGTERM, PostgreSQL treats it as an exception and aborts the recovery process (see the “Archive Recovery Settings” section in the PostgreSQL documentation).
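The protection described above boils down to never passing a dangerous exit status back to PostgreSQL. The sketch below shows one possible way to do that; it is not the actual logic of barman-wal-restore. The names safe_exit_code and SAFE_FAILURE, the replacement value 2 and the threshold of 125 are assumptions made for the example.

```python
import subprocess
import sys

# Assumption for this sketch: statuses above 125 (including ssh's 255) and
# deaths by signal would make PostgreSQL abort recovery, so we hide them.
FATAL_THRESHOLD = 125
SAFE_FAILURE = 2  # hypothetical "ordinary failure" status


def safe_exit_code(returncode):
    """Map a child's return code to one that reads as a plain failure."""
    if returncode == 0:
        return 0
    if returncode < 0 or returncode > FATAL_THRESHOLD:
        # Negative values mean the child was terminated by a signal.
        return SAFE_FAILURE
    return returncode


def run_restore(argv):
    """Run a command and return its sanitised exit status."""
    return safe_exit_code(subprocess.call(argv))


if __name__ == "__main__":
    # Simulate an ssh network failure with a child that exits with 255.
    fake_ssh = [sys.executable, "-c", "import sys; sys.exit(255)"]
    print(run_restore(fake_ssh))
```

With a wrapper of this kind, a temporary network failure looks to PostgreSQL like a WAL file that is not yet available, so the standby simply retries later.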

Stay tuned to us and to Barman’s development as we continue to improve disaster recovery solutions for PostgreSQL. We would like to thank our friends at Subito.it, Navionics and Jobrapido for helping us with the development of this important feature, as well as the many other 2ndQuadrant customers whom we cannot mention due to non-disclosure agreements but who continue to support our work.

Side note: hopefully I won’t have to change the way I name servers due to AC/DC continuously changing their line-up. 😉

Tags: Barman, barman-wal-restore, barman-wal-restore.py, Business Continuity, disaster recovery, get-wal, parallel, parallel get-wal, pgbarman, postgres, PostgreSQL, replication, restore_command, standby, wal hub