What’s new about Barman 1.4.0?
The 1.4.0 version of Barman adds new features, such as incremental backup and automatic integration with pg_stat_archiver, which aim to simplify the life of DBAs and system administrators.
Barman 1.4.0: the most important changes
The latest release introduces a new backup mode: the incremental backup. This mode reuses unmodified files between one periodic backup and the next, drastically reducing execution time, bandwidth usage and disk space. Another new feature is the integration of Barman with the pg_stat_archiver view, available from version 9.4 of PostgreSQL. The view makes it possible to collect information on the performance of WAL archiving and to monitor the status of the archiver process. Management of the WAL files has been improved: the calculation of storage statistics has been streamlined and optimised, and the logic for removing obsolete WAL files now performs different actions for exclusive and concurrent backups. Error messages have been made clearer and more legible where possible. We have also invested in the robustness of the code: with the 1.4.0 release we have approximately 200 unit tests that are run against every patch.
Incremental backup
Let’s explore the main innovation of this release: the incremental backup.
Definition and basic theory
To understand the logic on which the incremental backup is based, let’s consider two complete and consecutive backups. In the time interval between completion of the first backup and completion of the subsequent one, not all the files contained within the PGDATA directory are modified. A number of files are identical in the older and in the more recent backup and are therefore redundant: they take time and bandwidth to be transferred over the network, and they occupy unnecessary disk space once copied. If we compare the files of the older backup with the files that we are about to copy from the remote server, we can distinguish the set of files that have been modified from those that have remained unchanged. With the incremental backup it thus becomes possible to eliminate redundancy by copying only the modified files.
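The comparison described above can be illustrated with a minimal Python sketch. It is not how Barman does it (Barman relies on Rsync for this step); the function names `split_changed_unchanged`, `relative_files` and `file_digest` are hypothetical, and comparing content digests is just one simple way to decide whether a file has changed.

```python
import hashlib
import os
import tempfile


def file_digest(path):
    """Return the SHA-256 digest of a file's content."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def relative_files(root):
    """Return the set of file paths found under root, relative to root."""
    found = set()
    for directory, _subdirs, names in os.walk(root):
        for name in names:
            found.add(os.path.relpath(os.path.join(directory, name), root))
    return found


def split_changed_unchanged(previous_backup, current_data):
    """Partition the files of current_data into (changed, unchanged) lists.

    A file counts as unchanged only if it also exists in previous_backup
    with identical content; new files are reported as changed.
    """
    changed, unchanged = [], []
    old_files = relative_files(previous_backup)
    for rel in sorted(relative_files(current_data)):
        old_path = os.path.join(previous_backup, rel)
        new_path = os.path.join(current_data, rel)
        if rel in old_files and file_digest(old_path) == file_digest(new_path):
            unchanged.append(rel)
        else:
            changed.append(rel)
    return changed, unchanged


def demo():
    """Build two tiny directory trees and compare them."""
    with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
        contents = (
            (old, {"a": b"same", "b": b"before"}),
            (new, {"a": b"same", "b": b"after", "c": b"brand new"}),
        )
        for root, files in contents:
            for name, data in files.items():
                with open(os.path.join(root, name), "wb") as handle:
                    handle.write(data)
        return split_changed_unchanged(old, new)


if __name__ == "__main__":
    print(demo())
```

In this toy example only `b` (modified) and `c` (new) would need to be transferred, while `a` could be reused from the previous backup.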
Implementation and tangible benefits
We developed this feature by setting ourselves three objectives:
- reduction of backup execution time;
- reduction of bandwidth usage;
- reduction of space taken up, accomplished by eliminating redundancies (deduplication).
To achieve this, we exploited the capacity of Rsync to compare a list of files received from a remote server with the content of a local directory, identifying which ones have been modified. We thus added a new server/global configuration option called reuse_backup. This option identifies the type of backup to be performed. Let’s look at the three possible values of ‘reuse_backup’ and their effects:
- off: default value, classic backup;
- copy: identifies on the remote server the modified files using the last backup performed as a basis. Only the files that have changed are transferred over the network, reducing the execution time of a backup and saving bandwidth. At the end of the transfer the unmodified files are copied, thus creating a full backup;
- link: identifies the modified files and makes a copy of them, exactly like the copy option. At the end of the transfer, the unmodified files are reused through hard links instead of being copied. This optimisation of the disk space occupied by the backup removes any redundancy (deduplication).
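The difference between the copy and link values comes down to how an unmodified file is carried over into the new backup. The following Python sketch illustrates it on a single file; `reuse_file` is a hypothetical helper written for this example, not part of Barman, and it assumes a filesystem that supports hard links.

```python
import os
import shutil
import tempfile


def reuse_file(source, destination, mode):
    """Reuse an unmodified file from a previous backup (hypothetical helper).

    mode == "copy": duplicate the content, using additional disk space.
    mode == "link": create a hard link, so both names share the same data.
    """
    if mode == "link":
        os.link(source, destination)
    elif mode == "copy":
        shutil.copy2(source, destination)
    else:
        raise ValueError("mode must be 'copy' or 'link'")


def demo():
    """Return (copy shares data?, link shares data?, link count of the original)."""
    with tempfile.TemporaryDirectory() as workdir:
        previous = os.path.join(workdir, "previous_backup_file")
        with open(previous, "wb") as handle:
            handle.write(b"unchanged relation data")

        copied = os.path.join(workdir, "copied")
        linked = os.path.join(workdir, "linked")
        reuse_file(previous, copied, "copy")
        reuse_file(previous, linked, "link")

        return (
            os.path.samefile(previous, copied),
            os.path.samefile(previous, linked),
            os.stat(previous).st_nlink,
        )


if __name__ == "__main__":
    print(demo())
```

The copied file is an independent duplicate, whereas the hard-linked file is a second name for the same data on disk, which is why the link mode does not consume extra space for unmodified files.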
It is also possible to use the option --reuse-backup [{copy, link, off}] from the command line to change the default behaviour for an individual backup. For example:
barman backup --reuse-backup link main
…will force reuse of the backup, using hard links regardless of the value set within the configuration file.
I will now use Navionics, one of our customers and the sponsor of this release, as a case study. As we shall see, Navionics gained strong advantages from the use of the incremental backup. Navionics has very large databases (one of the largest is approximately 13 TiB). Before the introduction of the incremental backup, taking into account the characteristics of the server and network:
- approximately 52 hours would have been needed to complete a backup;
- 13 TiB of data would actually have been copied through the network;
- 13 TiB would actually have been taken up on the disk.
Using the option reuse_backup=link from the latest version of Barman and running barman show-backup on a just-completed backup, this is what Navionics sees:
Base backup information:
  Disk usage       : 13.2 TiB (13.2 TiB with WALs)
  Incremental size : 5.0 TiB (-62.01%)
Moreover, the backup execution time drops significantly from 52 hours to approximately 17 hours. The advantages are obvious:
- the execution time decreases by approximately 68%;
- only 5.0 TiB of data is copied instead of 13 TiB (-62%);
- the disk space taken up is 5.0 TiB instead of 13 TiB (-62%).
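For readers who want to reproduce the percentages, here is a small Python sketch of the arithmetic. It uses only the rounded figures quoted in this article (13.2 TiB and 5.0 TiB from the show-backup output, 52 and roughly 17 hours); `percent_reduction` is a hypothetical helper name. Because the 17 hours figure is itself approximate, the time saving computed this way comes out slightly below the approximate value given in the list.

```python
def percent_reduction(before, after):
    """Return how much smaller 'after' is than 'before', as a percentage."""
    return (before - after) / before * 100.0


# Figures quoted in the article; the durations are rounded.
size_saving = percent_reduction(13.2, 5.0)   # TiB, from barman show-backup
time_saving = percent_reduction(52.0, 17.0)  # hours

if __name__ == "__main__":
    print(f"size/bandwidth saving: {size_saving:.1f}%")
    print(f"time saving: {time_saving:.1f}%")
```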
pg_stat_archiver: integration into Barman 1.4.0
Among the new features introduced by PostgreSQL 9.4 is the pg_stat_archiver view, which provides useful information regarding the operating status of the WAL archiving process. Thanks to these statistics, it is also possible to make predictions about the space that a new backup will occupy. Users of Barman 1.4.0 and PostgreSQL 9.4 may notice a number of new fields within the output of the following commands:
- barman check: the Boolean field is_archiving, which indicates the status of the archiving process;
- barman status: last_archived_time reports the archive time of the last WAL file; failed_count reports the number of failed WAL archiving attempts; server_archived_wals_per_hour reports the archiving rate in WAL files per hour;
- barman show-server: adds to the set of server statistics all the fields that make up the pg_stat_archiver view.
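To give an idea of how figures of this kind can be derived from the view, here is a minimal Python sketch. The column names archived_count, stats_reset, last_archived_time and last_failed_time are those of the PostgreSQL pg_stat_archiver view; the two functions and their formulas are assumptions made for illustration and are not necessarily the calculations Barman performs. The values are passed in as plain arguments, so no database connection is needed.

```python
from datetime import datetime


def archived_wals_per_hour(archived_count, stats_reset, now):
    """Estimate the WAL archiving rate since the statistics were last reset.

    Assumed formula: archived_count divided by the hours elapsed since
    stats_reset. Returns 0.0 if no time has elapsed.
    """
    elapsed_hours = (now - stats_reset).total_seconds() / 3600.0
    if elapsed_hours <= 0:
        return 0.0
    return archived_count / elapsed_hours


def archiving_looks_healthy(last_archived_time, last_failed_time):
    """Hypothetical heuristic: the latest archiving attempt did not fail."""
    if last_failed_time is None:
        return True   # no failure ever recorded
    if last_archived_time is None:
        return False  # only failures recorded so far
    return last_archived_time > last_failed_time


if __name__ == "__main__":
    # Made-up sample values standing in for a row of pg_stat_archiver.
    reset = datetime(2015, 1, 1, 0, 0)
    now = datetime(2015, 1, 2, 0, 0)
    print(archived_wals_per_hour(48, reset, now))
    print(archiving_looks_healthy(datetime(2015, 1, 1, 23, 30), None))
```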
Conclusions
The incremental backup, the main feature of this release, is undoubtedly a very useful tool for everyone, saving time and space even on modest-size databases. It is almost indispensable for users who need to manage very large databases (VLDB) or databases that contain a large number of read-only tables, providing significant savings in terms of space, time and bandwidth. The integration with pg_stat_archiver on PostgreSQL 9.4 improves the ability to monitor the status of servers and thus strengthens the health and robustness of infrastructures that choose Barman as a disaster recovery solution for PostgreSQL databases.
What happens if, in the meantime, a file that was identified as unmodified is actually modified before the incremental backup has completed? I guess that in any case you’ll have to apply the WALs, won’t you?
Barman uses rsync checksum copy to evaluate every file that has been modified after the start time of the backup used as a reference.
This way we are sure to copy every file that has been modified.