What does pg_start_backup() do?

January 23, 2017 / by Simon Riggs / in 2ndQuadrant, Simon's PlanetPostgreSQL

Reading mailing lists can damage your health, as I recently discovered on the PostgreSQL Performance list where backup was being discussed.

First off, don’t rely on blogs for critical pieces of info. Read the docs, because they are accurate, fully reviewed and well maintained.

I should add that I was the initial author of them as well, so maybe it’s OK to carry on reading…

pg_start_backup() is a function we execute to start a base backup. It was part of the original API for physical backup introduced in PostgreSQL 8.0. It’s now been mostly superseded by the replication command BASE_BACKUP, which is most frequently executed by the pg_basebackup utility.

So what does a base backup actually do? Well, first we execute a checkpoint so that as many changed data blocks are on disk as possible. Next we force full page writes to occur, even if full_page_writes = off, because we need to see the whole page for any changes. Lastly, we record the starting point of the backup. That’s all.
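As a small illustration of that last step: the recorded starting point ends up in a text file named backup_label that is stored with the backup. The sketch below pulls the start location out of such a file. The function name backup_start_lsn is made up for illustration, and the file is assumed to contain the usual "START WAL LOCATION:" line.

```shell
#!/bin/sh
# Sketch: print the WAL location at which a base backup started.
# Assumes the backup directory contains a backup_label file with a line like
#   START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
# backup_start_lsn is a hypothetical helper name, not a PostgreSQL tool.
backup_start_lsn() {
    label_file="$1/backup_label"
    if [ ! -f "$label_file" ]; then
        echo "no backup_label in $1" >&2
        return 1
    fi
    # Keep only the location that follows the "START WAL LOCATION:" prefix.
    sed -n 's/^START WAL LOCATION: \([0-9A-Fa-f]*\/[0-9A-Fa-f]*\).*/\1/p' "$label_file"
}
```

Called as backup_start_lsn /path/to/backup, it prints the location from which WAL replay has to begin for that backup.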

Base backup does NOT prevent writes to the data directory. It’s designed to be “fully online” so it doesn’t take locks on objects and doesn’t interfere with the operation of the database, apart from some details if you try to shut it down while taking a backup.

pg_stop_backup() is the end marker for that backup.
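To make the start and end markers concrete, here is a rough sketch of the original low-level sequence. It assumes a server older than PostgreSQL 15 (where the functions were renamed pg_backup_start() and pg_backup_stop()), the old exclusive mode so that start and stop can run in separate sessions, WAL archiving that already works, and psql connection settings taken from the environment. The helper names run and lowlevel_backup, the label 'nightly' and the paths are placeholders, not anything from the docs.

```shell
#!/bin/sh
# Sketch only: a file-level base backup bracketed by the start/stop calls.
# run and lowlevel_backup are illustrative helpers, not PostgreSQL tools.
run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

lowlevel_backup() {
    data_dir=$1     # the server's data directory
    archive=$2      # tar file to write the copy into

    # 1. Checkpoint, force full page writes, record the starting point.
    run psql -Atc "SELECT pg_start_backup('nightly')" || return 1

    # 2. Copy the files while the server keeps writing to them.
    #    tar may complain about files that changed while being read;
    #    handling that is left out of this sketch.
    run tar -cf "$archive" -C "$data_dir" .

    # 3. Mark the end; the WAL from start to stop is needed for recovery.
    run psql -Atc "SELECT pg_stop_backup()"
}
```

Setting DRY_RUN=1 before calling lowlevel_backup prints the three commands without touching a server, which is a cheap way to read the sequence.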

The key point is that the base backup is NOT a consistent copy of the database. You might have copied every file, but all the data is taken at different times. So it’s wrong, until you recover the database with the WAL changes that occurred between the start of the backup and the stop of the backup.

Which is why you’ll be wanting to use a command like this

pg_basebackup --xlog-method=stream

or use a utility that does everything for you, like Barman.
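One practical wrinkle: the long option was renamed from --xlog-method to --wal-method in PostgreSQL 10, while the short form -X works with both spellings. The sketch below wraps a streaming backup using -X. The wrapper name stream_backup, the target directory and the slot name are placeholders, and the optional replication slot (passed with -S) is assumed to exist already.

```shell
#!/bin/sh
# Sketch: take a base backup and stream the needed WAL alongside it.
# stream_backup is an illustrative wrapper; the target directory and slot
# name are placeholders. Connection settings come from the environment.
stream_backup() {
    target_dir=$1
    slot_name=${2:-}

    # -D: where to write the backup; -X stream: fetch WAL over a second
    # connection while the data files are being copied.
    set -- pg_basebackup -D "$target_dir" -X stream

    # Optionally retain the needed WAL via an existing replication slot.
    if [ -n "$slot_name" ]; then
        set -- "$@" -S "$slot_name"
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"   # show the command without running it
    else
        "$@"
    fi
}
```

For example, stream_backup /backups/base backup_slot would run pg_basebackup against the server named by the usual PG* environment variables; the path and slot name here are invented.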

5 replies

Adam Scott says:
January 24, 2017 at 12:51 am

Great reminder on pg_start_backup()!

I’ve never tried a restore without the WAL files. I’m guessing there would be a complaint of missing WAL files. Looking through xlog.c (line 7196), I’m guessing you will see a message along the lines of: “WAL ends before end of online backup”.

So when you perform your scheduled test recovery and see that message, you know you aren’t getting consistent backups.
Tushar says:
February 1, 2017 at 8:11 pm

To the point explanation. thanks
EBB PostgreSQL says:
January 23, 2019 at 3:28 pm

Can you give an example of the issue if --xlog-method is not set to stream?
Francis Demierre says:
September 23, 2019 at 4:52 pm

Great stuff…. thanks.

Although you said:
Base backup does NOT prevent writes to the data directory. It’s designed to be “fully online” so it doesn’t take locks on objects and doesn’t interfere with the operation of the database, apart from some details if you try to shut it down while taking a backup.

I just have two questions (my observations make me wonder ….).

1) Does PostgreSQL continue to do regular checkpoints between pg_start_backup() and pg_stop_backup()?
2) Does it continue to move WAL files from pg_xlog/pg_wal to the archive directory using the configured archive_command?

Thanks for a reply.
Best Regards
Francis
- craig.ringer says:
  November 4, 2019 at 1:04 pm
  
  (1) Yes, PostgreSQL continues to perform checkpoints during base backups. A base backup doesn’t guarantee that you’ll see a consistent copy of the data as of the time the base backup started. It promises that you’ll get a consistent view of the data as it was after the backup finishes and the required WAL segments are applied during recovery. So PostgreSQL is free to delete files, etc.; if it’s deleting them, then they won’t be needed anymore to create a consistent copy of the end-of-backup state.
  
  (2) Yes, PostgreSQL continues to archive WAL when archive mode is enabled. It also continues to service streaming replication clients etc.
  
  I strongly suggest that you use pg_basebackup -X stream to have pg_basebackup copy WAL from the server at the same time as the base backup. If you’re concerned that the server may remove WAL too fast, have pg_basebackup use a streaming replication slot to ensure the needed WAL is retained. See the pg_basebackup documentation for details.
