Vacuuming is a key feature of PostgreSQL databases to keep databases healthy and optimized. For this, Autovacuum is configured to conserve space by removing obsolete disk usage, and to optimize database performance by speeding up sequential scans, just as one example.
To discuss this topic in-depth, 2ndQuadrant hosted the “Postgres Vacuuming Through Pictures” webinar. The webinar covered key concepts of Postgres Vacuum and Autovacuum and provided a demonstration on how tuning vacuum properly makes a difference in database performance.
The webinar was presented by Tom Kincaid, GM North America at 2ndQuadrant. Those who weren’t able to attend the live event can now view the recording here.
Questions that we couldn’t respond to during the live webinar due to time restrictions have been answered below.
Q: Is there a way to see how much space could be cleared but not run the actual vacuum process, i.e. like a dry-run mode?
A: There is no vacuum dry-run facility. A running database has too much that could change between a dry-run command and an actual command.
Q: Are Vacuuming insert-only tables necessary only because analyze is also needed from time-to-time? Is this vacuum can only be triggered manually as there are no dead tuple in insert-only tables?
A: Vacuum on insert only tables is also necessary to freeze tuples. I didn’t cover freezing in this webinar but I will in a future webinar. You can analyze a table independent of vacuum.
Q: Can the vacuum also perform on Default tables under schema – pg_catalog? Recently I faced bloat issue on one of the table, when i checked, it was under default schema and I was unable to perform vacuum on it.
A: Yes, it will execute on the catalog tables.
Q: Is there a set of queries that can tell you what is not allowing the dead tuple to be removed?
A: There is not a specific query to tall you what is causing a particular set of tuples not to be reclaimed by vacuum. However, the pg_stat_activity view will enable you to see what transactions are still running which ultimately impacts what tuples get reclaimed during vacuum. There are some more advanced things you can do by looking at the xmin value of the tuple. Getting deeper into that is probably a good blog article.
Q: If autovacuum is configured correctly, is there a reason to run vacuum full ? What is the reason to run vacuum full?
A: The answer is if you have configured auto vacuum correctly you probably never need to run a vacuum full. However, there are situations we a deployment can’t afford the continual cost of regular vacuums through out the day. However, they can afford the outage of a vacuum full on a weekend. It is highly dependent on the circumstances of the deployment. So it is conceivable that a vacuum full would be the way to go but generally no.
Q: If we stop a vacuum halfway in middle will half of the tuples get vacuumed?
A: It is not that simple. Some portion of the vacuum operation such as the table scan will need to be performed again. Generally speaking you should avoid stopping a running vacuum process.
Q: Is it better to use vacuumdb in-stages or normal vacuumdb as a part of daily cron?
A: The best approach is have auto vacuum tuned properly and to augment it with cron jobs on particular tables as required.
Q: I would like to know I’ve got a table that gets dead tuples very frequently and I don’t want to wait for auto vacuum threshold and set a manual threshold for auto vacuum to kick off more aggressively, recommended or anything you could share on this?
A: It is hard to be sure what to recommend without looking at the entire configuration and the results. You might want to consider table level settings for auto_vacuum_threshold and auto_vacuum_cost_delay. Lowering both of these will enable auto_vacuum to more effectively keep up with the bloat occurring on the table.
For any questions, comments, or feedback, please visit our website or send an email to [email protected].