PG Phriday: Terrific Throughput Tracking

January 25, 2019 / in Shaun's PlanetPostgreSQL / by Shaun Thomas

Postgres has a lot of built-in information functions that most people don’t know about. Some of these are critical for identifying replication lag, but what if we could use them for something else, like throughput?

This man’s one simple trick can track actual database throughput; DBAs hate him!

Everybody Knows

Let’s take a look at a common query we might use to track replication lag between a Postgres 11 Primary and one or more Replica nodes.

SELECT client_addr,
       pg_wal_lsn_diff(
           pg_current_wal_lsn(),
           sent_lsn
       ) AS sent_lag,
       pg_wal_lsn_diff(
           pg_current_wal_lsn(),
           write_lsn
       ) AS write_lag,
       pg_wal_lsn_diff(
           pg_current_wal_lsn(),
           flush_lsn
       ) AS flush_lag,
       pg_wal_lsn_diff(
           pg_current_wal_lsn(),
           replay_lsn
       ) AS replay_lag
  FROM pg_stat_replication;

This gives us the number of bytes by which each connected replica trails the Primary, measured at several different stages. How much have we sent? Has that data been written to disk on the replica? How much has been flushed, and is thus safe from crashes? How much has actually been replayed into the Postgres data files? For more information on the pg_stat_replication view, check out the documentation.

If we were using slots instead, we might do this:

SELECT slot_name,
       pg_wal_lsn_diff(
         pg_current_wal_lsn(),
         restart_lsn
       ) as restart_lag,
       pg_wal_lsn_diff(
         pg_current_wal_lsn(),
         confirmed_flush_lsn
       ) as flush_lag
  FROM pg_replication_slots;

There is somewhat less information in pg_replication_slots, as it only really tells us the minimum LSN position the Primary needs to retain for that replica. Of course, if that stops advancing while the Primary keeps working, that’s lag we can see even while the replica is disconnected.

This is especially important with replication slots, since they necessarily mean the Primary is retaining WAL files a replica may need. We don’t want that retained amount to grow too large, or the Primary node could run out of disk space. That is definitely something we want to monitor.
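As a rough sketch of what such a check might look like, the query below flags any slot retaining more than 10 GB of WAL. The 10 GB threshold is purely an example value, not a recommendation; tune it to the Primary's actual disk budget.

SELECT slot_name,
       active,
       pg_size_pretty(
           pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
  FROM pg_replication_slots
 WHERE pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) > 10 * 1024^3;  -- ~10 GB, arbitrary example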

That’s How it Goes

It’s also a hint as to our database throughput. If we "pin" a slot’s position by disconnecting its replica, the lag will continue to increase at exactly the rate the Primary node produces WAL traffic.

But wait a minute; what good is database throughput? It must be pretty useful, as developers have included several indicators in the pg_stat_database view. That view will tell us how many inserts, updates, and deletes a database may have handled, as well as commit versus rollback activity.
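For reference, this is a minimal way to peek at those counters for the current database; the column list is just a convenient subset of what pg_stat_database exposes.

SELECT datname,
       xact_commit,
       xact_rollback,
       tup_inserted,
       tup_updated,
       tup_deleted
  FROM pg_stat_database
 WHERE datname = current_database();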

Despite all that (and much more), this statistics view won’t tell us anything about data volume. It also isn’t durable: the counters vanish after a crash and can be reset at any time with pg_stat_reset(), and replicas maintain their own stats independently. What if there were a way to determine how active our Primary database has been over its entire lifetime, from any node, any time we can take two readings separated in time?

Ah Give or Take a Night or Two

Consider a brand new instance that has just been bootstrapped with initdb. It has processed essentially no transactions, contains no tables, and isn’t currently doing anything. Had we known we’d want to track overall instance activity, we could have called pg_current_wal_lsn() to retrieve the starting LSN.

For example:

SELECT pg_current_wal_lsn();

 pg_current_wal_lsn 
--------------------
 0/1652EC0

In theory, we could use this value in any future call to pg_wal_lsn_diff to obtain the number of bytes produced by the database instance since its inception. Unfortunately, we hardly ever have that opportunity unless we created the Postgres instance ourselves.
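If we had saved it, any later check would just feed that starting LSN back into pg_wal_lsn_diff:

SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), '0/1652EC0') AS bytes_since_init;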

But what is really the difference between 0/1652EC0, and the absolute zero value of 0/0?

SELECT pg_wal_lsn_diff('0/1652EC0', '0/0');

 pg_wal_lsn_diff 
-----------------
        23408320

That’s not even 24MB, which is less than two 16MB WAL files. In that case, we can safely substitute 0/0 and suddenly any execution of pg_current_wal_lsn is trivially transformed into an absolute amount of data produced by our Postgres instance.

And a long stem rose

Let’s initialize a medium pgbench instance at scale factor 100 and check the number of bytes reported at the end.

SELECT a.total_bytes, 
       (a.total_bytes/1024/1024)::BIGINT AS mbytes
  FROM pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0') a (total_bytes);

 total_bytes | mbytes 
-------------+--------
  1328940592 |   1267

So Postgres has handled about 1267 MB of data since we started. How about after processing a one-minute pgbench test?

 total_bytes | mbytes 
-------------+--------
  1422082032 |   1356

What if we add another index to the pgbench_accounts table?

CREATE INDEX idx_account_balance ON pgbench_accounts (abalance);

 total_bytes | mbytes 
-------------+--------
  1626342784 |   1551

We included that last example to illustrate that this really measures WAL throughput, so anything that produces WAL traffic will drive the value higher. It’s not just client writes: anything that modifies the instance’s contents counts, and even a mostly read-only instance will creep forward slowly thanks to background activity such as checkpoints and autovacuum.

Is there a better cumulative indicator of how much real work a Postgres instance is performing?

Before it Blows

Real work is the key to this metric. We could send the query to any number of monitoring tools, and so long as they can compare readings over time, we now have a total activity chart for our Postgres instance.
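Even without a dedicated monitoring stack, a scratch table and a scheduled insert are enough to demonstrate the idea. Here’s a minimal sketch, assuming a hypothetical wal_samples table populated once a minute by cron or some agent; the table name and the sampling interval are purely illustrative.

CREATE TABLE IF NOT EXISTS wal_samples (
    sample_time timestamptz PRIMARY KEY DEFAULT now(),
    total_bytes numeric NOT NULL
);

-- Run this periodically (e.g., once a minute) from cron or an agent
INSERT INTO wal_samples (total_bytes)
VALUES (pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0'));

-- Throughput in bytes per second between consecutive readings
SELECT sample_time,
       (total_bytes - lag(total_bytes) OVER w)
         / EXTRACT(EPOCH FROM sample_time - lag(sample_time) OVER w)
           AS bytes_per_second
  FROM wal_samples
WINDOW w AS (ORDER BY sample_time);

On a read-only replica the samples would have to be stored elsewhere, and pg_last_wal_replay_lsn() would stand in for pg_current_wal_lsn(), but the arithmetic is the same.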

Did something perform a large bulk insert? Is client activity producing a steady stream of data and transaction traffic that’s threatening to overwhelm our hardware? Do we need to reevaluate resource allocation to prevent such throughput spikes from negatively impacting overall performance?

These are all questions that knowledge of Postgres throughput can help answer. If the graphs show an upward trend, we can even plan for future expansion. It’s a cool piece of data that few, if any, existing monitoring scripts such as check_postgres bother to integrate.

Being proactive is always preferable to the alternative, so adding one more query to our monitoring toolkit is a step forward. Don’t forget to watch those charts.

And everybody knows that it’s now or never.

Tags: lag, lsn, throughput, wal