In the defense of sar (and how to configure it)

April 26, 2017 / by Tomas Vondra / in 2ndQuadrant, PostgreSQL, Tomas' PlanetPostgreSQL

Let me discuss a topic that is not inherently PostgreSQL specific, but that I regularly run into while investigating issues on customer systems, evaluating “supportability” of those systems, etc. It’s the importance of having a monitoring solution for system metrics, configuring it reasonably, and why sar is still by far my favorite tool (at least on Linux).

On the importance of monitoring

Firstly, monitoring of basic system metrics (CPU, I/O, memory) is extremely important. It’s a bit strange having to point this out in discussions with other engineers, but I’d say 1 in 10 engineers thinks they don’t really need monitoring. The reasoning usually goes along these lines:

It’s just another source of useless overhead. You don’t really need monitoring unless there’s an issue, and issues should be rare. And if there’s an issue, we can enable the monitoring temporarily.

It’s true monitoring adds overhead, no doubt about it. But it’s likely negligible compared to what the application is doing. Actually, sar does not really add any extra instrumentation, it merely reads counters from the kernel, computes deltas and writes them to disk. It may need some disk space and I/O (depending on the number of CPUs and disks), but that’s about it.
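
To illustrate what reading counters from the kernel means in practice, here is a glance at the raw data sadc works with: cumulative counters exposed by the kernel, which sar turns into per-second rates by computing deltas between consecutive readings (the paths shown are the standard Linux ones):

# Cumulative CPU time (in jiffies) since boot; sar derives %user,
# %system, %iowait etc. from the difference between two readings.
grep '^cpu ' /proc/stat

# Cumulative per-device I/O counters, the raw material for "sar -d".
cat /proc/diskstats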

For example, collecting per-second statistics on a machine with 32 cores and multiple disks will produce ~5GB of raw data per day, but it compresses extremely well, often to ~5-10% of the original size. And it’s barely visible in top. Per-second resolution is a bit extreme anyway, and using 5 or 10 seconds will further reduce the overhead.
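
If you want to check the overhead on your own system, looking at the data directory is enough. A minimal sketch, assuming the usual /var/log/sa location and a data file named sa26 (names and locations vary between distributions and sysstat versions):

# Total size of the collected sar data files.
du -sh /var/log/sa

# Rough idea of how well one day of data compresses
# (counts compressed bytes without writing anything to disk).
gzip -9 -c /var/log/sa/sa26 | wc -c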

So no, it turns out the overhead is not a valid reason not to enable monitoring.

Costs vs. benefits

More importantly though, “How much overhead do I eliminate by not enabling monitoring?” is the wrong question to ask. Instead you should be asking “What benefits do I get from the monitoring? Do the benefits outweigh the costs?”

We already know the costs (overhead) are fairly small or entirely negligible. What are the benefits? In my experience, having monitoring data is effectively invaluable.

Firstly, it allows you to investigate issues – scanning a bunch of charts for sudden changes is surprisingly effective, and often leads you directly to the root cause. Similarly, comparing the current data (collected during the issue) to a baseline (collected when everything is fine) is very useful, and impossible if you only enable monitoring when things break.

Secondly, it allows you to evaluate trends and identify potential issues before they actually hit you. How much CPU are you using? Is the CPU usage growing over time? Are there some suspicious patterns in memory usage? You can only answer those questions if you have the monitoring in place.

Why sar is my favorite tool

Let’s assume I’ve convinced you monitoring is important and you should definitely do it. But why is sar my favorite tool, when there are various fancy alternatives, both on-premises and cloud-based?

  • It’s included in all distributions, and is trivial to install/setup. This makes it fairly simple to convince people to enable it.
  • It’s right on the machine. So if you SSH to the machine, you can also get the monitoring data.
  • It uses simple text output, which makes it trivial to process the data – import it into a database, analyze it, attach it to a support ticket (see the sadf example below). That’s pretty difficult with other tools, which generally don’t let you export the data easily, only show charts, and/or significantly restrict what analysis you can perform.
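
The sysstat package ships sadf for exactly this purpose: exporting the collected data as plain text. A minimal sketch, assuming a data file named /var/log/sa/sa26 (the file name and location depend on your distribution):

# CPU utilization in a semicolon-separated format that is easy to load
# into a database or a spreadsheet.
sadf -d /var/log/sa/sa26 -- -u

# The same idea for memory and per-interface network statistics.
sadf -d /var/log/sa/sa26 -- -r -n DEV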

I do admit some of this comes from the fact that I work for a company providing PostgreSQL services to other companies (be it 24×7 support or Remote DBA). So we usually get only very limited access to customer systems (mostly just database servers and nothing more). That means having all the important data on the database server itself, accessible over plain SSH, is extremely convenient and eliminates unnecessary round-trips just to request another piece of data from some other system, which saves both time and sanity on both sides.

If you have many systems to manage, you’ll probably prefer a monitoring solution that collects data from many machines to a single place. But for me, sar still wins.

So, how to configure it?

I mentioned that installing and enabling sar (or rather sysstat, the package that includes sar) is very simple. Unfortunately, the default configuration is somewhat bad. After installing sysstat, you’ll find something like this in /etc/cron.d/sysstat (or wherever your distribution stores cron configuration):

*/10 * * * * root /usr/lib64/sa/sa1 1 1

This effectively says the sa1 command will be executed every 10 minutes, and that it will collect a single sample over 1 second. There are two issues here. Firstly, 10 minutes is fairly low resolution. Secondly, the sample only covers 1 second out of 600, so the remaining 9:59 are not really included in it. This is somewhat OK for long-term trending, where low-resolution random sampling is sufficient. For other purposes you probably need to do something like this instead:

* * * * * root /usr/lib64/sa/sa1 -S XALL 60 1

This collects one sample per minute, and every sample covers the whole minute. The -S XALL means all statistics should be collected, including interrupts, individual block devices and partitions, etc. See man sadc for more details.
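
Once sa1 is collecting data at this resolution, you can query an arbitrary time window straight from the data files with sar itself. A minimal sketch, again assuming a data file named /var/log/sa/sa26:

# CPU utilization between 9:00 and 10:00.
sar -u -f /var/log/sa/sa26 -s 09:00:00 -e 10:00:00

# Per-device disk statistics for the same window (-p prints device names).
sar -d -p -f /var/log/sa/sa26 -s 09:00:00 -e 10:00:00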

Summary

So, to sum this post into a few simple points:

  • You should have monitoring, even if you think you don’t need it. Once you run into issues, it’s too late.
  • The costs of monitoring are probably negligible, but certainly much lower than the benefits of having the monitoring data.
  • sar is convenient and very efficient. Maybe you’ll use something else in the future, but it’s a good first step.
  • The default configuration is not particularly great (low resolution, 1-second samples). Consider increasing the resolution.

One thing I haven’t mentioned is that sar only deals with system metrics – CPU, disks, memory, processes – not with PostgreSQL statistics. You should definitely monitor that part of the stack too, of course.
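
There is nothing quite like sar for that part, but the same keep-it-simple, keep-it-on-the-machine approach works: snapshot the statistics views periodically and analyze the deltas later. A minimal sketch of a cron entry (not part of sysstat; the output path and connection details are assumptions you would adjust):

# Append a timestamped snapshot of pg_stat_database once a minute,
# so it can later be correlated with the sar data (the target
# directory /var/log/pg_stats is hypothetical and must exist).
* * * * * postgres psql -XAtc "COPY (SELECT now(), * FROM pg_stat_database) TO STDOUT WITH (FORMAT csv)" >> /var/log/pg_stats/pg_stat_database.csv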

Tags: monitoring, sar
6 replies
  1. Paul T from okmeter.io says:
    May 4, 2017 at 2:01 pm

    What do you suggest for PostgreSQL statistics monitoring?

    • Tomas Vondra says:
      May 4, 2017 at 3:07 pm

      I don’t think there’s anything akin to sar, i.e. “simple and on the same machine”. You can snapshot the statistics catalogs, but that’s about it.

      With regular monitoring systems (collecting data from multiple machines/services to a central place), I’d say collectd/grafana is a good choice. A simple alternative is Munin, but it has various limitations as it’s RRD-based.

  2. Sebastien says:
    February 20, 2018 at 1:51 pm

    Thank you for this article.
    FYI all per-second statistics displayed by sar are average values over the given time interval. This is true also for CPU activity, which means that even if sa1 is executed only every 10 minutes, the sample collected will give you a value covering the past 10 minutes, not 1 second.

    • Tomas Vondra says:
      February 23, 2018 at 1:23 am

      No, that is not true. Or more precisely, it’s not true for the default parameters used in cron jobs. The usual cron job is this:

      */10 * * * * root /usr/lib64/sa/sa1 1 1

      which says, run it every 10 minutes, and every time collect one 1-second sample.

  3. Sebastien says:
    February 26, 2018 at 9:00 am

    No, I insist… 🙂

    */10 * * * * root /usr/lib64/sa/sa1 1 1

    actually means “run sa1 (sadc) every 10 minutes, and every time collect one sample”. The interval parameter (the first “1” value given to sa1) is meaningless here since you need at least 2 samples to define an interval.
    The meaning is a bit different from that of sar, where e.g., “sar 1 1” means “display one line of statistics covering a one-second interval”, and in this case, *two* samples need to be collected by sar’s backend (sadc). You can have a look at question 2.22 from sysstat’s FAQ (see link below).
    Remember that counters collected by sadc are, in most cases, cumulative values since boot time (also see question 2.15 in sysstat’s FAQ). So you can take one snapshot at time t, then another one 10 minutes later, the values displayed will actually cover the whole 10 minutes interval, but of course, those statistics (CPU utilization, network traffic, context switches, etc.) will be average values over the period. So maybe the dips and spikes will be less visible.
    On the other hand values like e.g., memory utilization (values displayed by “sar -r”), are actually instantaneous values: They give you a view of your system at the very moment when they are collected.

    Sysstat’s FAQ:
    http://sebastien.godard.pagesperso-orange.fr/faq.html
    https://github.com/sysstat/sysstat/wiki

    Regards,
    Sebastien (author and maintainer of the sysstat package)

    • Tomas Vondra says:
      February 26, 2018 at 3:13 pm

      Aha, I see! So `sa1 1 1` essentially just writes a single record into the data file, and `sar` then computes “per second” averages between those samples.

      So the default cronjob definition

      */10 * * * * root /usr/lib64/sa/sa1 1 1

      will produce averages over 10-minute intervals. That is certainly better than collecting 1-second samples every 10 minutes, that’s for sure. I still think that’s not a sufficient granularity for our needs, though (even ignoring that some metrics are not cumulative but instantaneous).

      Thanks for correcting my long-term misunderstanding, and I promise to not argue with the guy who wrote the tool again 😉

      And thanks for writing it and maintaining it, BTW! It’s extremely useful and versatile.

