Tracing PostgreSQL with perf

November 4, 2013 / by craig.ringer / in Craig's PlanetPostgreSQL, PostgreSQL

The profiling utility perf that ships with the Linux kernel is extremely useful for examining system-wide and multi-process behaviour – but it does a lot more than the CPU profiling it’s often used for. You’ve probably looked at perf top -az or perf top -u postgres output, but that’s only a small part of what it can do. (If you want the TL;DR version, jump down to “User-space dynamic probes”.)

One of the big advantages of perf is that it’s non-intrusive. You don’t have to attach a debugger and interrupt execution. You don’t have to run commands directly under a profiler in a special environment. There’s no need to restart the server to debug a problematic workload, and often no need to recompile with debug options. This is extremely useful when you’re trying to track down performance problems in a live system, as it lets you quickly test theories about what might be going on, with minimal impact.

perf isn’t just a profiler; it also has tracing support. Profiling samples the state of the system when a hardware or software performance counter fires, giving a statistical picture of where the system spends most of its time. Tracing instead takes a sample whenever a particular trace event occurs, so it’s much more useful for infrequent but important events.

When working with PostgreSQL, one of the most exciting features of perf is its ability to trace user-space processes. Want to know how often PostgreSQL is switching WAL segments, or how often it does foreign key lookups, for one backend or across the whole cluster? perf can help with that.

User-space and kernel-space tracepoints can be mixed and can be used at the same time as performance counter profiling to help you get a good picture of the system. perf can capture stack traces from both kernel and user-space, and can also do statistical visualisations. User-space tracepoints are created with dynamic probes; kernel-space ones can be pre-defined or can be dynamic probes.

So, how do you use some of these features?

Install the tools

First, make sure you’re using a current perf. This article was written on Fedora 19 with perf 3.11.6 on x86_64, and some of the features are relatively new.

If you want user-space stack results you’ll want the code you’re looking at to be built with -Og -ggdb -fno-omit-frame-pointer. If you’re using a perf built with libunwind you don’t need frame-pointers; see this Stack Overflow post and RH Bugzilla #1025603. None of this is needed if you’re only interested in kernel side data. If you’re using distro packages you might need to install -debuginfo packages too.
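
If you’re building PostgreSQL from source for this kind of investigation, one way to get those flags in is to pass them to configure. This is just a sketch – the prefix is arbitrary, so adapt it to your environment:

./configure --prefix=$HOME/postgres-perf --enable-debug \
    CFLAGS="-Og -ggdb -fno-omit-frame-pointer"
make -j4
make install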

All the following tests were run with stock PGDG PostgreSQL 9.2 packages from http://yum.postgresql.org/ using a perf rebuilt with libunwind support according to the above instructions.

Kernel tracepoints and probes

perf can capture data from pre-defined kernel tracepoints, some of which are informative when looking at issues with memory fragmentation, disk I/O, etc. You can get a list of available tracepoints with sudo perf list. Multiple tracepoints can be given to -e as a comma-separated list, and wildcards are supported. For example, if we want to get write and disk flush statistics on a running PostgreSQL instance we might run:

sudo perf record -g dwarf -e block:block_rq_issue,syscalls:sys_enter_fsync -u postgres sleep 10

to capture the data. Instead of a sleep you can omit the command entirely and hit control-C when you’re done capturing, or you can use some other command, like a psql -c invocation, to trigger the workload you want to measure.
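
For example, to capture the same events just for the duration of a single statement, you could let perf record run psql for you. This is only a sketch – the database name and the scratch table are made up for illustration:

sudo perf record -g dwarf -e block:block_rq_issue,syscalls:sys_enter_fsync -u postgres \
    -- psql -q -U postgres -d postgres -c "create table scratch as select a from generate_series(1,1000000) a;"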

-u postgres profiles all processes running as user postgres. You can instead use -a for whole-system profiling across all CPUs. It’s also possible to trace just one backend. Start psql, run select pg_backend_pid(), run perf with -p $the_pid, then start the workload in the same psql session.

When you’re working with PostgreSQL, profiling the default target process – the command run under the control of perf – isn’t usually very useful, because the backend does most of the work, not psql. It’s still useful to use the subcommand to control the test workload and timing.

Once you’ve captured the data you can use perf report to examine it. There are too many options to discuss here – to control result aggregation and simplification, stack trace display, interactive curses vs text report output, and more.

Take this session as an example, where there is a shell session (terminal “T2”) and a postgres session connected to database “regress” (terminal “T1”):

T1| regress=> select pg_backend_pid();
T1| pg_backend_pid 
T1| ----------------
T1|           4495
T1|(1 row)
T2| $ sudo perf record -g dwarf -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync -p 4495
T1| regress=> create table x as select a FROM generate_series(1,1000000) a;
T1| regress=>
T2| $ ^C
T2| [ perf record: Woken up 332 times to write data ]
T2| [ perf record: Captured and wrote 86.404 MB perf.data (~3775041 samples) ]
T2|
T2| $ sudo perf report -g

You can use perf report‘s curses GUI to dig down into the trace, or you can use the perf report --stdio option to get it to stream data to stdout. For example, if you want stack traces you might run:

$ sudo perf report -g --stdio
... blah blah ...
# Samples: 1  of event 'syscalls:sys_enter_fsync'
# Event count (approx.): 1
#
# Overhead   Command  Shared Object                 Symbol
# ........  ........  .............  .....................
#
   100.00%  postgres  libc-2.17.so   [.] __GI___libc_fsync
            |
            --- __GI___libc_fsync
                mdimmedsync
                heap_sync
                intorel_shutdown
                standard_ExecutorRun
                ExecCreateTableAs
                PortalRunUtility
                PortalRunMulti
                PortalRun
                PostgresMain
                ServerLoop
                PostmasterMain
                main
                __libc_start_main
                _start
                (nil)
... blah blah...

showing that for the event syscalls:sys_enter_fsync there was one event with the above stack, an fsync invoked via ExecCreateTableAs.

(For a reason I haven’t yet been able to pin down, the final fsync() doesn’t seem to be captured by perf when psql is run directly under the control of perf. This isn’t an issue with perf stat, only perf record. That’s why I’m jumping through hoops to pre-select the backend by pid above.)

User-space dynamic probes

Sometimes you’re more interested in something happening within PostgreSQL itself than in events within the kernel that are triggered by PostgreSQL. Newer versions of perf can help with this by inserting dynamic tracepoints that trigger on calls in user-space programs.

Let’s say you’re interested in watching WAL activity, and want to see when XLogFlush, XLogFileInit or XLogFileOpen are called. You can insert dynamic tracepoints for these calls with perf:

sudo perf probe -x `which postgres` XLogFileInit
sudo perf probe -x `which postgres` XLogFileOpen
sudo perf probe -x `which postgres` XLogFlush

You can only probe external symbols (non-static, not hidden by -fvisibility flags) unless you built with -ggdb. perf will complain “no symbols found” if you try to use a symbol that doesn’t exist. At the time of writing perf doesn’t support using external debuginfo to look up symbols for probes, though it can use it for stack analysis. In general, if it’s an extern in src/include you can use it with perf.
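
A quick way to check whether a given function is visible – a sketch, assuming binutils is installed and the postgres binary is on your PATH – is to look for it in the binary’s dynamic symbol table:

nm -D `which postgres` | grep -i xlogf

If the symbol shows up there, perf probe should be able to find it.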

Each perf probe invocation prints the name of the tracepoint it creates, and you can use perf probe -l to list them all anyway:

$ sudo perf probe -l
  probe_postgres:XLogFileInit (on 0x000000000009a360)
  probe_postgres:XLogFileOpen (on 0x000000000009a860)
  probe_postgres:XLogFlush (on 0x00000000000a0670)
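
The probes persist until you delete them, so it’s worth knowing how to clean up when you’re finished (standard perf probe usage, shown here for convenience):

sudo perf probe -d 'probe_postgres:*'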

These probes are now usable as perf events. Let’s take a look at xlog activity during a sample workload, monitoring across the whole cluster while I do a pgbench run:

sudo perf record -g dwarf -u postgres -e probe_postgres:XLogFileInit,probe_postgres:XLogFileOpen,probe_postgres:XLogFlush

Try it yourself with perf report -g. You can use options like -g fractal,0 to control detail. You’ll be able to see the percentage of a given counter’s hits that came from one stack branch or another, the pid and process, etc. The --sort options give you more control over aggregation and grouping.
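
As a starting point – the exact symbols and percentages will of course differ on your system – something like this gives a compact text report grouped by pid and symbol:

sudo perf report -g fractal,0 --sort pid,symbol --stdio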

But wait, there’s more

You should also check out the perf stat and perf top commands. They can take the same event-lists as perf record, though for some odd reason their process-filter support is different.
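
For instance, to watch one of the probes live instead of recording it – assuming the probe_postgres tracepoints created earlier still exist:

sudo perf top -e probe_postgres:XLogFlush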

Here’s an example that runs a dummy workload and looks at I/O kernel tracepoints during the run:

$ sudo perf stat -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync -a -r 5 -- psql -q -U postgres craig -c "drop table if exists x; create table x as select a FROM generate_series(1,1000000) a;";

 Performance counter stats for 'psql -U postgres craig -c drop table if exists x; create table x as select a FROM generate_series(1,1000000) a;' (5 runs):

     0 block:block_rq_abort                       [100.00%]
     0 block:block_rq_requeue                     [100.00%]
    97 block:block_rq_complete      ( +- 14.82% ) [100.00%]
    96 block:block_rq_insert        ( +- 14.97% ) [100.00%] 
    98 block:block_rq_issue         ( +- 14.67% ) [100.00%]
     0 block:block_rq_remap                       [100.00%]
10,607 syscalls:sys_enter_write     ( +-  0.17% ) [100.00%]
     1 syscalls:sys_enter_fsync                                    

0.908835058 seconds time elapsed    ( +- 18.31% )

You can see it’s doing an average of about 100 block-layer I/O requests, over 10k write()s, and a single fsync(). Some of that is system background noise since we’re doing whole-system profiling (-a), but since the system is fairly idle it isn’t much, and it’s averaged out over five runs.
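
If that background noise bothers you, perf stat can also be pointed at a single backend instead of the whole system – a sketch, substituting the pid you got from pg_backend_pid():

sudo perf stat -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync -p 4495 -- sleep 30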

Similarly, using the dynamic probes we added earlier, we can keep an eye on xlog activity during a pgbench run:

$ sudo perf stat -e probe_postgres:XLogFileInit,probe_postgres:XLogFileOpen,probe_postgres:XLogFlush -a -- /usr/pgsql-9.2/bin/pgbench -U postgres craig -c 2 -t 10000
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 2
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 20000/20000
tps = 715.854663 (including connections establishing)
tps = 716.092133 (excluding connections establishing)

 Performance counter stats for '/usr/pgsql-9.2/bin/pgbench -U postgres craig -c 2 -t 10000':

      64 probe_postgres:XLogFileInit       [100.00%]
       0 probe_postgres:XLogFileOpen       [100.00%]
  55,440 probe_postgres:XLogFlush                                    

 27.987364469 seconds time elapsed

There’s plenty more you can do, including capturing the state of local variables with perf probe. I’ll write up some useful examples of that later. In the meantime: play, explore, and have fun with a new diagnostic tool.
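
As a taste of that – a sketch only, since it needs debuginfo for the postgres binary and assumes the PostgreSQL 9.2 signature XLogFlush(XLogRecPtr record) – you can redefine the XLogFlush probe so that it records its argument, then look at the captured values with perf script:

sudo perf probe -d probe_postgres:XLogFlush
sudo perf probe -x `which postgres` 'XLogFlush record'
sudo perf record -e probe_postgres:XLogFlush -u postgres sleep 10
sudo perf script | head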


Update: Michael Paquier recently wrote a related article about tracing PostgreSQL with systemtap that may be of interest to readers of this one. You have to recompile Pg to use that approach, but the syntax is nicer and it offers some other advantages.

Tags: Linux, perf, performance
2 replies

isanketh says:
November 27, 2013 at 11:43 pm

Thanks, this was useful. I assume you had to enable these tracepoints first?

craig.ringer says:
November 28, 2013 at 3:14 am

The “perf probe” commands create tracepoints in user-space executables/libraries; this is shown above with the creation of the probe_postgres:XLogFileInit tracepoint, etc.

The other tracepoints used are built in to perf and the kernel. If you don’t have them you might either be running perf without appropriate access rights (it can’t see all tracepoints as non-root) or on an older kernel.
