2ndQuadrant | PostgreSQL
Mission Critical Databases
David Rowley

Benchmark on a Parallel Processing Monster!

February 28, 2017 / 8 Comments / in David's PlanetPostgreSQL / by David Rowley

Last year I wrote about a benchmark I performed on the Parallel Aggregate feature that I worked on for PostgreSQL 9.6. I was pretty excited to see this code finally ship in September last year; however, something stood out in the release announcement that I didn’t quite understand:

Scale Up with Parallel Query

Version 9.6 adds support for parallelizing some query operations, enabling utilization of several or all of the cores on a server to return query results faster. This release includes parallel sequential (table) scan, aggregation, and joins. Depending on details and available cores, parallelism can speed up big data queries by as much as 32 times faster.

It was the “as much as 32 times faster” that confused me, as I saw no reason for this limit. Sure, if you have 32 CPU cores then that’ll be the limit, but what if you had, say, 72 of them?

Luckily I just happened to get access to such a machine, thanks to our friends over at Postgres-Pro.

Hardware

  • 4 x E7-8890 v3 CPUs (72 cores, 144 with HT)
  • 3TB RAM
  • P830i RAID controller (4GB write cache)
  • 10x 1.92TB RAID10 12Gbps SAS SSD (VO1920JEUQQ)
  • CentOS 7.2.1511 (with kernel 3.10.0-327.13.1.el7)

Benchmark

I decided to run the same TPC-H Query 1 that I had used last year on the 32-core server. It performs some fairly complex aggregation over a single table. My weapon of choice was PostgreSQL 9.6.

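For reference, TPC-H Query 1 in its standard form looks like this (shown here with a typical parameter substitution of 90 days; the exact substitution used for this run isn't stated):

```sql
-- TPC-H Query 1: pricing summary report over the lineitem table.
SELECT l_returnflag,
       l_linestatus,
       sum(l_quantity)                                       AS sum_qty,
       sum(l_extendedprice)                                  AS sum_base_price,
       sum(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       avg(l_quantity)                                       AS avg_qty,
       avg(l_extendedprice)                                  AS avg_price,
       avg(l_discount)                                       AS avg_disc,
       count(*)                                              AS count_order
FROM   lineitem
WHERE  l_shipdate <= date '1998-12-01' - interval '90 days'
GROUP  BY l_returnflag, l_linestatus
ORDER  BY l_returnflag, l_linestatus;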

I decided to use a dataset 1 TB in size, meaning there were about 6 billion rows in the “lineitem” table. The query produces only 4 rows of output, which means the parallel part of the plan, in this case a Parallel Seq Scan followed by a Partial Aggregate, makes up the vast majority of the work, while the serial part, the Finalize Aggregate and the Sort, is very short, since it only handles those 4 rows.

I tested various parallel worker counts by performing:

ALTER TABLE lineitem SET (parallel_workers = <number of parallel worker processes>);

Due to the high number of cores in this server, I increased the worker count in increments of 4 once I had reached 4 parallel workers.
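Each run then looked roughly like the following; the session settings shown are my sketch of the setup, not copied from the original runs:

```sql
-- Set the worker count for this run (4, 8, 12, ... and so on).
ALTER TABLE lineitem SET (parallel_workers = 8);

-- Make sure the planner is allowed to use that many workers.
SET max_parallel_workers_per_gather = 144;

-- Then run TPC-H Query 1 and record the elapsed time
-- (e.g. with \timing in psql).
```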

Results

Chart of benchmark results with logarithmic time scale


We can see that the server scales quite well up to around 72 workers, where efficiency starts dropping fast. That’s not too big a surprise, given that this server has 72 cores. Here, 100% efficiency would mean that going from 5 workers to 10 halves the query response time. The red line in the chart marks this theoretical maximum, and the green line (efficiency) shows how close we actually got to it.

I created a “base time” for calculating this efficiency by simply disabling parallelism altogether and running the query on a single CPU core, which resulted in a query time of 9052.8 seconds. The fastest times I saw were at the 72 and 76 parallel worker marks, at 149.7 and 148.6 seconds respectively. This means we got a speedup of almost 61 times, rather more than the 32 claimed by the release notes. Under-promise and over-deliver, maybe?
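Those numbers can be checked with a quick calculation, runnable in any psql session. Note that dividing by 76 workers gives only a rough efficiency figure, since the leader process also does some of the work:

```sql
SELECT round(9052.8 / 148.6, 1)      AS speedup,    -- about 60.9x
       round(9052.8 / 148.6 / 76, 2) AS efficiency; -- about 0.80
```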

And just for fun

I thought I’d remove the log scale on the seconds axis, just so we can see the performance in linear time.

Chart of benchmark results with linear time scale


Not such a useful chart, but it does show just how much performance has increased over the non-parallel query.

OK, so just where are the limits?

Well, I just so happened to test that too! Not all the limits, of course, but I wondered where the scaling would stop on this machine.

I simplified the query down to a plain SELECT COUNT(*) FROM lineitem; and on testing it I came hard up against the lock which the parallel workers use to obtain the number of the next page they should process. This lock stops two workers from processing the same page, which would give incorrect results. While holding the lock, a parallel worker only fetches the next page number to look at, not the page itself. This means the parallel worker processes were waiting for their fellow workers simply to read a 32-bit number and add 1 to it (so it’s ready for the next worker to read) before finally releasing the lock. So if we added even more CPUs, I’d eventually have hit this lock in TPC-H Query 1 too.

The great news is that this will be very easy to put right, as currently the workers only take one page at a time. If we just gave them a few pages at once when the table is large, each worker would have more work to do before having to check what to do next, and that would solve the problem.

Thanks to everyone involved in making PostgreSQL a parallel processing monster!

I’m truly excited about what we can achieve here in the future.

Tags: Benchmark, benchmarking, parallel queries, performance, PostgreSQL
8 replies
  1. Matt says: February 28, 2017 at 4:28 pm

    Was HT enabled…? I’d have thought there’d be some improvement between 72 and 144 workers.

    • David Rowley says: March 1, 2017 at 5:02 am

      Yes, it was enabled:

      CPU(s): 144
      Thread(s) per core: 2
      Core(s) per socket: 18
      Socket(s): 4

  2. Avinash says: March 1, 2017 at 3:37 am

    Excellent article.

Curious to know if there is any way to increase the parallel worker setting for all the tables inside a schema?

    • David Rowley says: March 1, 2017 at 5:10 am

      Thanks.

      There’s no direct way to do this, but you could, from psql do:

select 'ALTER TABLE ' || oid::regclass || ' SET (parallel_workers = 8);' from pg_class where relnamespace = (select oid from pg_namespace where nspname = 'public') and relkind = 'r';
\gexec

That assumes you want to set parallel_workers to 8 for all tables in the public schema. You’d need to run it again whenever a new table is created, though.

You might find you’d be better off paying attention to the size of each table rather than which schema it is in. I’ll just assume you have a schema full of big tables, though, and that you know what you’re doing.

  3. Andrew says: March 1, 2017 at 8:43 am

    Hi David,

    A very useful and informative post! I’m just wondering what tool(s) you used to detect the page lock being hit when you changed to using the SELECT count(*) query.

    • David Rowley says: March 1, 2017 at 8:57 am

      Thanks. perf simply showed heap_parallelscan_nextpage() very high in the profile of a worker.

  4. George says: March 27, 2017 at 11:51 am

    Hey, David, could you please publish the PostgreSQL config you were using to do the tests?

    Thanks a lot! And great post, BTW!

    • David Rowley says: April 11, 2017 at 12:41 pm

      Hi George,

      All settings were standard apart from the following:

      max_connections = 100
      shared_buffers = 64GB
      work_mem = 10GB
      maintenance_work_mem = 100GB
      dynamic_shared_memory_type = posix
      effective_io_concurrency = 10
      max_worker_processes = 300
      max_parallel_workers_per_gather = 144
      fsync = off # mainly to increase the speed of loading data
full_page_writes = off # mainly to increase the speed of loading data
      max_wal_size = 100GB

