Parallel Aggregate – Getting the most out of your CPUs

March 21, 2016 / 10 Comments / in David's PlanetPostgreSQL / by David Rowley

A small peek into the future of what should be arriving for PostgreSQL 9.6.

Today PostgreSQL took a big step ahead in the data warehouse world: we are now able to perform aggregation in parallel using multiple worker processes! This is great news for those of you who are running large aggregate queries over tens of millions or even billions of records, as the workload can now be divided up and shared between worker processes seamlessly.

We performed some tests on a 4-CPU, 64-core server with 256GB of RAM, using TPC-H at 100 GB scale, query 1. This query performs some complex aggregation on just over 600 million records and produces 4 output rows.
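
For reference, TPC-H query 1 is a single-table aggregate over the lineitem table; with the usual 90-day date parameter it looks roughly like this:

  SELECT l_returnflag,
         l_linestatus,
         sum(l_quantity) AS sum_qty,
         sum(l_extendedprice) AS sum_base_price,
         sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
         sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
         avg(l_quantity) AS avg_qty,
         avg(l_extendedprice) AS avg_price,
         avg(l_discount) AS avg_disc,
         count(*) AS count_order
  FROM lineitem
  WHERE l_shipdate <= date '1998-12-01' - interval '90' day
  GROUP BY l_returnflag, l_linestatus
  ORDER BY l_returnflag, l_linestatus;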

The base time for this query without parallel aggregates (max_parallel_degree = 0) is 1375 seconds. If we add a single worker (max_parallel_degree = 1) then the time comes down to 693 seconds, which is just 6 seconds off being twice as fast! So quite close to linear scaling. If we take the worker count up to 10 (max_parallel_degree = 10), then the time comes down to 131 seconds, which is once again just 6 seconds off perfect linear scaling!
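
The worker count can be changed per session before each run, for example:

  SET max_parallel_degree = 0;   -- baseline: no parallel workers
  -- run the query and note the time, then:
  SET max_parallel_degree = 10;  -- allow up to 10 parallel workers
  -- run the query again and compare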

The chart below helps to paint a picture of this. The blue line is the time in seconds. Note that the time axis is on a logarithmic scale; otherwise the performance increase would be too large to show the detail at higher worker counts.

You can see that even with 30 worker processes we’re still just 20% off of the linear scale. Here the query runs in 56 seconds, which is almost 25 times faster than the non-parallel run.

[Chart: TPC-H Q1 execution time vs. number of parallel workers (logarithmic time axis)]

This makes the existing parallel seqscan and parallel join changes really shine. Without parallel seqscan, parallel aggregation wouldn't have been possible, and without parallel join, parallel aggregate would only be possible on queries that didn't contain any joins.

More work is still to be done to parallel-enable some aggregate functions, but today’s commit is a great step forward.

 

UPDATE:

Please note that since writing this blog post the max_parallel_degree setting has been renamed to max_parallel_workers_per_gather. It's now also possible to experiment with various worker counts by changing the table's "parallel_workers" option with ALTER TABLE name SET (parallel_workers = N); where N is the required number of workers. You should also ensure that the server's max_worker_processes GUC setting is high enough to accommodate the number of workers that you require. The min_parallel_relation_size, parallel_setup_cost and parallel_tuple_cost settings may also be of some interest. Please see http://www.postgresql.org/docs/9.6/static/runtime-config-query.html for more details.
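
For example, with the current setting names, a test session could be set up along these lines (the table name and values are illustrative only):

  -- per-table override of the number of workers the planner will consider
  ALTER TABLE lineitem SET (parallel_workers = 8);
  -- per-session cap on workers below a single Gather node
  -- (this is the renamed max_parallel_degree)
  SET max_parallel_workers_per_gather = 8;
  -- in postgresql.conf (requires a restart), make sure there are enough
  -- background worker slots to hand out:
  --   max_worker_processes = 16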

Tags: 9.6, aggregate, parallel, performance
10 replies
  1. paul says:
    March 23, 2016 at 4:55 pm

    Awesome test. Can you say what actual hardware (4 cpu 64 core) you used? Thanks.

    • David Rowley says:
      March 23, 2016 at 10:03 pm

      Hi Paul, the processors on that machine are Xeon E5-4620s.

  2. Dmitry says:
    May 19, 2016 at 2:28 pm

    Hi, David

    First, big thanks for implementing parallel aggregates for PG.
    I tried to test the Q1 query from TPC-H (50GB scale factor, 300M rows in the LINEITEM table) on PG 9.6beta1 and got the following results:

    302171.230 ms for max_parallel_degree = 0
    175506.106 ms for max_parallel_degree = 4
    144931.588 ms for max_parallel_degree = 8

    Changes made to postgresql.conf:

    shared_buffers = 8000MB
    work_mem = 2000MB
    max_parallel_degree = 16
    enable_seqscan = on

    The test was performed on an SLES11 SP3 virtual machine with 16 vCPUs.

    Can you please advise some tuning for parallel work in this case?

    Thanks!

    • David Rowley says:
      May 20, 2016 at 3:34 am

      Hi Dmitry,

      Thanks for the comment.

      Remember that max_parallel_degree is a "maximum"; the size of the relation controls how many workers the planner requests the executor to use. Also remember that the number of workers the planner requests might not all be available to the executor at the time the query executes. If you run EXPLAIN ANALYZE on the query, it should show you how many workers were requested and how many were used.
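
      For example (the worker numbers here are only illustrative):

      EXPLAIN ANALYZE
      SELECT count(*) FROM lineitem;
      -- the Gather node in the output includes lines such as:
      --   Workers Planned: 4
      --   Workers Launched: 4
      -- if "Launched" is lower than "Planned", not enough worker
      -- processes were free when the query started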

      However, I'd say the most likely cause is that I/O requests cannot be served fast enough to feed the CPUs with the lineitem heap pages they need. When I ran my test it was on a machine with 256GB of RAM, and all the buffers were either in the kernel page cache or in PostgreSQL's buffer cache, so it wouldn't have needed to read any pages from disk.

      I’d say if you have less than 64GB of memory on this machine then that might be the reason. How does it perform on Q1 @ 10GB scale?

      If the planner is not requesting as many workers as you'd like, then you can tweak this manually with:

      alter table lineitem set (parallel_degree=16);

      I’m interested in hearing your updated results on this.

      David

  3. Dmitry says:
    May 20, 2016 at 9:27 am

    Hi, David

    Thank you for the explanation.

    You're right, it is some disk I/O bottleneck on my VM.

    Unfortunately, for some reason it can't cache the full LINEITEM table (44GB) on the 60GB RAM VM
    (each Q1 run is reading from disk, even though the filesystem cache is 45GB populated).

    OK. I tested Q1 (no filter predicates, full table scan) on a subset of 45M rows from the LINEITEM table (data for the year 1994, 6.5GB in size), and after it was cached got the following results:

    86235.123 ms for max_parallel_degree = 0
    22301.170 ms for max_parallel_degree = 4
    12711.368 ms for max_parallel_degree = 8

    Next, I see that increasing max_parallel_degree above 8 does not increase the number of real parallel workers.

    As you advised, I changed the table's default parallelism with:

    alter table lineitem1994 set (parallel_degree=16);

    But EXPLAIN ANALYZE for Q1 shows:

    *********
    Workers Planned: 16
    Workers Launched: 8
    *********

    So, no more than 8 workers again.

    Best regards,
    Dmitry

    • David Rowley says:
      May 20, 2016 at 10:21 am

      Remember that the executor will only ever start at most max_worker_processes worker processes. Setting the relation's parallel_degree higher than this does not make sense.
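
      For example, you can check that limit with (the 9.6 default is 8, which would explain seeing only 8 launched workers):

      SHOW max_worker_processes;   -- 8 by default
      -- to allow more, raise it in postgresql.conf and restart the server:
      --   max_worker_processes = 16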

      It looks like it's scaling better with a smaller table. It seems that 4 workers are about 77% efficient compared to no parallelism, and 8 workers 75%. This is not really very close to what I got, so perhaps it's worth watching top to see if the workers are fully utilising the CPU. If they're not then you have a bottleneck somewhere else.

  4. Hanz says:
    June 28, 2016 at 5:12 am

    Hi David,

    I have a table with 1,700,000,000 records.
    My test runs in 17s in SQL Server
    but needs 1m in PostgreSQL.
    Why? Please help.

    My config:
    max_connections = 100
    shared_buffers = 512MB (the max for Windows; I'm using Windows)
    effective_cache_size = 3072GB
    work_mem = 1048576kB
    maintenance_work_mem = 1GB
    min_wal_size = 4GB
    max_wal_size = 8GB
    checkpoint_completion_target = 0.9
    wal_buffers = 16MB
    default_statistics_target = 500
    max_parallel_degree = 16
    max_worker_processes = 64

    I'm using EXPLAIN ANALYZE:

    Workers Planned: 8
    Workers Launched: 8

    thx~

  5. David Rowley says:
    June 28, 2016 at 5:34 am

    Hi Hanz,

    It would be difficult to say without more information. The most useful thing you could send would be the EXPLAIN ANALYZE VERBOSE for the query in question.

    Here’s a list of a few things which you might want to consider which might explain the difference:

    1. PostgreSQL might not be fully utilising all of the 8 CPUs. You could check this in Task Manager; it might be easier to do on an otherwise idle server. If all 8 processors are not fully utilised then there's most likely a bottleneck elsewhere, probably I/O requests that can't be served to PostgreSQL quickly enough.

    2. The table in SQL Server may be smaller. I'm not sure what sort of table you're using in SQL Server, but I believe normal tables have a 9-byte row header, which is quite a bit smaller than PostgreSQL's 24 bytes. The additional bytes in PostgreSQL can mostly be accounted for by PG's MVCC implementation, which is largely suited to OLTP workloads rather than OLAP-type workloads like the one you're running. If the SQL Server table is smaller then it could be that it fits in memory, while part of the PostgreSQL table is being read from disk.

    3. The query plan could be quite different between the 2 databases. Even for a query aggregating data on a single table (no joins) there’s still a choice of how the aggregation takes place. If there are joins then these might be different types, or SQL Server may manage to push down the aggregate below the join, which PostgreSQL currently cannot do.

    4. SQL Server could simply be using more worker processes than PostgreSQL.

    There are many, many other possibilities. Post the EXPLAIN ANALYZE VERBOSE, and you can also compare that to SQL Server's plan output to see if it looks like the query could be executed more efficiently.

    David

  6. imachabeli says:
    July 21, 2016 at 10:32 pm

    “PostgreSQL 9.6beta3, compiled by Visual C++ build 1800, 64-bit”
    I get an error when running pg.
    E:\PG\bin>LOG: unrecognized configuration parameter “max_parallel_degree” in file “e:/pg/data/postgresql.conf” line 653
    FATAL: configuration file “e:/pg/data/postgresql.conf” contains errors

    • David Rowley says:
      July 22, 2016 at 9:24 am

      Hi, this setting has been renamed since this blog post was written. I've added a small update at the bottom of the post to answer your question.

