2ndQuadrant is now part of EDB

Bringing together some of the world's top PostgreSQL experts.

2ndQuadrant | PostgreSQL
Mission Critical Databases
  • Contact us
  • EN
    • FR
    • IT
    • ES
    • DE
    • PT
  • Support & Services
  • Products
  • Downloads
    • Installers
      • Postgres Installer
      • 2UDA – Unified Data Analytics
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
  • Postgres Learning Center
    • Webinars
      • Upcoming Webinars
      • Webinar Library
    • Whitepapers
      • Business Case for PostgreSQL Support
      • Security Best Practices for PostgreSQL
    • Blog
    • Training
      • Course Catalogue
    • Case Studies
      • Performance Tuning
        • BenchPrep
        • tastyworks
      • Distributed Clusters
        • ClickUp
        • European Space Agency (ESA)
        • Telefónica del Sur
        • Animal Logic
      • Database Administration
        • Agilis Systems
      • Professional Training
        • Met Office
        • London & Partners
      • Database Upgrades
        • Alfred Wegener Institute (AWI)
      • Database Migration
        • International Game Technology (IGT)
        • Healthcare Software Solutions (HSS)
        • Navionics
    • Books
      • PostgreSQL 11 Administration Cookbook
      • PostgreSQL 10 Administration Cookbook
      • PostgreSQL High Availability Cookbook – 2nd Edition
      • PostgreSQL 9 Administration Cookbook – 3rd Edition
      • PostgreSQL Server Programming Cookbook – 2nd Edition
      • PostgreSQL 9 Cookbook – Chinese Edition
    • Videos
    • Events
    • PostgreSQL
      • PostgreSQL – History
      • Who uses PostgreSQL?
      • PostgreSQL FAQ
      • PostgreSQL vs MySQL
      • The Business Case for PostgreSQL
      • Security Information
      • Documentation
  • About Us
    • About 2ndQuadrant
    • 2ndQuadrant’s Passion for PostgreSQL
    • News
    • Careers
    • Team Profile
  • Blog
  • Menu Menu
You are here: Home1 / Blog2 / Featured3 / PostgreSQL 9.6: Parallel Sequential Scan
Francesco Canovai

PostgreSQL 9.6: Parallel Sequential Scan

July 11, 2016/1 Comment/in Featured, Francesco's PlanetPostgreSQL, PostgreSQL /by Francesco Canovai

Parallel-Sequential-Scan

For a long time, one of the most known shortcomings of PostgreSQL was the ability to parallelise queries. With the release of version 9.6, this will no longer be an issue. A great job has been done on this subject, starting from the commit 80558c1, the introduction of parallel sequential scan, which we will see in the course of this article.

First, you must take note: the development of this feature has been continuous and some parameters have changed names between a commit and another. This article has been written using a checkout taken on June 17 and some features here illustrated will be present only in the version 9.6 beta2.

Compared to the 9.5 release, new parameters have been introduced inside the configuration file. These are:

  • max_parallel_workers_per_gather: the number of workers that can assist a sequential scan of a table;
  • min_parallel_relation_size: the minimum size that a relation must have for the planner to consider the use of additional workers;
  • parallel_setup_cost: the planner parameter that estimates the cost of instantiate a worker;
  • parallel_tuple_cost: the planner parameter that estimates the cost of transferring a tuple from one worker to another;
  • force_parallel_mode: parameter useful for testing, strong parallelism and also a query in which the planner would operate in other ways.

Let’s see how the additional workers can be used to speed up our queries. We create a test table with an INT field and one hundred million records:

postgres=# CREATE TABLE test (i int);
CREATE TABLE
postgres=# INSERT INTO test SELECT generate_series(1,100000000);
INSERT 0 100000000
postgres=# ANALYSE test;
ANALYZE

PostgreSQL has max_parallel_workers_per_gather set to 2 by default, for which two workers will be activated during a sequential scan.

A simple sequential scan does not present any novelties:

postgres=# EXPLAIN ANALYSE SELECT * FROM test;
                                                       QUERY PLAN                         
------------------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1442478.32 rows=100000032 width=4) (actual time=0.081..21051.918 rows=100000000 loops=1)
 Planning time: 0.077 ms
 Execution time: 28055.993 ms
(3 rows)

In fact, the presence of a WHERE clause is required for parallelisation:

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..964311.60 rows=1 width=4) (actual time=3.381..9799.942 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..963311.50 rows=0 width=4) (actual time=6525.595..9791.066 rows=0 loops=3)
         Filter: (i = 1)
         Rows Removed by Filter: 33333333
 Planning time: 0.130 ms
 Execution time: 9804.484 ms
(8 rows)

We can go back to the previous action and observe the differences setting max_parallel_workers_per_gather to 0:

postgres=# SET max_parallel_workers_per_gather TO 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1;
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1692478.40 rows=1 width=4) (actual time=0.123..25003.221 rows=1 loops=1)
   Filter: (i = 1)
   Rows Removed by Filter: 99999999
 Planning time: 0.105 ms
 Execution time: 25003.263 ms
(5 rows)

A time 2.5 times greater.

The planner does not always consider a parallel sequential scan to be the best option. If a query is not selective enough and there are many tuples to transfer from worker to worker, it may prefer a “classic” sequential scan:

postgres=# SET max_parallel_workers_per_gather TO 2;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i<90000000;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1692478.40 rows=90116088 width=4) (actual time=0.073..31410.276 rows=89999999 loops=1)
   Filter: (i < 90000000)
   Rows Removed by Filter: 10000001
 Planning time: 0.133 ms
 Execution time: 37939.401 ms
(5 rows)

In fact, if we try to force a parallel sequential scan, we get a worse result:

postgres=# SET parallel_tuple_cost TO 0;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i<90000000;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..964311.50 rows=90116088 width=4) (actual time=0.454..75546.078 rows=89999999 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on test  (cost=0.00..1338795.20 rows=37548370 width=4) (actual time=0.088..20294.670 rows=30000000 loops=3)
         Filter: (i < 90000000)
         Rows Removed by Filter: 3333334
 Planning time: 0.128 ms
 Execution time: 83423.577 ms
(8 rows)

The number of workers can be increased up to max_worker_processes (default: 8). We restore the value of parallel_tuple_cost and we see what happens by increasing max_parallel_workers_per_gather to 8.

postgres=# SET parallel_tuple_cost TO DEFAULT ;
SET
postgres=# SET max_parallel_workers_per_gather TO 8;
SET
postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=1;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..651811.50 rows=1 width=4) (actual time=3.684..8248.307 rows=1 loops=1)
   Workers Planned: 6
   Workers Launched: 6
   ->  Parallel Seq Scan on test  (cost=0.00..650811.40 rows=0 width=4) (actual time=7053.761..8231.174 rows=0 loops=7)
         Filter: (i = 1)
         Rows Removed by Filter: 14285714
 Planning time: 0.124 ms
 Execution time: 8250.461 ms
(8 rows)

Even though PostgreSQL could use up to 8 workers, it has instantiated only six. This is because Postgres also optimises the number of workers according to size of the table and the min_parallel_relation_size. The number of workers made available by postgres is based on a geometric progression with 3 as common ratio 3 and min_parallel_relation_size as scale factor. Here is an example. Considering the 8MB of default parameter:

Size Worker
<8MB 0
<24MB 1
<72MB 2
<216MB 3
<648MB 4
<1944MB 5
<5822MB 6
… …

Our table size is 3458MB, so 6 is the maximum number of available workers.

postgres=# \dt+ test
                    List of relations
 Schema | Name | Type  |  Owner   |  Size   | Description
--------+------+-------+----------+---------+-------------
 public | test | table | postgres | 3458 MB |
(1 row)

Finally, I will give a brief demonstration of the improvements achieved by through this patch. Running our query with a growing number of growing workers, we obtain the following results:

Workers Time
0 24767.848 ms
1 14855.961 ms
2 10415.661 ms
3 8041.187 ms
4 8090.855 ms
5 8082.937 ms
6 8061.939 ms

We can see that the times dramatically improve, until you reach a third of the initial value. It is also simple to explain the fact that we do not see improvements between the use of 3 and 6 workers: the machine on which the test was run has 4 CPUs, so the results are stable after having added 3 more workers to the original process.

Finally, PostgreSQL 9.6 has set the stage for query parallelisation, in which parallel sequential scan is only the first great result. We will also see that in 9.6, aggregations have been parallelised, but that is information for another article which will be released in the upcoming weeks!

Tags: 9.6, parallel sequential scan, parallelismo, performance, PostgreSQL, PostgreSQL 9.6
Share this entry
  • Share on Facebook
  • Share on Twitter
  • Share on WhatsApp
  • Share on LinkedIn
1 reply
  1. David
    David says:
    October 11, 2018 at 6:51 pm

    Per 9.6 documentation:
    “Setting this value to 0, which is the default, disables parallel query execution.”

    So you assumption that the default is set to 2 is probably incorrect unless you are using something different in your conf.

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Search

Get in touch with us!

Recent Posts

  • Random Data December 3, 2020
  • Webinar: COMMIT Without Fear – The Beauty of CAMO [Follow Up] November 13, 2020
  • Full-text search since PostgreSQL 8.3 November 5, 2020
  • Random numbers November 3, 2020
  • Webinar: Best Practices for Bulk Data Loading in PostgreSQL [Follow Up] November 2, 2020

Featured External Blogs

Tomas Vondra's Blog

Our Bloggers

  • Simon Riggs
  • Alvaro Herrera
  • Andrew Dunstan
  • Craig Ringer
  • Francesco Canovai
  • Gabriele Bartolini
  • Giulio Calacoci
  • Ian Barwick
  • Marco Nenciarini
  • Mark Wong
  • Pavan Deolasee
  • Petr Jelinek
  • Shaun Thomas
  • Tomas Vondra
  • Umair Shahid

PostgreSQL Cloud

2QLovesPG 2UDA 9.6 backup Barman BDR Business Continuity community conference database DBA development devops disaster recovery greenplum Hot Standby JSON JSONB logical replication monitoring OmniDB open source Orange performance PG12 pgbarman pglogical PG Phriday postgres Postgres-BDR postgres-xl PostgreSQL PostgreSQL 9.6 PostgreSQL10 PostgreSQL11 PostgreSQL 11 PostgreSQL 11 New Features postgresql repmgr Recovery replication security sql wal webinar webinars

Support & Services

24/7 Production Support

Developer Support

Remote DBA for PostgreSQL

PostgreSQL Database Monitoring

PostgreSQL Health Check

PostgreSQL Performance Tuning

Database Security Audit

Upgrade PostgreSQL

PostgreSQL Migration Assessment

Migrate from Oracle to PostgreSQL

Products

HA Postgres Clusters

Postgres-BDR®

2ndQPostgres

pglogical

repmgr

Barman

Postgres Cloud Manager

SQL Firewall

Postgres-XL

OmniDB

Postgres Installer

2UDA

Postgres Learning Center

Introducing Postgres

Blog

Webinars

Books

Videos

Training

Case Studies

Events

About Us

About 2ndQuadrant

What does 2ndQuadrant Mean?

News

Careers 

Team Profile

© 2ndQuadrant Ltd. All rights reserved. | Privacy Policy
  • Twitter
  • LinkedIn
  • Facebook
  • Youtube
  • Mail
Evolution of Fault Tolerance in PostgreSQL: Replication Phase On the benefits of sorted paths
Scroll to top
×