Performance of Sequences and Serials in Postgres-XL

January 7, 2016 / in 2ndQuadrant, Pavan's PlanetPostgreSQL, PostgreSQL / by Pavan Deolasee

In Postgres-XL, sequences are maintained at the Global Transaction Manager (GTM) to ensure that they are assigned non-conflicting values when they are incremented from multiple nodes. This adds significant overhead for a query doing thousands of INSERTs into a table with a serial column: the sequence is incremented one value at a time, requiring a network roundtrip to the GTM for every INSERT.

Shaun Thomas complained in a recent blog post that INSERTs run an order of magnitude slower on Postgres-XL than on vanilla PostgreSQL. There is already a way to improve performance for sequences, but it's clearly not well advertised, so this seems like a good opportunity to explain the facility.

Postgres-XL provides a user-settable GUC called sequence_range. Each backend requests a block of sequence values whose size is controlled by this GUC. Since COPY is commonly used to bulk-load data into Postgres, Postgres-XL automatically overrides this GUC during a COPY operation and sets it to 1000, dramatically improving COPY performance. Unfortunately, for regular INSERTs the default is 1, and unless the user explicitly sets sequence_range to a reasonably higher value, INSERT performance suffers. Here is an example, using the same sample schema as Shaun used in his blog post.

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
) DISTRIBUTE BY HASH (sensor_log_id);
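
As in stock PostgreSQL, the SERIAL column above is shorthand for an integer column backed by an implicitly created sequence; it is that sequence, sensor_log_sensor_log_id_seq, that we will tune directly later on. A rough sketch of the equivalent explicit declaration:

CREATE SEQUENCE sensor_log_sensor_log_id_seq;

CREATE TABLE sensor_log (
  sensor_log_id  INTEGER NOT NULL
                 DEFAULT nextval('sensor_log_sensor_log_id_seq')
                 PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
) DISTRIBUTE BY HASH (sensor_log_id);

ALTER SEQUENCE sensor_log_sensor_log_id_seq OWNED BY sensor_log.sensor_log_id;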

postgres=# \timing
Timing is on.
postgres=# INSERT INTO sensor_log (location, reading, reading_date)
           SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
           FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 12067.911 ms

postgres=# set sequence_range TO 1000;
SET
Time: 1.231 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)
           SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
           FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 397.406 ms

So simply by setting sequence_range to 1000, the INSERT query ran nearly 30 times faster.
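
Note that a bulk load of the same data via COPY would get this treatment automatically, since Postgres-XL overrides sequence_range to 1000 for the duration of the COPY, as described above. A minimal sketch (the file path is hypothetical):

postgres=# COPY sensor_log (location, reading, reading_date)
           FROM '/tmp/sensor_log.csv' WITH (FORMAT csv);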

When this feature was added, the default value of the sequence_range GUC was set to 1 because larger values can leave holes in the sequence. But given the performance implications for a very common use case, we decided to raise the default to 1000, and this has now been committed to the XL9_5_STABLE branch of the repository.

It's important to note that while a high value of sequence_range improves performance for sequences and serials, it can also leave large holes in the sequence, since blocks of sequence values are cached at the backend level. To limit this, Postgres-XL starts with the CACHE parameter value specified at sequence creation time and doubles the block size on each request (capped by sequence_range) only if sequence values are being consumed at a very high rate.
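
A quick way to spot such holes, assuming several sessions have been inserting concurrently, is to compare the highest assigned value with the row count:

postgres=# SELECT max(sensor_log_id), count(*) FROM sensor_log;
-- max() can run well ahead of count(*): values reserved in another
-- backend's cached block are skipped over, and a block is discarded
-- if its backend exits before using it all.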

A similar improvement can also be achieved by increasing the sequence's CACHE parameter so that a chunk of sequence values is cached at the backend level. The following example shows how to do that for a serial column. The sequence_range GUC, however, provides an easy way to override the global default, and it also ensures that large blocks are cached only when the sequence is being incremented very rapidly.

postgres=# ALTER SEQUENCE sensor_log_sensor_log_id_seq CACHE 1000;
ALTER SEQUENCE
Time: 8.683 ms
postgres=# SET sequence_range TO 1;
SET
Time: 2.341 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)
           SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL
           FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 418.068 ms

You may choose either of these techniques to improve performance. That said, now that the default value of sequence_range has been changed to 1000, most users should see good performance without any manual tuning.
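
If you do want a different setting everywhere rather than per session, sequence_range is a regular user-settable GUC, so the standard PostgreSQL configuration mechanisms should apply. A sketch, assuming it behaves like any other user-settable GUC (app_user is a hypothetical role):

postgres=# ALTER DATABASE postgres SET sequence_range TO 1000;
postgres=# ALTER ROLE app_user SET sequence_range TO 1000;
-- or set sequence_range = 1000 in postgresql.conf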

Tags: performance, postgres-xl, PostgreSQL, sequence, serial
7 replies
  1. Jamey says:
    January 7, 2016 at 1:35 pm

    Is there a reason to choose this technique over UUIDs? I ask because other products with the same requirement to independently generate non-colliding IDs (ex. MongoDB) use UUIDs instead of SERIAL (or equivalent) + logic-to-avoid-collisions. I have recommended UUIDs to customers, but have never done any testing, so I am wondering what your test results were. Thanks!

    • Pavan Deolasee says:
      January 8, 2016 at 7:19 am

      Since Postgres-XL has a notion of a centralised GTM, it was far easier to leverage that for this purpose. This also allows us to stay as close to PostgreSQL as possible with respect to sequence semantics. But yeah, we had thought about this possibility of partitioning the namespace to avoid collision. Another challenge of course is that XL supports extendable clusters. So when new nodes are added to the cluster, we would need to remap the namespace. Using GTM for tracking sequences helped solve some of these problems.

  2. PostgreSuper says:
    January 7, 2016 at 7:38 pm

    How about using multiple GTM servers (or multiple threads)?

    If we use 3 GTM, the sequences would be like this:
    GTM1: 1,4,7,10,13
    GTM2: 2,5,8,11,14
    GTM3: 3,6,9,12,15

    This would increase performance, as no coordination between the sequences would be needed.

    • PostgreSuper says:
      January 8, 2016 at 5:20 am

      This technique is used in Galera Cluster.

      https://mariadb.org/auto-increments-in-galera/

  3. Anonymous says:
    January 7, 2016 at 11:26 pm

    What about something similar to this?
    http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
    With the shard being some ID that postgres-xl assigns?

  4. Shaun Thomas says:
    January 18, 2016 at 4:37 pm

    Not bad! Good to know a few ways around this. 🙂

    • Pavan Deolasee says:
      January 19, 2016 at 8:07 am

      Thanks Shaun! With these tweaks, I hope you will find performance much better next time you try 🙂

