Performance of Sequences and Serials in Postgres-XL

In Postgres-XL, sequences are maintained at the Global Transaction Manager (GTM) to ensure that they are assigned non-conflicting values when they are incremented from multiple nodes. This adds significant overhead for a query doing thousands of INSERTs in a table with a serial column, incrementing sequence one at a time and making a network roundtrip to the GTM, for every INSERT.

Shaun Thomas in a recent blog complained about INSERTs running a magnitude slower on Postgres-XL as compared to vanilla PostgreSQL. There is already a way to improve performance for sequences, but it’s clearly not well advertised. I thought this is a good opportunity to explain the facility.

Postgres-XL provides a user-settable GUC called sequence_range. Every backend requests a block of sequence values as controlled by this GUC. Given that COPY is popularly used to bulk load data in Postgres, Postgres-XL automatically overrides this GUC during COPY operation and sets it to 1000, thus dramatically improving COPY performance. Unfortunately, for regular INSERTs, the default is 1 and unless user explicitly sets sequence_range to a reasonably higher value, INSERT performance suffers. Here is an example, using the same sample schema as used by Shaun in his blog post.

CREATE TABLE sensor_log (
  sensor_log_id  SERIAL PRIMARY KEY,
  location       VARCHAR NOT NULL,
  reading        BIGINT NOT NULL,
  reading_date   TIMESTAMP NOT NULL
) DISTRIBUTE BY HASH (sensor_log_id);

postgres=# \timing
Timing is on.
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                                         SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                    FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 12067.911 ms

postgres=# set sequence_range TO 1000;
SET
Time: 1.231 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                                         SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                    FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 397.406 ms

So by appropriately setting sequence_range to 1000, performance of the INSERT query improved by nearly 30 times.

When this feature was added, the default value of sequence_range GUC was set to 1 because it can leave holes in the sequence values. But looking at the performance implications for a very common use case, we decided to increase the default to 1000 and this has now been committed to the XL9_5_STABLE branch of the repository.

Its important to note that while a high value of sequence_range will improve performance for sequences and serials, it can also leave large holes in sequence ranges since the sequence ranges are cached at a backend level. To address this issue, Postgres-XL starts with the specified CACHE parameter value used at sequence creation time and doubles it every time (limited by sequence_range) if sequences are being consumed at a very high rate.

Similar improvement can also be achieved by increasing the CACHE parameter value of the sequence so that a chunk of sequence values are cached at the backend level. Following example shows how to do that for a serial column. But the sequence_range GUC provides an easy way to override the global default and also ensures that the sequences are cached only when they are getting incremented very rapidly.

postgres=# ALTER SEQUENCE sensor_log_sensor_log_id_seq CACHE 1000;                                                                                                             ALTER SEQUENCE
Time: 8.683 ms
postgres=# SET sequence_range TO 1;
SET
Time: 2.341 ms
postgres=# INSERT INTO sensor_log (location, reading, reading_date)                                                                                                            SELECT s.id % 1000, s.id % 100, now() - (s.id || 's')::INTERVAL                                                                                                                  FROM generate_series(1, 40000) s(id);
INSERT 0 40000
Time: 418.068 ms

You may choose any of these techniques to improve the performance. Though now that the default value of sequence_range is changed to 1000, not many users may see the difference in performance.

7 replies
  1. Jamey
    Jamey says:

    Is there a reason to choose this technique over UUIDs? I ask because other products with the same requirement to independently generate non-colliding IDs (ex. MongoDB) use UUIDs instead of SERIAL (or equivalent) + logic-to-avoid-collisions. I have recommended UUIDs to customers, but have never done any testing, so I am wondering what your test results were. Thanks!

    Reply
    • Pavan Deolasee
      Pavan Deolasee says:

      Since Postgres-XL has a notion of a centralised GTM, it was far easier to leverage that for this purpose. This also allows us to stay as close to PostgreSQL as possible with respect to sequence semantics. But yeah, we had thought about this possibility of partitioning the namespace to avoid collision. Another challenge of course is that XL supports extendable clusters. So when new nodes are added to the cluster, we would need to remap the namespace. Using GTM for tracking sequences helped solve some of these problems.

      Reply
  2. PostgreSuper
    PostgreSuper says:

    How about using multiple GTM server (or multithread)?

    If we use 3 GTM, the sequences would be like this:
    GTM1: 1,4,7,10,13
    GTM2: 2,5,8,11,14
    GTM3: 3,6,9,12,15

    This would increase performance as we need no coordination between sequences.

    Reply

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *