On Rocks and Sand

September 21, 2018 / in Shaun's PlanetPostgreSQL / by Shaun Thomas

When working with database capacity planning, there are a lot of variables to consider, and Postgres is no different in this regard. One of the elements which requires management is storage. However, there’s an aspect of storage that escapes inspection almost without exception, and it’s hiding in the shadows between the columns themselves.

Alignment Basics

In most low-level computer languages like C, in which the venerable Postgres is written, data types are addressed by their maximum size, regardless of how much of that space they actually use. Thus a standard 32-bit integer, which can store a value of just over two billion, must be read as a whole unit. Even the value 0 requires 4 bytes of storage.
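We can check this directly with pg_column_size. As a quick sketch, both of these should report 4 bytes on a typical build, no matter the value stored:

SELECT pg_column_size(0::INT) AS int_zero,
       pg_column_size(2147483647::INT) AS int_max;

 int_zero | int_max 
----------+---------
        4 |       4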

Further, Postgres is designed such that its own internal natural alignment is 8 bytes, meaning consecutive fixed-length columns of differing size must be padded with empty bytes in some cases. We can see that with this example:

SELECT pg_column_size(row()) AS empty,
       pg_column_size(row(0::SMALLINT)) AS byte2,
       pg_column_size(row(0::BIGINT)) AS byte8,
       pg_column_size(row(0::SMALLINT, 0::BIGINT)) AS byte16;

 empty | byte2 | byte8 | byte16 
-------+-------+-------+--------
    24 |    26 |    32 |     40

This suggests that an empty Postgres row requires 24 bytes of various header elements, a SMALLINT is 2 bytes, a BIGINT is 8 bytes, and combining them is… 16 bytes? That’s no mistake; Postgres is padding the smaller column to match the size of the following column for alignment purposes. Instead of 2 + 8 = 10, our math becomes 8 + 8 = 16.

Intensity Intervals

By itself, this may not necessarily be a problem. But consider a contrived ordering system with this table:

CREATE TABLE user_order (
  is_shipped    BOOLEAN NOT NULL DEFAULT false,
  user_id       BIGINT NOT NULL,
  order_total   NUMERIC NOT NULL,
  order_dt      TIMESTAMPTZ NOT NULL,
  order_type    SMALLINT NOT NULL,
  ship_dt       TIMESTAMPTZ,
  item_ct       INT NOT NULL,
  ship_cost     NUMERIC,
  receive_dt    TIMESTAMPTZ,
  tracking_cd   TEXT,
  id            BIGSERIAL PRIMARY KEY NOT NULL
);

If it looks a little weird, that’s because it is. It’s not uncommon for column orders to be dictated by a hurried dev simply jotting down attributes, or an ORM that generates output from an arbitrary hashed key position. Seeing the id column at the end is a good indicator that column order wasn’t part of the architecture or planning roadmap.

Given that, this is what Postgres sees:

SELECT a.attname, t.typname, t.typalign, t.typlen
  FROM pg_class c
  JOIN pg_attribute a ON (a.attrelid = c.oid)
  JOIN pg_type t ON (t.oid = a.atttypid)
 WHERE c.relname = 'user_order'
   AND a.attnum >= 0
 ORDER BY a.attnum;

   attname   |   typname   | typalign | typlen 
-------------+-------------+----------+--------
 is_shipped  | bool        | c        |      1
 user_id     | int8        | d        |      8
 order_total | numeric     | i        |     -1
 order_dt    | timestamptz | d        |      8
 order_type  | int2        | s        |      2
 ship_dt     | timestamptz | d        |      8
 item_ct     | int4        | i        |      4
 ship_cost   | numeric     | i        |     -1
 receive_dt  | timestamptz | d        |      8
 tracking_cd | text        | i        |     -1
 id          | int8        | d        |      8

The typalign column dictates the intended alignment according to the values listed on the pg_type manual page: c for single-byte (char) alignment, s for 2-byte (short), i for 4-byte (int), and d for 8-byte (double).

From earlier discussion, we can see that the INT types trivially map to their respective byte sizes. For NUMERIC and TEXT, things are a bit trickier, but we’ll need to address that later. For now, consider the constant size transitions and the effect that may have on alignment.

A Little Padded Room

To avoid this insanity, Postgres pads each smaller column to match the alignment of the next consecutive column. As a result of this particular column arrangement, there's a slight amount of padding between almost every column pair.

Let’s fill the table with one million rows and check the resulting size:

INSERT INTO user_order (
    is_shipped, user_id, order_total, order_dt, order_type,
    ship_dt, item_ct, ship_cost, receive_dt, tracking_cd
)
SELECT true, 1000, 500.00, now() - INTERVAL '7 days',
       3, now() - INTERVAL '5 days', 10, 4.99,
       now() - INTERVAL '3 days', 'X5901324123479RROIENSTBKCV4'
  FROM generate_series(1, 1000000);

SELECT pg_relation_size('user_order') AS size_bytes,
       pg_size_pretty(pg_relation_size('user_order')) AS size_pretty;

 size_bytes | size_pretty 
------------+-------------
  141246464 | 135 MB

Now we can use that as the benchmark we want to beat. The next question is: how much of that is padding? Here's a minimum estimate of what we're wasting, based on the declared alignments:

  • 7 bytes between is_shipped and user_id
  • 4 bytes between order_total and order_dt
  • 6 bytes between order_type and ship_dt
  • 4 bytes between tracking_cd and id

So we’re probably losing around 21 bytes per row, with the declared types accounting for 59 bytes of actual space, for a total row length of 80 bytes without row overhead. Again, this is based only on the alignment storage. It turns out the NUMERIC and TEXT columns screw with the totals a bit more than the alignments suggest, but this will work fine for an estimate.

If we’re even close, that means we may be able to shrink the table by 26%, or about 37MB.

Some Ground Rules

The trick to performing this kind of surgery is obtaining the ideal column alignment that results in the fewest added bytes. To do that, we really need to consider the NUMERIC and TEXT columns. Because these are variable length types, they get special treatment. As an example, consider this regarding NUMERIC:

SELECT pg_column_size(row()) AS empty_row,
       pg_column_size(row(0::NUMERIC)) AS no_val,
       pg_column_size(row(1::NUMERIC)) AS no_dec,
       pg_column_size(row(9.9::NUMERIC)) AS with_dec,
       pg_column_size(row(1::INT2, 1::NUMERIC)) AS col2,
       pg_column_size(row(1::INT4, 1::NUMERIC)) AS col4,
       pg_column_size(row(1::NUMERIC, 1::INT4)) AS round8;

 empty_row | no_val | no_dec | with_dec | col2 | col4 | round8 
-----------+--------+--------+----------+------+------+--------
        24 |     27 |     29 |       31 |   31 |   33 |     36

These results suggest we can treat NUMERIC as unaligned, with some caveats. Even a single digit in a NUMERIC requires 5 bytes, but it also doesn’t impact the previous column the way an INT8 might.

Here’s the same concept with TEXT:

SELECT pg_column_size(row()) AS empty_row,
       pg_column_size(row(''::TEXT)) AS no_text,
       pg_column_size(row('a'::TEXT)) AS min_text,
       pg_column_size(row(1::INT4, 'a'::TEXT)) AS two_col,
       pg_column_size(row('a'::TEXT, 1::INT4)) AS round4;

 empty_row | no_text | min_text | two_col | round4 
-----------+---------+----------+---------+--------
        24 |      25 |       26 |      30 |     32

From this, we can see that variable-length types round up to the nearest 4 bytes depending on the alignment of the following column. This means we can chain variable-length columns all day long without introducing padding, except at the right boundary. Consequently, variable-length columns introduce no bloat so long as they sit at the end of a column listing.
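As a quick sketch of that chaining effect, two consecutive TEXT columns add no padding between themselves, while a fixed-width column following a TEXT forces the familiar rounding:

SELECT pg_column_size(row('a'::TEXT, 'a'::TEXT)) AS two_text,
       pg_column_size(row('a'::TEXT, 1::INT4)) AS text_then_int;

On a typical build, two_text should come to 28 bytes (the 24-byte header plus two 2-byte short varlenas, back to back), while text_then_int matches the 32 bytes we saw above.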

Since fixed-length column types are padded based on the column that follows, the largest types should come first. Barring that, we can “pack” columns such that consecutive columns consume 8 bytes cumulatively.

Again, we can see that in action:

SELECT pg_column_size(row()) AS empty_row,
       pg_column_size(row(1::SMALLINT)) AS int2,
       pg_column_size(row(1::INT)) AS int4,
       pg_column_size(row(1::BIGINT)) AS int8,
       pg_column_size(row(1::SMALLINT, 1::BIGINT)) AS padded,
       pg_column_size(row(1::INT, 1::INT, 1::BIGINT)) AS not_padded;

 empty_row | int2 | int4 | int8 | padded | not_padded 
-----------+------+------+------+--------+------------
        24 |   26 |   28 |   32 |     40 |         40

Neat, eh?

Column Tetris

The previous sections are a fancy way of saying, “Sort the columns by their type length as defined in pg_type.” Luckily enough, we can get that information by slightly adjusting the query we used to output the column types:

SELECT a.attname, t.typname, t.typalign, t.typlen
  FROM pg_class c
  JOIN pg_attribute a ON (a.attrelid = c.oid)
  JOIN pg_type t ON (t.oid = a.atttypid)
 WHERE c.relname = 'user_order'
   AND a.attnum >= 0
 ORDER BY t.typlen DESC;

   attname   |   typname   | typalign | typlen 
-------------+-------------+----------+--------
 id          | int8        | d        |      8
 user_id     | int8        | d        |      8
 order_dt    | timestamptz | d        |      8
 ship_dt     | timestamptz | d        |      8
 receive_dt  | timestamptz | d        |      8
 item_ct     | int4        | i        |      4
 order_type  | int2        | s        |      2
 is_shipped  | bool        | c        |      1
 tracking_cd | text        | i        |     -1
 ship_cost   | numeric     | i        |     -1
 order_total | numeric     | i        |     -1

All things being equal, we can mix and match columns with matching type lengths, and if we're brave or want a prettier column ordering, we can combine types of shorter lengths where necessary.

Let’s see what happens if we go with this table design:

DROP TABLE user_order;

CREATE TABLE user_order (
  id            BIGSERIAL PRIMARY KEY NOT NULL,
  user_id       BIGINT NOT NULL,
  order_dt      TIMESTAMPTZ NOT NULL,
  ship_dt       TIMESTAMPTZ,
  receive_dt    TIMESTAMPTZ,
  item_ct       INT NOT NULL,
  order_type    SMALLINT NOT NULL,
  is_shipped    BOOLEAN NOT NULL DEFAULT false,
  order_total   NUMERIC NOT NULL,
  ship_cost     NUMERIC,
  tracking_cd   TEXT
);

If we repeat the previous insert of 1-million rows, the new table size is 117,030,912 bytes, or roughly 112MB. By simply reorganizing the table columns, we've saved about 17% of the total space.

This may not mean much for a single table, but repeated on every table in a database instance, it could result in a sizeable reduction in storage consumption. In a data warehouse context where data is commonly loaded once and never modified again, a 10-20% storage reduction would be worth considering simply due to the scales involved. I’ve seen 60TB Postgres databases; imagine reducing that by 6-12TB without actually removing any data.

Settled Differences

Much like filling a jar with rocks, pebbles, and sand, the most efficient way to declare a Postgres table is by the column alignment type. Bigger columns first, medium columns next, small columns last, and weird exceptions like NUMERIC and TEXT tacked to the end as if they were dust in our analogy. That’s what we get for playing with pointers.

As it stands, a more “natural” declaration of the table columns is probably along these lines:

CREATE TABLE user_order (
  id            BIGSERIAL PRIMARY KEY NOT NULL,
  user_id       BIGINT NOT NULL,
  order_type    SMALLINT NOT NULL,
  order_total   NUMERIC NOT NULL,
  order_dt      TIMESTAMPTZ NOT NULL,
  item_ct       INT NOT NULL,
  ship_dt       TIMESTAMPTZ,
  is_shipped    BOOLEAN NOT NULL DEFAULT false,
  ship_cost     NUMERIC,
  tracking_cd   TEXT,
  receive_dt    TIMESTAMPTZ
);

In this case, it just so happens we are 8MB away from the ideal, or about 7.7% larger. For the sake of demonstration, we intentionally botched the column ordering to make a point. A real table will probably sit somewhere between the best and worst case scenarios, and the only way to ensure the least amount of leakage is to manually reorder columns.

Some might ask why this isn’t built into Postgres. Surely it knows the ideal column ordering and has the power to decouple a user’s visible mapping from what actually hits the disk. That’s a legitimate question, but it’s a lot more difficult to answer and involves a lot of bike-shedding.

One major benefit from decoupling physical from logical representation is that Postgres would finally allow column reordering, or adding a column in a specific position. If a user wants their column listing to look pretty after multiple modifications, why not let them?

It’s all about priorities. There’s been a TODO item to address this going back to at least 2006. Patches have gone back and forth since then, and every time, the conversation eventually ends without a definitive conclusion. It’s clearly a difficult problem to address, and there are, as they say, bigger fish to fry.

Given sufficient demand, someone will sponsor a patch to completion, even if it requires multiple Postgres versions for the necessary under-the-hood changes to manifest. Until then, a simple query can magically reveal the ideal column ordering if the impact is pressing enough for a particular use case.
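For reference, here's one way to sketch that query in reusable form, assuming a regclass cast to name the target table; it sorts fixed-length columns largest-first and pushes the variable-length types to the end:

SELECT a.attname, t.typname, t.typalign, t.typlen
  FROM pg_attribute a
  JOIN pg_type t ON (t.oid = a.atttypid)
 WHERE a.attrelid = 'user_order'::regclass
   AND a.attnum > 0
   AND NOT a.attisdropped
 ORDER BY t.typlen = -1, t.typlen DESC;

The boolean expression t.typlen = -1 sorts false before true, so NUMERIC and TEXT land at the bottom, where they pad nothing.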

To those who love that kind of low-level tweaking, let us know if it was worth the effort. We’d love to hear your stories!

Tags: efficiency, hacking, PG Phriday
10 replies

  1. Craig Lewis says:
     September 21, 2018 at 11:31 pm

     It seems like a simpler and easier fix would be to add an argument to pg_dump to do optimal reordering. Then the column re-ordering could be tested as part of a dump-and-restore upgrade.

  2. Michal says:
     September 26, 2018 at 10:12 pm

     Thanks, this is a very informative article that explains in an easy-to-understand way how Postgres stores data and fills the storage with unused bytes. The ability to reorder columns is an important missing feature – after some time of application lifetime and various database modifications, the crazy column order you presented in your example may not be far off from the truth. Working with such table structures in database administration tools then becomes clumsy, as you end up with related columns far away from one another instead of grouped logically in close proximity.

     Given the padding and space issues, it would be logical to disconnect the internal column order from the presentational order. We should have the ability to:

     1. change the presentational column order with ALTER TABLE (without a table rewrite)
     2. run a table rewrite that automatically determines the most efficient internal column order and applies it to storage while preserving the presentational order – something like OPTIMIZE TABLE COLUMN ORDER – or perhaps simply as part of VACUUM FULL.

     I hope the long-standing TODO item will find some love among the developers!

  3. Torsten Grust says:
     September 27, 2018 at 6:00 am

     https://stackoverflow.com/questions/2966524/calculating-and-saving-space-in-postgresql/7431468#7431468

  4. Tzahi says:
     September 29, 2018 at 3:35 pm

     Thanks. Very interesting.

  5. Martin says:
     October 15, 2018 at 9:51 am

     Where should a JSONB column be placed, and why?

  6. Wutikrai Pornchai says:
     January 6, 2019 at 4:31 pm

     Thanks, very informative post.

  7. Hieu Vo says:
     January 30, 2019 at 6:25 am

     Just wondering if the same optimization could be applied to MySQL?

  8. Jean Baro says:
     August 14, 2019 at 4:35 pm

     Very informative and useful, though it is even harder to keep this strategy as the database evolves (new columns are added).

  9. Debasish Mitra says:
     April 20, 2020 at 11:04 pm

     Would the ordering of columns make a difference if I know some of the columns will probably have null values in most rows?

Trackbacks & Pingbacks

  1. Saving space (basically) for free by re-ordering columns in PostgreSQL | صحافة حرة FREE PRESS says:
     October 5, 2020 at 9:08 pm

     […] technique isn’t revolutionary: it’s been well-documented by 2ndQuadrant in On Rocks and Sand, EDB in Data Alignment in PostgreSQL, GitLab in Ordering Table Columns in PostgreSQL, the classic […]
