Adding new table columns with default values in PostgreSQL 11
In PostgreSQL version 10 or less, if you add a new column to a table without specifying a default value then no change is made to the actual values stored. Any existing row will just fill in a NULL for that column. But if you specify a default value, the entire table gets rewritten with the default value filled in on every row. That rewriting behavior changes in PostgreSQL 11.
In a new feature I worked on and committed, the default value is just stored in the catalog and used where needed in rows existing at the time the change was made. New rows, and new versions of existing rows, are written with the default value in place, as happens now. Any row that doesn’t have that column must have existed before the table change was made, and uses this value stored in the catalog when the row is fetched. The great advantage of this is that adding a column with a default value is now quite a fast and cheap operation, whereas before for very large tables it has been horribly, often intolerably slow. Rewriting a whole multi-terabyte table is really something you want to avoid.
The default value doesn’t have to be a static expression . It can be any non-volatile expression, e.g. CURRENT_TIMESTAMP
. Volatile expressions such as random()
will still result in table rewrites. For a non-volatile expression, it is evaluated at the time the statement runs and the result is what is stored in the catalog for use with existing rows. Of course, for new rows the expression will be re-evaluated at the time the row is created, as happens now.
Any time that the table does get rewritten, say by VACUUM FULL
, all this is cleaned up, as there will be no rows left needing the value from the catalog. Likewise, any time that a row is updated the column’s value is filled in on the new row version, so over time the proportion of rows using the value fetched from the catalog will decrease.
Was there any discussion on allowing for non-storing default values for new and updated rows also?
There are often quite a few columns on the database with over 90% of values equal for to some static value. For example issue_status=’CLOSED’, invoice_currency=’USD’, address_country=’USA’ etc. With this functionality the requirement for storing the value for 90% of rows disappeared.
Would it be possible to add the functionality to:
– on row update which does not update the column, to leave it not-stored?
– on row update which sets the column value to DEFAULT to not-store it?
– on new row creation which sets the column value to DEFAULT to not-store it?
This has a potential for significantly decreasing storage size requirements for some tables.
Doesn’t that make `CURRENT_TIMESTAMP` an unreliable default? For example, if I have a table:
“`
CREATE TABLE logs (
id BIGINT
created_at TIMESTAMP
);
“`
and I insert a bunch of rows with `created_at = NULL`; then I change the default to be `CURRENT_TIMESTAMP`, all my rows will appear to have been created at (I’m a little confused here) either a) when I ran `ALTER TABLE logs ALTER COLUMN created_at DEFAULT CURRENT_TIMESTAMP;` or b) every time I run `SELECT * FROM logs;`.
To the outside viewer (unaware of my column default change), it would appear that the `created_at` is always inaccurate/changing.
1. The stored default value doesn’t change. It is evaluated from the default expression at the time of the
ALTER TABLE
statement, and that is the value that is stored. 2. This ONLY applies where you add a new column. Then the stored value is used for rows that exist at that time, where the column doesn’t exist at all. Any new rows will get the default expression as evaluated at the time they are created, just as they did before this change. 3. Since your example doesn’t involve adding a new column to an existing table, it will be completely unaffected.The most common use of this feature is with a completely static expression, e,g,
ALTER TABLE foo ADD COLUMN bar integer NOT NULL DEFAULT 0
;