PostgreSQL 10 identity columns explained
For PostgreSQL 10, I have worked on a feature called “identity columns”. Depesz already wrote a blog post about it and showed that it works pretty much like serial
columns:
CREATE TABLE test_old ( id serial PRIMARY KEY, payload text ); INSERT INTO test_old (payload) VALUES ('a'), ('b'), ('c') RETURNING *;
and
CREATE TABLE test_new ( id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY, payload text ); INSERT INTO test_new (payload) VALUES ('a'), ('b'), ('c') RETURNING *;
do pretty much the same thing, except that the new way is more verbose. 😉
So why bother?
Compatibility
The new syntax conforms to the SQL standard. Creating auto-incrementing columns has been a notorious area of incompatibility between different SQL implementations. Some have lately been adopting the standard SQL syntax, however. So now you can move code around between, for example, PostgreSQL, DB2, and Oracle without any change (in this area).
Permissions
A general problem with the old way is that the system doesn’t actually remember that the user typed serial
. It expands this at parse time into something like
CREATE SEQUENCE test_old_id_seq; CREATE TABLE test_old ( id int NOT NULL PRIMARY KEY, payload text ); ALTER TABLE test_old ALTER COLUMN id SET DEFAULT nextval('test_old_id_seq'); ALTER SEQUENCE test_old_id_seq OWNED BY test_old.id;
The OWNED BY
in the last command is an attempt to remember something about the serialness, but it is still insufficient in some cases.
The new way creates the sequence as a proper internal dependency of the table, so that various weird implementation details of the serial pseudotype are not exposed.
One common problem is that permissions for the sequence created by a serial
column need to be managed separately:
CREATE USER foo; GRANT INSERT ON test_old TO foo; GRANT INSERT ON test_new TO foo; SET SESSION AUTHORIZATION foo; INSERT INTO test_old (payload) VALUES ('d'); ERROR: permission denied for sequence test_old_id_seq INSERT INTO test_new (payload) VALUES ('d'); -- OK
You can fix the error by also running
GRANT USAGE ON SEQUENCE test_old_id_seq;
If you have deployment scripts, this is annoying and problematic, because the name of the sequence is automatically generated. Here, of course, it appears in the error message, and it is easy to guess, but sometimes a slightly different name is chosen, and then your deployment scripts will fail.
Managing sequences
You also need to know the name of the sequence if you want to make some changes to the sequence:
ALTER SEQUENCE test_old_id_seq RESTART WITH 1000;
With an identity column, you don’t need to know the name of the sequence:
ALTER TABLE test_new ALTER COLUMN id RESTART WITH 1000;
Schema management
Since serial
is not a real type, it can only be used in certain circumstances. You can specify serial
as a column type when creating a table or when adding a column. But dropping serialness from an existing column or adding it to an existing column is not straightforward.
To drop serialness, you can drop the sequence (again, after ascertaining the name) with the CASCADE
option, which cascades to remove the default value of the associated column:
DROP SEQUENCE test_old_id_seq CASCADE;
If you instead drop the default value like
ALTER TABLE test_old ALTER COLUMN id DROP DEFAULT;
it will drop the default but leave the sequence in place.
If you want to take an existing integer column and turn it into a serial
column, there is no single command to do that. You will have to manually assemble the CREATE SEQUENCE
and ALTER TABLE ... SET DEFAULT
commands shown earlier.
Dropping the identity property of an existing column is easy:
ALTER TABLE test_new ALTER COLUMN id DROP IDENTITY;
You cannot accidentally make a mistake and drop a default, because there is none. But just in case, you get a nice error message:
=> ALTER TABLE test_new ALTER COLUMN id DROP DEFAULT; ERROR: column "id" of relation "test_new" is an identity column HINT: Use ALTER TABLE ... ALTER COLUMN ... DROP IDENTITY instead.
You can also turn an existing integer column into an identity column with one command:
ALTER TABLE test_new ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;
Copying table structures
If you use the CREATE TABLE / LIKE
functionality to copy the structure of a table, serial
columns pose a problem:
CREATE TABLE test_old2 (LIKE test_old INCLUDING ALL); INSERT INTO test_old2 (payload) VALUES ('e') RETURNING *; id | payload ----+--------- 4 | e
Note that even though the new table is a separate table, it keeps using the old sequence.
This gets even more confusing when you want to drop the first table:
=> DROP TABLE test_old; ERROR: cannot drop table test_old because other objects depend on it DETAIL: default for table test_old2 column id depends on sequence test_old_id_seq HINT: Use DROP ... CASCADE to drop the dependent objects too.
(You can use CASCADE
as hinted, or drop test_old2
first. The latter works without CASCADE
because the sequence is still linked to the first table.)
When you copy a table with an identity column in this way, you get a new sequence:
CREATE TABLE test_new2 (LIKE test_new INCLUDING ALL); INSERT INTO test_new2 (payload) VALUES ('e') RETURNING *; id | payload ----+--------- 1 | e
Upgrading
Perhaps you are convinced and you want to “upgrade” all your messy serial columns to this new identity column thing. (Note that you don’t have to “upgrade”. You can keep using serial columns the same way as before.) Here is a PL/pgSQL function that you can use:
CREATE OR REPLACE FUNCTION upgrade_serial_to_identity(tbl regclass, col name) RETURNS void LANGUAGE plpgsql AS $$ DECLARE colnum smallint; seqid oid; count int; BEGIN -- find column number SELECT attnum INTO colnum FROM pg_attribute WHERE attrelid = tbl AND attname = col; IF NOT FOUND THEN RAISE EXCEPTION 'column does not exist'; END IF; -- find sequence SELECT INTO seqid objid FROM pg_depend WHERE (refclassid, refobjid, refobjsubid) = ('pg_class'::regclass, tbl, colnum) AND classid = 'pg_class'::regclass AND objsubid = 0 AND deptype = 'a'; GET DIAGNOSTICS count = ROW_COUNT; IF count < 1 THEN RAISE EXCEPTION 'no linked sequence found'; ELSIF count > 1 THEN RAISE EXCEPTION 'more than one linked sequence found'; END IF; -- drop the default EXECUTE 'ALTER TABLE ' || tbl || ' ALTER COLUMN ' || quote_ident(col) || ' DROP DEFAULT'; -- change the dependency between column and sequence to internal UPDATE pg_depend SET deptype = 'i' WHERE (classid, objid, objsubid) = ('pg_class'::regclass, seqid, 0) AND deptype = 'a'; -- mark the column as identity column UPDATE pg_attribute SET attidentity = 'd' WHERE attrelid = tbl AND attname = col; END; $$;
Call it like this:
SELECT upgrade_serial_to_identity('test_old', 'id');
If you are using schemas, it would look like this:
SELECT upgrade_serial_to_identity('public.test_old', 'id');
Check your table definition before and after:
=> \d test_old Table "public.test_old" Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+-------------------------------------- id | integer | | not null | nextval('test_old_id_seq'::regclass) payload | text | | |
=> \d test_old Table "public.test_old" Column | Type | Collation | Nullable | Default ---------+---------+-----------+----------+---------------------------------- id | integer | | not null | generated by default as identity payload | text | | |
Further reading
More information can be found in the PostgreSQL documentation, starting from the CREATE TABLE reference page.
How about inherited tables?
The identity property is not inherited. For a serial column, the default expression is inherited but the sequence ownership is not (similar to the
LIKE
case).Wow, when you see all the weirdness of SERIAL in one place, it is obvious we needed something better.
Great news! Thanks for your work.
One question: Experimenting with the current beta of Postgres 10 shows that `OWNED BY NONE` is allowed as an option to GENERATED…AS IDENTITY. Does that make sense? Seems like the goal of an identity column is to hide the implementation details of the sequence object… so shouldn’t ownership be irrelevant, and therefore `OWNED BY …` and `OWNED BY` should both throw an error?
Example of code that succeeds in beta:
CREATE TABLE IF NOT EXISTS tbl_ (
id_ int GENERATED BY DEFAULT AS IDENTITY ( START WITH 200 MINVALUE 100 MAXVALUE 205 CYCLE INCREMENT BY 3 OWNED BY NONE ) PRIMARY KEY ,
name_ text
);
That appears to be a bug. I’ll look into it.
What is the relationship between GENERATED … AS IDENTITY and PRIMARY KEY? Do we still need to specify PRIMARY KEY to get an index, UNIQUE, and designation for relationships (with foreign keys and JOIN)?
Identity and primary key are separate. So you have to specify both if you want both. An identity column is, however, automatically NOT NULL. (That is per SQL standard.)
I suggest adding a mention of `GENERATED … AS IDENTITY` as a replacement for `SERIAL` in the “Serial Types” section 8.1.4 of the “Data Types” page. Seems a shame to let the naïve reader continue using `SERIAL` without knowing about identity column feature.
https://www.postgresql.org/docs/devel/static/datatype-numeric.html#datatype-serial
I posted this suggestion to the Postgres docs mailing list:
https://www.postgresql.org/message-id/7637CE20-AA33-48C4-AEBA-A0F591F87235%40pobox.com
I have just upgraded to this version and have been testing the IDENTITY column, and I have encountered a problem with pgAdmin. After I have created the table and view its CREATE TABLE script there are no references to the IDENTITY clause or to the related sequence. This means that I cannot copy that script into a file so that I can create the same table in a different environment.
Maybe try https://omnidb.org
To finish the story, the following expression:
currval(pg_get_serial_sequence(‘sometable’, ‘id’));
… can be used to get the newly generated IDENTITY value, just like before with the SERIAL pseudo type.
I have a question about “identity inserts.” If I’m using a SERIAL, I can make a call to SETVAL() like below after explicitly specifying the value for an id.
INSERT INTO test_old (payload) VALUES (‘a’), (‘b’), (‘c’) RETURNING *;
INSERT INTO test_old (id, payload) VALUES (5, ‘a’) RETURNING *;
SELECT SETVAL(‘test_old_id_seq’, 5);
INSERT INTO test_old (payload) VALUES (‘a’), (‘b’), (‘c’) RETURNING *;
What is the proper way of doing this using an IDENTITY column? Do I need to do an ALTER TABLE like the following?
ALTER TABLE test_new ALTER COLUMN id RESTART WITH 5;
I’m wondering if there is a performance hit for having to do an ALTER TABLE versus a SETVAL()?
I have code where a user is passing in data and depending on whether the id value is specified, I optionally call SETVAL() after doing the INSERT. I found that you have to do this. Otherwise, the next value is still whatever it was before. I haven’t noticed performance problems doing this. Having to do an ALTER TABLE is a little scary. Or, is there some other way using SETVAL()?
There is a difference. ALTER TABLE is transactional, setval is not. So it depends on what you want. See the ALTER SEQUENCE man page for details.
Never mind, I just noticed that it appears that it still created a separate sequence for the IDENTITY column. So, I can still do SETVAL() the way I was before.
I thanks for this great article.
I wanted to convert all my serial, but unfortunatly I’m using AWS RDS Postgres and there is no way to “UPDATE pg_depend” there.
Any work arround that you could think about ?
I’m surprised that UPDATE pg_depend is not allowed, but if so, then I don’t know of another way.
> One common problem is that permissions for the sequence created by a serial column need to be managed separately
Maybe I misunderstand but I think they still do, at least in this situation: if you run ‘pg_dump –data-only’ as a user with SELECT on the table, the dump fails with “permission denied for sequence”. Granting SELECT on the sequence fixes it. Not a big problem. I do like the extra safety of GENERATED ALWAYS.
Thanks
Good point. The way pg_dump accesses a sequence requires separate permissions. What the article was referring to was that you don’t need separate sequence permissions just for inserting a default value into the table.
Hi Peter,
Thank you for the explanation. I’ve just started to move from sequences to identities.
I also use a lot of uuid columns as keys. I didn’t found out a way to use IDENTITY columns with a function that generates the value, like uuid_generate_v1mc().
For those cases, I still use something like: identificador uuid NOT NULL DEFAULT uuid_generate_v1mc() PRIMARY KEY
IDENTITY uses just sequences behind, right? My guess is that this is no way to use uuid generators as a sequence generator.
Thanks!
Correct, the identity facility is only available for increasing integers backed by sequences. For uuids you need to use default values, like you describe. I have been thinking about ways to consolidate these two approaches, but it hasn’t lead to anything yet.
if I am using JPA, should i use identity column or sequence ? is there any benefits of using identity over sequence ? for sequence i know that JPA support batching
The benefits will be on the server-side, in schema management, as described in the article. How this affects client-side frameworks depends, but I don’t expect significant differences.
Thanks a lot !