Generated columns in PostgreSQL 12
The data warehousing community will be happy to know that PostgreSQL now has a feature to generate columns based on data in other columns.
This feature is known in various other DBMS as “calculated columns”, “virtual columns”, or “generated columns”. PostgreSQL picked the moniker “generated”. We’ll go with that for now, and discuss later why that may not have been the most future-proof naming.
This feature allows PostgreSQL to calculate columns based on given input columns.
In the current implementation, the values are generated at INSERT/UPDATE time, and stored physically with the rest of the table data. A keyword is used to indicate that the values are computed and stored.
Let’s look at a simple example:
CREATE OR REPLACE FUNCTION my_concat(text, VARIADIC text[]) RETURNS TEXT AS 'text_concat_ws' LANGUAGE internal immutable; CREATE TABLE addresses ( id bigserial primary key, address1 text, address2 text, address3 text, city text, state text, zip text, delivery_address text generated always as (my_concat(E'\n',address1,address2,address3,my_concat(' ',my_concat(',', city, state),zip)) ) stored ); INSERT INTO addresses (address1,address3,city,state,zip) VALUES ('105 Live Oak Street','c/o Somebody, Somewhere','Live Oak Village','TX','78039'); SELECT delivery_address FROM addresses;
And the result is:
delivery_address 105 Live Oak Street c/o Somebody, Somewhere Live Oak Village,TX 78039 (1 row)
In this case, we are simply using the generated column feature to provide a mailing address that we won’t have to repeatedly compute at some later date. The more often the value would otherwise be required to be re-computed, the more valuable the generated column becomes.
If you examine the output carefully, you’ll notice that address line 2 is missing, and yet the format of the delivery address is intact.
This feature is particularly helpful with JSON/XML extraction, GIS data, full-text search and other highly repetitive operations.
Well, Ok, that’s a bit helpful with formatted text, but what’s the meaning of this function “my_concat”? It turns out that the text functions provided by PostgreSQL are considered volatile (produce different outputs, even with the same inputs) when they are locale specific. Anything that depends on the code page or collation sequence could potentially change in flight if the locale settings change.
This is very annoying, since nearly every text function is locale dependent. So, in order to use this feature, we have to redeclare the concat_ws function to my_concat function, just so that we can pretend that it is immutable. Otherwise, we get the error:
ERROR: generation expression is not immutable
Let’s try a bit more complex example. A common use for generated columns in data warehouse applications is to compute dates and date ranges for reporting. This example is fairly long, so I’ll post it in it’s entirety at the end of the article. If you want to see the items I’m calling out in context, skip to the end.
CREATE TABLE public.media_calendar ( gregorian date NOT NULL PRIMARY KEY, month_int integer GENERATED ALWAYS AS (date_part('month'::text, gregorian)) STORED, day_int integer GENERATED ALWAYS AS (date_part('day'::text, gregorian)) STORED, year_int integer GENERATED ALWAYS AS (date_part('year'::text, gregorian)) STORED, quarter_int integer GENERATED ALWAYS AS (date_part('quarter'::text, gregorian)) STORED, dow_int integer GENERATED ALWAYS AS (date_part('dow'::text, gregorian)) STORED,
This snippet shows some of the basic extractions that are possible without having to resort to heroics. It is very common to dissect a date in this way, and then use the values for partitioning, reporting, sorting, and financial calculations.
That being said, the generated columns are specifically not allowed for partitioning. You would have to build an update statement that does a lookup against the media calendar, set all of the values explicitly, and use those for the partition. This greatly reduces the value of the media calendar for partitioning relations.
Many more extractions are possible, some of which are shown later in the example.
The syntax of the SQL always follows this same pattern. At the time of this writing, there are no other options than “GENERATED ALWAYS” and “STORED”. Presumably, “VIRTUAL” or some such keyword will be introduced in the future for columns that are calculated at SELECT time. This makes the the moniker “generated” a bit dubious. I think for English grammar reasons I’d rather have named this feature “calculated”.
It is not possible to build generated columns on the basis of other generated columns. So, in our earlier example, it would be nice to build a “year-month” column based on the derived year concatenated with the derived month. Unfortunately, both extractions would have to be performed again in the year_month column.
Generated columns cannot be directly updated, either.
UPDATE media_calendar SET boq = '2019-01-01' WHERE year = 2019 and month BETWEEN 1 AND 3;
psql ERROR: column boq can only be updated to DEFAULT DETAIL: Column "boq" is a generated column.
INDEXES however, are perfectly fine on generated columns. In fact, they are possibly one of the biggest advantages.
CREATE INDEX media_year ON media_calendar (year_int); CREATE INDEX
Here follows some hopefully useful code for your enjoyment. It is released to you under the PostgreSQL license.
-- -- Name: boq(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.boq(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ SELECT (date_part('year', thedate) || CASE date_part('quarter',thedate) WHEN 1 THEN '-01-01' WHEN 2 THEN '-04-01' WHEN 3 THEN '-07-01' WHEN 4 THEN '-10-01' END)::date; $$; COMMENT ON FUNCTION public.boq(date) IS 'Beginning of the quarter using month boundaries.'; -- -- Name: month_english(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.month_english(thedate date) RETURNS text LANGUAGE sql IMMUTABLE AS $$ SELECT CASE to_char(thedate, 'MM') WHEN '01' THEN 'January' WHEN '02' THEN 'February' WHEN '03' THEN 'March' WHEN '04' THEN 'April' WHEN '05' THEN 'May' WHEN '06' THEN 'June' WHEN '07' THEN 'July' WHEN '08' THEN 'August' WHEN '09' THEN 'September' WHEN '10' THEN 'October' WHEN '11' THEN 'November' WHEN '12' THEN 'December' ELSE to_char(thedate, 'MM') END; $$; COMMENT ON FUNCTION public.month_english(date) IS 'Explicit conversion to English, not using the to_char() with locale.'; -- -- Name: nth_name(integer); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.nth_name(ordinal integer) RETURNS text LANGUAGE sql IMMUTABLE AS $$ SELECT ordinal::text || CASE right(ordinal::text,1)::integer WHEN 1 THEN 'st' WHEN 2 THEN 'nd' WHEN 3 THEN 'rd' ELSE 'th' END; $$; COMMENT ON FUNCTION public.nth_name(integer) IS 'Append the appropriate rank name abbreviation to the given ordinal. 1st, 2nd, 3rd...'; -- -- Name: boy_date(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.boy_date(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ SELECT (date_part('year', thedate) || '-01-01')::date; $$; COMMENT ON FUNCTION public.boy_date(date) IS 'Beginning of calendar year date.'; -- -- Name: dow_english(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.dow_english(thedate date) RETURNS text LANGUAGE sql IMMUTABLE AS $$ SELECT CASE date_part('dow',thedate) WHEN 0 THEN 'Sunday' WHEN 1 THEN 'Monday' WHEN 2 THEN 'Tuesday' WHEN 3 THEN 'Wednesday' WHEN 4 THEN 'Thursday' WHEN 5 THEN 'Friday' WHEN 6 THEN 'Saturday' ELSE 'Unknown' END; $$; COMMENT ON FUNCTION public.dow_english(date) IS 'Day of week in the English language. Specifically does not use to_char(), as that is locale specific.'; -- -- Name: dow_locale(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.dow_locale(thedate date) RETURNS text LANGUAGE sql IMMUTABLE AS $$ SELECT to_char(thedate,'Day'); $$; COMMENT ON FUNCTION public.dow_locale(date) IS 'Day of week in the installation specific language, taken from the local encoding.'; -- -- Name: end_of_month(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.end_of_month(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- subtract to get the first day of the month. Add a month, then subtract a day. SELECT ((thedate - (date_part('day', thedate) - 1)::integer)::date + interval '1 month' - interval '1 day')::date; $$; COMMENT ON FUNCTION public.end_of_month(date) IS 'The date of the last day of the month.'; -- -- Name: end_of_quarter(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.end_of_quarter(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- get the beginning of quarter, add 3 months, and subtract a day. SELECT (boq(thedate) + interval '3 months' - interval '1 day')::date; $$; COMMENT ON FUNCTION public.end_of_quarter(date) IS 'The date of the last day of the calendar quarter.'; -- -- Name: end_of_week(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.end_of_week(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- Days are numbered 0-6 Sun-Sat. -- Start with Saturday, subtract today's Day ordinal, -- add that to the original date to get to the end of the week SELECT thedate + (6-date_part('dow', thedate)::integer); $$; COMMENT ON FUNCTION public.end_of_week(date) IS 'The date of the last day of the week.'; -- -- Name: end_of_year(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.end_of_year(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- truncate the month and day, add a year, then subtract a day SELECT (((date_part('year',thedate)+1) || '-01-01')::date -1)::date; $$; COMMENT ON FUNCTION public.end_of_year (date) IS 'The date of the last calendar day of the year.'; -- -- Name: full_date_english(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.full_date_english(thedate date) RETURNS text LANGUAGE sql IMMUTABLE AS $$ -- This should probably be done with a language parsing engine, but this is quick and easy. SELECT dow_english(thedate) || ', the ' || nth_name(date_part('day', thedate)::integer) || ' day of ' || month_english(thedate) || ', ' || date_part('year', thedate); $$; COMMENT ON FUNCTION public.full_date_english (date) IS 'The formally written datum of the given date in English.'; -- -- Name: is_leap_year(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.is_leap_year(thedate date) RETURNS boolean LANGUAGE sql IMMUTABLE AS $$ -- start with Mar 1, subtract a day, and see if we got a leap day. SELECT date_part('day', (date_part('year', thedate) || '-03-01')::date -1) = 29; $$; COMMENT ON FUNCTION public.is_leap_year(date) IS 'Delegate the calculation of leap year to the PostgreSQL engine by moving to March 1 and subtracting a day.'; -- -- Name: julian_day(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.julian_day(thedate date) RETURNS integer LANGUAGE sql IMMUTABLE AS $$ -- Don't have any idea who would ever use this, but it was easy with to_char(). SELECT to_char(thedate, 'J')::integer; $$; COMMENT ON FUNCTION public.julian_day(date) IS 'Ordinal day on the Julian calendar since November 24, 4714 BC at midnight UTC'; -- -- Name: month_local(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.month_local(thedate date) RETURNS text LANGUAGE sql IMMUTABLE AS $$ -- This only exists because to_char() is locale specific, so VOLATILE. Generated columns can't come from VOLATILE functions. SELECT to_char(thedate, 'Month'); $$; COMMENT ON FUNCTION public.month_local(date) IS 'Conversion of the month ordinal to plain language by locale settings'; -- -- Name: next_business_day(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.next_business_day(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- Find a date in the near future that is a weekday. It won't be more than 3 days away... SELECT thedate + x FROM generate_series(1,4) x WHERE CASE WHEN date_part('dow', thedate +x) BETWEEN 1 AND 5 THEN TRUE ELSE FALSE END ORDER BY x LIMIT 1; $$; COMMENT ON FUNCTION public.next_business_day(date) IS 'Next weekday date.'; -- -- Name: previous_business_day(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.previous_business_day(thedate date) RETURNS date LANGUAGE sql IMMUTABLE AS $$ -- Select a date in the recent past that was a weekday SELECT thedate - x FROM generate_series(1,4) x WHERE CASE WHEN date_part('dow', thedate -x) BETWEEN 1 AND 5 THEN TRUE ELSE FALSE END ORDER BY x LIMIT 1; $$; COMMENT ON FUNCTION public.previous_business_day(date) IS 'Previous weekday date.'; -- -- Name: week_of_month(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.week_of_month(thedate date) RETURNS integer LANGUAGE sql IMMUTABLE AS $$ -- Count of weeks starting with the first in week 1. SELECT to_char(thedate, 'W')::integer; $$; COMMENT ON FUNCTION public.week_of_month(date) IS 'Week of month starting with the first day in the first week.'; -- -- Name: week_of_year(date); Type: FUNCTION; Schema: public -- CREATE FUNCTION public.week_of_year(thedate date) RETURNS integer LANGUAGE sql IMMUTABLE AS $$ -- Exists because to_char() is VOLATILE SELECT to_char(thedate, 'WW')::integer; $$; COMMENT ON FUNCTION public.week_of_year(date) IS 'Week of year calculated with the first of the year in the first week.'; CREATE OR REPLACE FUNCTION public.is_weekday(thedate date) RETURNS boolean AS $$ -- Weekdays are numbered 0-6 Sun-Sat. SELECT CASE WHEN date_part('dow'::text, thedate) BETWEEN 1 AND 5 THEN true ELSE false END $$ LANGUAGE SQL IMMUTABLE; -- -- Name: media_calendar; Type: TABLE; Schema: public -- CREATE TABLE public.media_calendar ( gregorian date NOT NULL PRIMARY KEY, month_int integer GENERATED ALWAYS AS (date_part('month'::text, gregorian)) STORED, day_int integer GENERATED ALWAYS AS (date_part('day'::text, gregorian)) STORED, year_int integer GENERATED ALWAYS AS (date_part('year'::text, gregorian)) STORED, quarter_int integer GENERATED ALWAYS AS (date_part('quarter'::text, gregorian)) STORED, dow_int integer GENERATED ALWAYS AS (date_part('dow'::text, gregorian)) STORED, day_local character(20) GENERATED ALWAYS AS (public.dow_locale(gregorian)) STORED, doy_int integer GENERATED ALWAYS AS (date_part('doy'::text, gregorian)) STORED, boq_int date GENERATED ALWAYS AS (public.boq(gregorian)) STORED, doq_int integer GENERATED ALWAYS AS (((gregorian - public.boq(gregorian)) + 1)) STORED, day_english character(20) GENERATED ALWAYS AS (public.dow_english(gregorian)) STORED, month_local character(20) GENERATED ALWAYS AS (public.month_local(gregorian)) STORED, month_english character(20) GENERATED ALWAYS AS (public.month_english(gregorian)) STORED, week_of_month integer GENERATED ALWAYS AS (public.week_of_month(gregorian)) STORED, week_of_year integer GENERATED ALWAYS AS (public.week_of_year(gregorian)) STORED, julian_day integer GENERATED ALWAYS AS (public.julian_day(gregorian)) STORED, is_weekday boolean GENERATED ALWAYS AS (public.is_weekday(gregorian)) STORED, nth_day character(10) GENERATED ALWAYS AS (public.nth_name((date_part('day'::text, gregorian))::integer)) STORED, full_english text GENERATED ALWAYS AS (public.full_date_english(gregorian)) STORED, first_day_of_week date GENERATED ALWAYS AS ((gregorian - (date_part('dow'::text, gregorian))::integer)) STORED, first_day_of_month date GENERATED ALWAYS AS ((gregorian - ((date_part('day'::text, gregorian))::integer - 1))) STORED, first_day_of_year date GENERATED ALWAYS AS (public.boy_date(gregorian)) STORED, previous_business_day date GENERATED ALWAYS AS (public.previous_business_day(gregorian)) STORED, next_business_day date GENERATED ALWAYS AS (public.next_business_day(gregorian)) STORED, is_leap_year boolean GENERATED ALWAYS AS (public.is_leap_year(gregorian)) STORED, end_of_month date GENERATED ALWAYS AS (public.end_of_month(gregorian)) STORED, end_of_week date GENERATED ALWAYS AS (public.end_of_week(gregorian)) STORED, end_of_year date GENERATED ALWAYS AS (public.end_of_year(gregorian)) STORED, end_of_quarter date GENERATED ALWAYS AS (public.end_of_quarter(gregorian)) STORED, dow_in_month integer GENERATED ALWAYS AS (((gregorian - (gregorian - (date_part('day'::text, gregorian)::integer )) + 6) / 7)::integer) STORED, dow_in_year integer GENERATED ALWAYS AS (((gregorian - (boy_date(gregorian) -1) + 6) / 7)::integer) STORED, dow_in_quarter integer GENERATED ALWAYS AS (((((gregorian - public.boq(gregorian)) + 1) + 6) / 7)) STORED ); COMMENT ON COLUMN public.media_calendar.gregorian IS 'The only physical column in the table, and the primary key. All others columns are generated.'; COMMENT ON COLUMN public.media_calendar.month_int IS 'The month of the year counting from January as 1.'; COMMENT ON COLUMN public.media_calendar.day_int IS 'Ordinal day of month.'; COMMENT ON COLUMN public.media_calendar.year_int IS 'Year as an integer.'; COMMENT ON COLUMN public.media_calendar.quarter_int IS 'Quarter of year by calendar months.'; COMMENT ON COLUMN public.media_calendar.dow_int IS 'Day of week as an integer from 0-6 Sunday-Saturday.'; COMMENT ON COLUMN public.media_calendar.day_local IS 'Day of week writeen out by locale.'; COMMENT ON COLUMN public.media_calendar.doy_int IS 'Day of year counting from Jan 1.'; COMMENT ON COLUMN public.media_calendar.boq_int IS 'Date of beginning of calendar quarter.'; COMMENT ON COLUMN public.media_calendar.doq_int IS 'Count of days since beginning of calendar quarter.'; COMMENT ON COLUMN public.media_calendar.day_english IS 'Day of the week written out in English. Does not use to_char().'; COMMENT ON COLUMN public.media_calendar.month_local IS 'Month of year written out using the locale settings.'; COMMENT ON COLUMN public.media_calendar.month_english IS 'Month of year written out explicitly in the English language.'; COMMENT ON COLUMN public.media_calendar.week_of_month IS 'Week of the month starting with the first in the first calendar week.'; COMMENT ON COLUMN public.media_calendar.week_of_year IS 'Week of year starting with the first in the first calendar week.'; COMMENT ON COLUMN public.media_calendar.julian_day IS 'Ordinal day on the Julian calendar since November 24, 4714 BC at midnight UTC'; COMMENT ON COLUMN public.media_calendar.is_weekday IS 'Is Monday-Friday.'; COMMENT ON COLUMN public.media_calendar.nth_day IS 'Day of month written using rank abbreviations. 1st, 2nd, 3rd...31st.'; COMMENT ON COLUMN public.media_calendar.full_english IS 'The formally written datum of the given date in English.'; COMMENT ON COLUMN public.media_calendar.first_day_of_week IS 'Date of the preceeding Sunday.'; COMMENT ON COLUMN public.media_calendar.first_day_of_month IS 'Date of the first of the month.'; COMMENT ON COLUMN public.media_calendar.first_day_of_year IS 'Date of the new year.'; COMMENT ON COLUMN public.media_calendar.previous_business_day IS 'Date of the previous weekday (Mon-Fri).'; COMMENT ON COLUMN public.media_calendar.next_business_day IS 'Date of the next weekday (Mon-Fri).'; COMMENT ON COLUMN public.media_calendar.is_leap_year IS 'Delegates the existence of leap year to the PostgreSQL engine.'; COMMENT ON COLUMN public.media_calendar.end_of_month IS 'Date of the last day of the month.'; COMMENT ON COLUMN public.media_calendar.end_of_week IS 'Date of the next Saturday.'; COMMENT ON COLUMN public.media_calendar.end_of_quarter IS 'Date of the last day in the quarter'; COMMENT ON COLUMN public.media_calendar.dow_in_month IS 'The ranked occurrence of this day in the month (2nd Saturday).'; COMMENT ON COLUMN public.media_calendar.dow_in_year IS 'How many times this day (Sun-Sat) has occurred so far this calendar year.'; COMMENT ON COLUMN public.media_calendar.dow_in_quarter IS 'How many times this day (Sun-Sat) has occurred in this quarter.'; COMMENT ON TABLE public.media_calendar IS 'Gregorian calendar with date parts and intervals broken out for convenience.'; -- This table fits completely in the 8k page. No need for TOAST. ALTER TABLE public.media_calendar ALTER COLUMN day_local SET STORAGE PLAIN; ALTER TABLE public.media_calendar ALTER COLUMN day_english SET STORAGE PLAIN; ALTER TABLE public.media_calendar ALTER COLUMN month_local SET STORAGE PLAIN; ALTER TABLE public.media_calendar ALTER COLUMN month_english SET STORAGE PLAIN; ALTER TABLE public.media_calendar ALTER COLUMN nth_day SET STORAGE PLAIN; ALTER TABLE public.media_calendar ALTER COLUMN full_english SET STORAGE PLAIN; -- -- Data for Name: media_calendar; Type: TABLE DATA; Schema: public -- INSERT INTO public.media_calendar (gregorian) SELECT '1900-01-01'::date + x -- Starting with 1900-01-01, fill the table with 200 years of data. FROM generate_series(0,365*200) x; COMMENT ON COLUMN public.media_calendar.end_of_year IS $$Date of new year's eve of this year. $$;
What is the advantage of generated columns vs populating columns by triggers? The only advantage I see is that you can’t populate them with something different. Is this correct?
I think one of the differences is that generated fields could be potentially used by the planner, even though you do not use them explicitely in your query. But I am not sure that PostgreSQL planner actually does consider them. I habe seen this in DB2.
Hey, I like that! Used to be a nice FileMaker feature.
You need 50 more days for the leap years, I guess.
Thanks for sharing.