WITH a little help from my friends
* and if you don’t like the weather,
go out and make some of your own!
(apologies to Scoop Nisker)
Get a little help for your queries WITH data sources that can be predefined and precalculated as well as clarify where your data is coming from and how.
PostgreSQL 12 has added a feature to the WITH statement functionality. It will inline WITH definitions
into the main query to speed up the query under certain situations. But first let us take a brief look at the WITH statement as it is.
The WITH enables you to create one or more subselects, or auxiliary statements, as Common Table Expressions (CTEs). You can view these as additional data sources (actually temporary tables) for use in your query or each other CTEs.
WITH [ RECURSIVE | MATERIALIZED ] AS ( SQL statement ) [, AS ( SQL statement) ] SQL Statement;
The CTEs may be SELECT, INSERT, UPDATE OR DELETE statements, but be careful. Try not to insert, update or delete the same row(s) in multiple CTEs. Statements are executed in the same snapshot therefore the order of execution of WITH clauses might vary.
The following sample would act as expected with the one UPDATE precalculated before the SELECT query.
with x as ( update foo set color='green' ) select * from foo ; -- the foo returned will be updated to be green
If there were two or more WITH statements coloring foo, it would not be deterministic to know which color got the last touch. All the rows would be the color of the last color seen.
In real tests, it seems that the WITH clauses actually are executed in the order presented. This is contradictory to warnings in The Fine Documentation 7.8.2. Data-Modifying Statements in WITH. Just be sensible and you will be OK.
with x as ( update foo set color='green' ), y as ( update foo set color='yellow' ), z as ( update foo set color='purple' ) select * from foo ; -- foo could be green, yellow or purple but happens to be purple
In this example we will populate a weather table randomly.* It is clearer to set up a few disparate data sources in advance and let them cross join for the small number of rows. Lateral might also work there.
We are hiding the randomization of the temperatures into a trigger function. We want to ensure that the random() function was run once per row.*
Together we are using these CTEs so we can create rows to INSERT into the weather table to use in further examples later.
- a regular WITH clause to generate the
crosstab of city, region, area triplet. - a recursive WITH to generate 8 years of dates* for weather.
Notice the end case on this recursive WITH. Use LIMIT, if you must, to ensure you have an end case for the recursion.
Given areas[area, aname,…]
Given regions[region, rname,…]
Given cities[city, region, area,…]
Given weather[…, city, region, area, …, high, low, wdt,…]
with recursive drange(d) as ( select current_date - '4 years'::interval union all select d + '1 day'::interval from drange where d + '1 day'::interval <= current_date + '4 years'::interval ), cra as ( select city, region, area from cities join regions using (region) join areas using (area) ) insert into weather (city, region, area, wdt) select city, region, area, d from cra, drange; -- crosstabs intentional
In PostgreSQL 12 by default, if a CTE is
- non-recursive and
- referenced only once and only once and
- has no side effects and
- is a SELECT query
then it is included inline into the main query. Inlining essentially undoes at runtime what you have done using a WITH to clarify a simple subquery. However, sometimes you do want to force a precalculation on a CTE, ahead of the main query.
Side-effects do matter. A side effect is a change of data outside of the SQL statement in question. For example it is a side effect if the statement causes a write to a log line to a table or file.
Also consider whether a complex and slow calculation is involved or the result of the CTE reduces the number of rows returned by qualifications or aggregation. Or the case when the statement contains side-effects of volatile functions. These CTEs should be materialized so that you don’t double the work.
It is important to note that a materialized CTE has no indexes. If you rely on indexes, and you should rely on them for large tables, you can use “WITH cte AS MATERIALIZED (…” The MATERIALIZED label prevents the inlining and preserves the indexes of the original table. But remember, no side effects should happen.
Another reason to use the materialized CTEs is if the query for the CTE has one or more expensive functions. This case is where a “run once and use everywhere” action would be helpful. You don’t want to calculate something expensive more than once if you can help it.
In this example, we use a CTE to generate the primary data and then use twice–once to insert it, via another
CTE to an aggregate table, and then we use the same data set to produce a report.
We generate an accurate human-readable report that shows the data inserted into the aggregator table. Therefore we use the same query data to produce the report that we wrote.
create table if not exists avg_2w_weather( sdate date, edate date, region integer, area integer, city text, avg_high integer, avg_low integer); with cra as ( select city, region, rname, area, aname from cities join regions using (region) join areas using (area) ), w AS MATERIALIZED ( select (current_date - interval '7 days')::date as sdate, (current_date + interval '7 days')::date as edate, city, rname, region, aname, area, avg(high)::integer as avg_high, avg(low)::integer as avg_low from weather join cra using(city, region, area) where wdt between current_date - interval '7 days' and current_date + interval '7 days' group by rname, region, aname, area, city, sdate, edate ), store_avgs as ( INSERT into avg_2w_weather (sdate, edate, city, region, area, avg_high, avg_low) select sdate, edate, city, region, area, avg_high, avg_low from w ) select sdate, edate, rname||aname as raname, city, avg_high, avg_low from w order by raname, city ;
Using the WITH statement lets you separate out your data sources for clarity and readability while not losing the optimization of inlining subqueries when appropriate.
Get by with a little help from WITH.
Leave a Reply
Want to join the discussion?Feel free to contribute!