As part of AXLE project work I recently found myself re-examining issues around optimisation fencing and non-semantic query execution changes in PostgreSQL. One key issue was the inability to use a CTE where optimisation (inlining, push-up/pull-down, flattening, etc) was desired. I’d like to explain that in more detail, as it’s something that can surprise new and experienced PostgreSQL users alike.
When optimising queries in PostgreSQL (true at least in 9.4 and older), it’s worth keeping in mind that – unlike newer versions of various other databases – PostgreSQL will always materialise a CTE term in a query.
- A query that should touch a small amount of data instead reads a whole table and possibly spills it to a tempfile; and
- You cannot UPDATE or DELETE FROM a CTE term, because it’s more like a read-only temp table rather than a dynamic view.
It means that these two queries are surprisingly different:
WITH aggs(k,n) AS (SELECT x, count(x) FROM big_table GROUP BY x) SELECT k, n FROM aggs WHERE k = 42;
SELECT k, n FROM (SELECT x, count(x) FROM big_table GROUP BY x) aggs(k,n) WHERE k = 42;
In other DBs they’d generally be different ways of writing the same thing, but in PostgreSQL the former will read and aggregate the whole table, then throw most of the work away, wheras the latter will generally get turned into:
SELECT k, n FROM (SELECT x, count(x) FROM big_table GROUP BY x WHERE x = 42) aggs(k,n)
by the query optimiser and only read a small subset of the underlying table.
As there is significant opposition to changing this behaviour, you may find that you want to stick to using subqueries in FROM even when a CTE would be a lot clearer to read.
Essentially, PostgreSQL currently uses CTEs as query hints in order to avoid having actual query hints like Oracle’s /*+ materialize */. So the semantics of the query are conflated with execution policy about the query.
Because people have been advised to use CTEs as optimisation fences to work around planner issues in the past:
Another possible application is to prevent unwanted multiple evaluations of functions with side-effects. However, the other side of this coin is that the optimizer is less able to push restrictions from the parent query down into a WITH query than an ordinary sub-query. The WITH query will generally be evaluated as written, without suppression of rows that the parent query might discard afterwards.
… it’s going to be more difficult to get this changed in future, though it doesn’t look like the SQL spec requires CTEs to behave this way.
Note that PostgreSQL also uses another
hint technique for blocking qualifier pull-up or push-down, the “offset 0” hack, e.g.:
SELECT k, n FROM (SELECT x, count(x) FROM big_table GROUP BY x OFFSET 0) aggs(k,n) WHERE k = 42;
… which also acts as an optimisation fence, but is an obviously quirky anomaly that doesn’t serve any other purpose, unlike WITH, so it stands out as a
hint way to modify execution without changing query semantics.
There are a few more like it, too.
The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007-2013) under grant agreement n° 318633