Partition Elimination in PostgreSQL 11
The feature freeze for the PostgreSQL 11 release is now upon us. During the last few days my colleague Álvaro Herrera pushed two changes into the development branch of PostgreSQL:
1. Faster Partition Pruning
2. Partition Pruning at Execution Time
These patches aim to improve the performance and usability of the declarative table partitioning feature (added in PostgreSQL 10). Amit Langote wrote the first of these two patches, with some assistance from me. I’m the author of the second patch. This one is based on an original patch by Beena Emerson.
Background
Internally in PostgreSQL, a partitioned table is made up from a series of individual tables. These tables are all grouped under one common parent partitioned table. Queries being run against the partitioned table need the results of each individual table to be “concatenated” before the final result is produced. Many queries, especially OLTP type queries, will only require data from a small number of partitions (perhaps just 1!). In order to save PostgreSQL from having to needlessly trawl through all partitions for data that might not even be there, PostgreSQL tries to eliminate partitions that won’t contain any needed records.
In PostgreSQL 10 this elimination took place via the “constraint_exclusion” mechanism, which was a linear algorithm that required looking at each partition’s metadata one-by-one to check if the partition matched the queries WHERE clause.
Faster Partition Pruning
In PostgreSQL 11 this elimination of unneeded partitions (aka partition pruning) is no longer an exhaustive linear search. A binary search quickly identifies matching LIST and RANGE partitions. A hashing function finds the matching partitions for HASH partitioned tables, which are new in PG11.
This also makes improvements so that it’s able to prune partitions in a few more cases than was previously possible.
Partition Pruning at Execution Time
Required partitions are normally identified only by the query planner, however, it’s only able to perform this identification process using values known at plan time. This is obviously no good for queries with parameters such as PREPAREd queries.
Also, in cases such as:
SELECT ... FROM parttab
WHERE partkey = (SELECT ... FROM othertable WHERE ...);
The query planner can’t prune any partitions since the value of the subquery is only known at execution. The situation is improved with this patch, as this allows the executor to perform the partition pruning itself.
The patch performs partition pruning during execution in 2 phases. Phase-one pruning is performed during executor initialization. Here we prune partitions based on query parameters, such as the ones found in PREPAREd statements.
This phase-one pruning gives a very nice boost to OLTP type queries with many partitions. Here we only initialize partitions which actually match the queries parameters. Partitions pruned during this phase won’t be shown in the EXPLAIN output, so you might mistakenly think they’ve been removed during query planning, but you can find out if any are being removed here by looking for the number of “Subnodes Removed” under the “Append” node in EXPLAIN.
During the 2nd phase of pruning, we remove partitions using parameters that are only known when the executor is actually running. These are parameters such as ones from subqueries and parameters in parametrized nested loops.
Benchmark
A picture speaks a thousand words, so here’s a chart showing the performance comparison I did with a non-partitioned table in PG11 vs a partitioned table in PG10 vs a partitioned table in PG11.
I performed the benchmark using the standard pgbench tool that comes with PostgreSQL. I took the standard pgbench_accounts table and partitioned it into 10 equally sized RANGE partitions and at scale 100 (only 10 million records in total) and performed a benchmark using both PREPAREd and non-PREPAREd transactions. The benchmark used a single thread, so the TPS values shown are for comparison only.
Comparing red and green in the first set of columns, we see a nice increase in the query planner’s performance. There’s still quite a bit of other overhead for partitioned tables.
With the “PREPAREd” results, we don’t have any planning time overhead, so this is purely executor time. We can see that we’re coming fairly close to the performance of a non-partitioned table in this case.
Of course, this benchmark was done with just 10 million rows. Scaling such a test into billions of rows is likely to see partitioning shine much more, especially so when the indexes of the non-partitioned tables no longer fit inside RAM.
What’s next? PG12…
There’s still more to do here in the future. At the moment execution-time pruning only performs pruning of Append nodes. We were just a bit late to get the code in for MergeAppend or for ModifyTable nodes (UPDATE/DELETE)
Also the faster partition pruning patch currently only works for SELECT queries. Unfortunately, the code still uses the old linear algorithm for UPDATE/DELETE queries.
Never-the-less what we have for PG11 is a significant improvement over PG10!
Of course, there’s still not a 100% guarantee that these features will be in PG11, so I urge all of you to get testing them as soon as possible so we can resolve any bugs quickly.
In the meanwhile, if your organization needs these features today, you can get them in 2ndQPostgres.
Thanks
I want to thank Amit Langote for working on the faster partition pruning patch and for putting up with my pedantic reviews for about 5 months! The execution-time pruning patch depends on the infrastructure added in this patch, so without that execution-time pruning would not exist. I also want to thank everyone who reviewed the execution-time pruning patch and Álvaro Herrera for his final review and commit.
These two new features are just one of many partitioning improvements coming in PostgreSQL 11. Also, have a look at Partitioned Indexes.
I’m just now reading about declarative partitioning in PG, and it seems exciting, but are there any benchmarks that demonstrate performance improvements? Your graph is the only one I’ve seen and it seems to demonstrate that non-partitioned tables are significantly faster than partitioned ones, even in HEAD. Maybe I’ve misunderstood your explanation.
It’s really impossible to provide a generic set of benchmarks to prove what’s better either way. Imagine you have a sales table and you partition it by month at after month end you do:
select sum(price) from sales where;
assuming your company is quite large the execution time of that query might be long. If PostgreSQL can execute it as a seq scan of the entire month’s table then that’ll likely be much faster than an index scan on some non-partitioned table which stores 5 years of data.
In this case, planning time is probably not of a concern.
My benchmarks above are of a single primary key lookup, which are very fast plans to execute. So overhead from the planner is of the biggest concern in these cases. The benchmark shows how things have improved from PG10 from that point of view. The righthand bars show the execution time has improved significantly for these types of queries.
Biggest beneficiaries of partition pruning would be OLAP workloads for which planning time is not important. So I think that TPS workload doesn’t show how great this new feature really is.
Will the pruning work when volatile functions are used. E.g. event_date > now() – interval ‘1 day’ ?
Your example will work with the new code if you do: event_date > (select now() – interval ‘1 day’);
This changes things so that event_date is being compared to an execution parameter rather than a stable function. The same will work with volatile functions too, although you need to decide yourself if changing the multiple evaluations into a single evaluation is what you want and need.
I’ll take that! However keep in mind that many times OLAP SQL is generated by some reporting tool, and it’s unlikely that the tool will rewrite SQL to optimize for Postgres partition pruning.
Good news: In the past few hours we’ve made a change to PG11 to allow run-time pruning to work with any stable expression, providing there are no columns from the partitioned table in the expression. As of PG11 beta2 there will be no need to wrap the stable function call in a subselect like I demonstrated above. Also, providing the expression does not reference any columns from upper-level queries, the pruning will take place during executor startup. So you shouldn’t see any unrelated partitions in EXPLAIN. Assuming a correctly partitioned table, this means where clauses like `WHERE date_column = CURRENT_DATE`, and `WHERE timestamp >= NOW() – INTERVAL ‘7 days’ AND timestamp <= NOW()` will be supported.
Good news.
Is there any news related with that post https://blog.2ndquadrant.com/enhancements-to-partitioning-and-indexes-in-postgresql-11/ ??
Yes. https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3de241dba86f3dd000434f70aebba725fb928032
This is extremely good news. I do have one question though. Does this affect lock explosion with lots of partitions? I.e. if I partition two levels deep does it still have to do an access share lock on the entire tree?
Hi Christopher, unfortunately, nothing improves that in PG11. I’ll be working on that for PG12 though.
Great work! This is exactly what we will be needing. However, will you be publishing an updated version of the MergeAppend patch to work with the to be released PG11 (when compiling from source)? It is nice that pruning works with the Append node, but in many cases it is entirely up to the planner whether it prefers an Index Scan->Merge Append or an Index Scan->Append->Sort. If only one of them uses execution time pruning, it will lead to a large difference in performance between these two cases and it will not be trivial to force the planner to always choose the Append node.
What was the reason the MergeAppend didn’t get to the feature list? From what I read on the mailing list, everything seemed to be ready, except for the final review.
Hi Floris, Thanks for the feedback. It’s a shame the MergeAppend patch didn’t make PG11. I think this was mostly due to committer attention being elsewhere until the last few days of the cycle. Alvaro’s effort was quite heroic to get in what he did. I’ll be submitting the patch again for PG12 and it’ll be part of 2ndQPostgres for v11.
Cool, that’s good to hear! Yes, I really appreciate the effort that was put in the patch. 🙂 It is going to be very useful.
> In the past few hours we’ve made a change to PG11 to allow run-time pruning to work with any stable expression
this is super useful now. Thank you very much for hard work and persistence in getting this in PG11.
The improved partition pruning opens Postgres to new levels of scalability when working with large tables, such as events from Segment IO or large historical datasets using BI tools.
Thank you, David!
Update / Delete also includes INSERT?
INSERT with VALUES does not have any subnodes to eliminate with run-time pruning. For INSERT with a SELECT the run-time pruning would happen in the SELECT, so basically the answer is No.