Earlier I had written a blog about partition-wise join in PostgreSQL. In that blog, I had talked about an advanced partition matching technique which will allow partition-wise join to be used in more cases. In this blog we will discuss this technique in detail.
To recap, the basic partition matching technique allows a join between two partitioned tables to be performed using partition-wise join technique if the two partitioned tables have exactly matching partition bounds e.g. partitioned tables prt1 and prt2 described below
psql> \d+ prt1 ... [output clipped] Partition key: RANGE (a) Partitions: prt1_p1 FOR VALUES FROM (0) TO (5000), prt1_p2 FOR VALUES FROM (5000) TO (15000), prt1_p3 FOR VALUES FROM (15000) TO (30000)
psql>\d+ prt2 ... [ output clipped ] Partition key: RANGE (b) Partitions: prt2_p1 FOR VALUES FROM (0) TO (5000), prt2_p2 FOR VALUES FROM (5000) TO (15000), prt2_p3 FOR VALUES FROM (15000) TO (30000)
A join between prt1 and prt2 on their partition key (a) is broken down into joins between their matching partitions i.e. prt1_p1 joins prt2_p1, prt1_p2 joins prt2_p2 and prt1_p3 joins prt2_p3. The results of these three joins together form the result of the join between prt1 and prt2. This has many advantages as discussed in my previous blog. However, basic partition matching can not join two partitioned tables with different partition bounds. In above example, if prt1 has an extra partition prt1_p4 FOR VALUES FROM (30000) TO (50000), basic partition matching would not help to convert a join between prt1 and prt2 into a partition-wise join since they do not have exactly matching partition bounds.
Many applications use partitions to segregate actively used data and stale data, a technique I discussed in my another blog. The stale data is eventually removed by dropping partitions. New partitions are created to accommodate fresh data. A join between two such partitioned tables will mostly use partition-wise join since most of the time they will have matching partitions. But when an active partition gets added to one of these tables or a stale one gets deleted, their partition bounds will not match till the other table also undergoes a similar operation. During that interval a join between these two tables won’t use partition-wise join and may take unusually longer time to execute. We don’t want a join hitting the database during this small duration to perform bad since it can not use partition-wise join. Advanced partition matching algorithm helps in this and more complicated cases where partition bounds do not match exactly.
Advanced partition matching algorithm
Advanced partition matching technique finds matching partitions from two partitioned tables even when their partition bounds do not match exactly. It finds matching partitions by comparing the bounds from both the tables in their sorted order similar to the merge join algorithm. Any two partitions, one from each of the partitioned table, whose bounds match exactly or overlap are considered to be joining partners since they may contain joining rows. Continuing with the above example, let’s say an active new partition prt2_p4 gets added to prt4. The partitioned tables now look like:
psql>\d+ prt1 ... [output clipped] Partition key: RANGE (a) Partitions: prt1_p1 FOR VALUES FROM (0) TO (5000), prt1_p2 FOR VALUES FROM (5000) TO (15000), prt1_p3 FOR VALUES FROM (15000) TO (30000)
psql>\d+ prt2 ... [ output clipped ] Partition key: RANGE (b) Partitions: prt2_p1 FOR VALUES FROM (0) TO (5000), prt2_p2 FOR VALUES FROM (5000) TO (15000), prt2_p3 FOR VALUES FROM (15000) TO (30000), prt2_p4 FOR VALUES FROM (30000) TO (50000)
It’s easy to see that partition bounds of prt1_p1 and prt2_p1, prt1_p2 and prt2_p2, and prt1_p3 and prt2_p3 respectively match. But unlike basic partition matching, advanced partition matching will know that prt2_p4 does not have any matching partition in prt1. If the join between prt1 and prt2 is an INNER join or when prt2 is INNER relation in the join, the join result will not have any row from prt2_p4. Enabled with detailed information about the matching partitions and partitions which do not match, as against merely whether partition bounds match or not, query optimizer can decide whether to use partition-wise join or not. In this case, it will choose to execute the join as join between the matching partitions leaving prt2_p4 aside. But that’s not much like an "advanced" partition matching. Let’s see a bit more complicated case using list partitioned tables this time:
psql>\d+ plt1 Partition key: LIST (c) Partitions: plt1_p1 FOR VALUES IN ('0001', '0003'), plt1_p2 FOR VALUES IN ('0004', '0006'), plt1_p3 FOR VALUES IN ('0008', '0009')
psql>\d+ plt2 Partition key: LIST (c) Partitions: plt2_p1 FOR VALUES IN ('0002', '0003'), plt2_p2 FOR VALUES IN ('0004', '0006'), plt2_p3 FOR VALUES IN ('0007', '0009')
Observe that there are exactly three partitions in both the relations but partition value lists differ. The list corresponding to partition plt1_p2 match exactly that of plt2_p2. Other than that no two partitions, one from either side, have exactly matching lists. Advanced partition matching algorithm deduces that plt1_p1 and plt2_p1 have overlapping lists and their lists do not overlap with any other partition from the other relation. Similarly for plt1_p3 and plt2_p3. Query optimizer then sees that the join between plt1 and plt2 can be executed as partition-wise join by joining the matching partitions i.e. plt1_p1 and plt2_p1, plt1_p2, and plt2_p2, and plt1_p3 and plt2_p3 respectively. The algorithm can find matching partitions in even more complex partition bound sets of list as well as range partitioned tables. But we will not cover them for the sake of brevity. Interested and more daring readers may take a look at the commit. It also has many testcases, which show various scenarios where advanced partition matching algorithm is used.
Outer joins with matching partitions missing on the inner side
Outer joins pose a particular problem in PostgreSQL world. Consider prt2 LEFT JOIN prt1, in the above example, where prt2 is an OUTER relation. prt2_p4 does not have a joining partner in prt1 and yet the rows in that partition should be part of the join result since they belong to the outer relation. In PostgreSQL when the INNER side of a join is empty, it’s represented by a "dummy" relation which emits no rows but still knows the schema of that relation. Usually a "dummy" relation emerges from a non-dummy relation which is not going to emit any rows because of some query optimization like constraint exclusion. PostgreSQL’s query optimizer marks such a non-dummy relation as dummy and the executor proceeds normally when executing such a join. But when there is no matching inner partition for an outer partition, there is no "existing entity" which can be marked as "dummy". For example, in this case there is no prt1_p4 which can represent dummy inner partition joining outer prt2_p4. Right now, PostgreSQL does not have a way to "create" such "dummy" relations during planning. Hence the query optimizer does not use partition-wise join in this case.
Ideally such a join with empty inner only requires schema of the inner relation and not an entire relation. This schema can be derived from the partitioned table itself. All it needs is an ability to produce the join row by using the columns from a row in the outer side joined by NULL values for the columns from inner side. Once we have that ability in PostgreSQL, query optimizer will be able to use partition-wise join even in these cases.
Let me emphasize that the outer joins where there are no missing partitions on the inner join do use partition-wise join.
Multiple matching partitions
When the tables are partitioned such that multiple partitions from one side match one or more partitions on the other side, partition-wise join can not be used since there is no way to induce an "Append" relation during planning time which represents two or more partitions together. Hopefully we will remove that limitation as well sometime and allow partition-wise join to be used in those cases as well.
Hash partitioned tables
Partition bounds of two hash partitioned tables using same modulo always match. When the modulo is different, a row from a given partition of one table may have its joining partners in many of partitions of the other, thus a given partition from one side matches multiple partitions of the other table thus rendering partition-wise join ineffective.
When the advanced partition matching algorithm fails to find matching partitions or partition-wise join can not be used because of the above limitations, PostgreSQL falls back to join the partitioned tables as regular tables.
Advanced partition matching time
Simon brought up an interesting point when commenting about the feature. Partitions of a partitioned table do not change often so the result of advanced partition matching should remain same for longer duration. Computing it every time a query involving these tables is executed is unnecessary. Instead we could save the set of matching partitions in some catalog and refresh it every time the partitions change. That is some work but it’s worth the time spent in matching the partition for every query.
Even with all these limitations, what we have today is a very useful solution which serves most of the practical cases. Needless to say that this feature works seamlessly with FDW join push down improving the sharding capabilities that PostgreSQL already has!