PostgreSQL 12: A Few Special-Case Performance Enhancements
With every new release of PostgreSQL comes a range of performance enhancements. Some are system-wide and affect every user, but most are highly specific to a certain use case. In this post, I am going to briefly highlight three improvements in PG12 that speed up certain operations.
1. Minimal decompression of TOAST values
TOAST values are values that take up too much space to be stored conventionally within a record, so are stored out-of-line in a separate table-like structure on disk. By default, these large values are also compressed to save space.
There are cases where you don’t need the whole value, but only the bytes at the very front, such as:
- The text functions substr() and starts_with()
- The bounding box of a geometry type, which is used as an optimization for PostGIS point-in-polygon spatial joins
In PG11 and earlier, using any portion of such a value within a query always decompresses the entire value first, which is a waste of CPU cycles. In PG12, we only decompress what we need, greatly speeding up access to the beginning of a TOASTed value.
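For example, a query that only inspects the leading bytes of a large compressed column can now stop decompressing early. Here is a minimal sketch of that access pattern; the documents table and its contents are hypothetical:

-- Hypothetical table whose body column is large enough to be TOASTed and compressed
CREATE TABLE documents (id bigint PRIMARY KEY, body text);

-- Both queries only need the front of the value, so PG12 can stop decompressing early
SELECT substr(body, 1, 200) FROM documents WHERE id = 1;
SELECT count(*) FROM documents WHERE starts_with(body, 'BEGIN:VCALENDAR');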
2. Faster float conversion to text
Previously, floating-point values were output by rounding to a specific number of digits, controlled by the configuration parameter extra_float_digits. This method is slow for output of large tables with many columns of real or double precision type.
In PG12, the output is the shortest decimal representation that would be read back in as exactly the original binary value. For example:
SET extra_float_digits = 3;
SELECT 0.3::double precision AS result;
In binary, 0.3 cannot be represented precisely. Previously, the output reflected this imprecise binary value as closely as possible:
result
----------------------
0.299999999999999989
(1 row)
In PG12, this outputs
result
--------
0.3
(1 row)
This representation is several times faster to generate. In this case, it also happens to match the original input, which is convenient.
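One way to check the round-trip property is to compare the binary form of a value with the binary form obtained after converting it to text and back, using the built-in float8send function. The value 0.1 + 0.2 here is just an illustration; in PG12 the comparison holds for any finite value, while in PG11 with the default extra_float_digits it can fail:

-- true in PG12: the shortest-representation text output reads back as the identical bits
SELECT float8send((0.1 + 0.2)::double precision) =
       float8send(((0.1 + 0.2)::double precision)::text::double precision) AS round_trips;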
3. Parallel query with SERIALIZABLE
The SERIALIZABLE isolation level is the strictest isolation level possible in PostgreSQL. We’ve had parallel query since PG9.6, but it was disallowed when SERIALIZABLE was specified. This is no longer the case, and now all isolation levels can take advantage of parallelism during query execution.
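A quick way to observe the difference is to inspect the plan for a query that would normally be parallelized, inside a SERIALIZABLE transaction. The table name below is hypothetical, and whether a parallel plan is chosen still depends on table size and the usual parallel-query settings:

BEGIN ISOLATION LEVEL SERIALIZABLE;
-- In PG12 this can show a Gather / Parallel Seq Scan node;
-- in PG11 and earlier, the SERIALIZABLE transaction forced a serial plan.
EXPLAIN (COSTS OFF) SELECT count(*) FROM big_table;
COMMIT;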
For more information on these, and many other improvements, see the PG12 release notes.
Hi John, thanks for highlighting these new features.
Comparison of TOAST and LO:
1. TOAST: the entire TOAST value is compressed and then cut into pieces of 2040 bytes, which makes it impossible to retrieve only a portion of it; the entire TOAST value has to be read and decompressed.
2. Large object (LO): LOs are cut into pieces of 2048 bytes, which are compressed individually. This makes it possible to determine which pieces contain a given portion of the LO (not only at the beginning, but at any position).
First, is that right?
Second, what are the changes (at a high level) that allow retrieving a portion of the TOAST value?
Do the changes only allow retrieving the beginning of the TOAST value, or is it possible to retrieve a random portion of it?
We have a specific use case where JPEG 2000 images are stored in TOAST.
– Images are sometimes displayed as thumbnails in an N x M mosaic, and less than 20% of each image needs to be retrieved for the thumbnail resolution.
– At other times, two images are displayed side by side for comparison, and about 50% is needed.
Would TOAST be suitable in such a use case?
Currently our images are stored as LOs, but LO has several design issues that we would be happy to get rid of.
The optimisation with TOAST is to allow decompression to stop at a certain point. It’s a prefix-only optimisation: it cannot avoid reading and decompressing everything before the wanted data in a given TOASTed value, it just skips decompressing the data after it. I think it only applies in places where the caller can determine in advance that it won’t need the full value, or can progressively fetch more. I wasn’t closely involved in that work, so I suggest you review the pgsql-hackers thread and the commit(s) if you want the details of how it works.
Large objects, by contrast, are individually compressed in chunks, which means that (mostly) random access is possible.
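To give a concrete feel for the difference, something like this is the access pattern each approach supports (the OID, table and column names are made up):

-- Large object: random access, e.g. read 4 kB starting at byte offset 1,000,000
SELECT lo_get(12345::oid, 1000000, 4096);

-- TOASTed column: prefix access, which is what the PG12 optimisation targets
SELECT substr(image_data, 1, 200000) FROM images WHERE id = 1;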
From your description it sounds like you might well be able to benefit from the TOAST prefix-fetching optimisation in your workload. I doubt it’ll make much practical difference for the 50% fetch, but for the 20% case it could be well worth it. Try it out in a test environment and see.