PostgreSQL 9.4 extends the SQL standard by inserting two new clauses that facilitate many operations required during the development of applications: the WITHIN GROUP and FILTER clauses.
WITHIN GROUP clause is particularly useful when performing aggregations on ordered subsets of data.
PostgreSQL introduced window functions since version 9.0 in order to work on subsets of data that can be correlated to each current record of tables, defining a sort of “aggregates” centred on any specific record as the query is gradually executed via the SQL
OVER(PARTITION BY/ORDER BY) clause and by using the functions that can be performed on those aggregations.
With version 9.4 of PostgreSQL the SQL
WITHIN GROUP clause was introduced: this simplified many operations that had previously only been possible with the use of the window functions, defining aggregations of ordered subsets of data.
In addition, new functions were introduced that can be applied to these subsets and expand the collection of available window
percentile_disc()for the calculation of
mode()a statistical function that calculates the mode on ordered
cume_dist(): window functions already available in PostgreSQL to be executed on the subsets obtained using the
OVER (PARTITION BY/ORDER BY)clause and now able to take as a parameter ordered subsets produced by the
To better clarify the situation, let’s say, for example, that we want to calculate the 25th, the 50th, the 75th and the 100th percentile of the first 20 integers. Until now, this was only possible by dividing the numbers into 4 sets via the
OVER (PARTITION BY/ORDER BY) clause, then order them internally into 4 ordered subsets from which to then take the maximum value – for example, by using a CTE:
$ CREATE TABLE t AS SELECT generate_series(1,20) AS val; $ WITH subset AS ( SELECT val, ntile(4) OVER (ORDER BY val) AS tile FROM t ) SELECT max(val) FROM subset GROUP BY tile ORDER BY tile; max ------ 5 10 15 20 (4 rows)
With PostgreSQL 9.4 everything is reduced to a single SQL command, resulting in significant advantages in terms of readability of the scripts and execution of the commands:
$ CREATE TABLE t AS SELECT generate_series(1,20) AS val; $ SELECT unnest(percentile_disc(array[0.25,0.5,0.75,1]) WITHIN GROUP (ORDER BY val)) FROM t; max ------ 5 10 15 20 (4 rows)
This second clause of SQL commands is useful when you want to apply filters on subsets of data without necessarily
For example, it is now possible to perform a total
count of the records of a table and also a partial code>count of one of its subsets that satisfies a certain condition (expressed by the
WHERE clause) within a single query, without having to use further ones to be performed on aggregations:
$ SELECT count(*) count_all, count(*) FILTER(WHERE bid=1) count_1, count(*) FILTER(WHERE bid=2) count_2 FROM pgbench_history; count_all | count_1 | count_2 ----------+---------+--------- 7914 | 758 | 784 (1 row)
In this particular case, this also simplifies the readability of scripts and improves execution performances.
The extension of the SQL standard through the introduction of these new clauses further facilitates the work of developers, who are increasingly able to delegate manipulation and aggregation of subsets of data to the database.
By using the
WITHIN GROUP clause, the management of subsets of data that can be ordered becomes easier through the introduction of new window functions.
FILTER clause helps to manage subsets of data that meet certain conditions, thereby avoiding aggregations.
Last updated 2015-05-21 10:51:41 CEST