One of the coolest things about Postgres functions is that they can return rows as if they were a table. Not only is it possible, but creating a set-returning function in Postgres is almost frighteningly trivial when compared to other database engines. In fact, some Oracle users are probably rolling their eyes and muttering “Just use a cursor!” already. Just hear us out!
Postgres functions can return cursors too, but using them afterwards isn’t exactly a friendly experience. Cursors are passed by reference, so the function must either accept a parameter to name the cursor, or it generates something wacky like “<unnamed portal 1>”. After that, cursors can only be used with
FETCH instead of
SELECT, greatly limiting their utility.
What about views, then? Obviously they can return rows like a table, so why not just write everything as a query and turn it into a view? It’s true that practically any function can be emulated with a sufficiently advanced query. This is especially true since Postgres 9 added CTEs, allowing it to generate step-by-step transformations. Some use cases aren’t well covered by that approach, however.
Stop, Wait a Minute
Consider the Fibonacci Sequence. Here’s a function that iteratively generates the whole sequence up to the Nth value:
CREATE OR REPLACE FUNCTION fibonacci(nLoops INT) RETURNS SETOF NUMERIC AS $$ DECLARE i INT; x NUMERIC := 0; y NUMERIC := 1; z NUMERIC := 1; BEGIN FOR i IN 1..nLoops LOOP RETURN NEXT x; x = y; y = z; z = x + y; END LOOP; END; $$ LANGUAGE plpgsql; SELECT max(fibonacci) FROM fibonacci(100); max ----------------------- 218922995834555169026
Amusingly enough, this particular function works until the
NUMERIC datatype itself is exhausted, provided sufficient hardware. Our test VM was able to survive until the 400,000th result before subsequent attempts exhausted our meager 8GB of temporary disk space. We’d print it here, but sadly a number with 83,595 digits won’t fit in the article.
Take a Sip, Sign a Check
Now consider if we had implemented this as a CTE instead. That by itself isn’t difficult, we just need to rewrite things a bit:
WITH RECURSIVE fib(i, a, b) as ( SELECT 1, 0::NUMERIC, 1::NUMERIC UNION ALL SELECT i + 1, b, a + b FROM fib WHERE i <= 100 ) SELECT max(a) FROM fib;
But how can this be turned into a general-purpose view? We would have to raise the limit of the recursive portion of the CTE to some comfortable (or uncomfortable) upper bound so the caller could limit the results with either a
This has two problems. The first is that the maximum is arbitrary and may not correspond with desired use patterns. The second is that CTEs produce all results before filters are applied, meaning a lot of wasted effort. CTEs don't really work well in views since the inner components can't be tweaked for specific conditions, yet without them, SQL is stateless. There's also the matter of exception handling. So any time we need a state machine that returns rows, functions are really our only answer.
Don't Brag About it, Come Show Me
The primary limitation to Postgres functions is that the return type must exist before the function is declared. There are currently three ways to do this. Let's go back to our trusty
CREATE TABLE sensor_log ( sensor_log_id SERIAL PRIMARY KEY, location VARCHAR NOT NULL, reading BIGINT NOT NULL, reading_date TIMESTAMP NOT NULL ); INSERT INTO sensor_log (location, reading, reading_date) SELECT s.id % 1000, round(random() * 100), CURRENT_DATE + INTERVAL '1d' - ((s.id * 10)::TEXT || 's')::INTERVAL FROM generate_series(1, 10000000) s(id); CREATE INDEX idx_sensor_log_location ON sensor_log (location); CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
As usual, the table consists of 10-million rows, spread across 1000 locations with 100 possible reading values. In Postgres, every table also exists as a type which represents the table structure. We can see this by executing a completely non-recommended bit of SQL:
DROP TYPE sensor_log; ERROR: cannot drop type sensor_log because table sensor_log requires it HINT: You can drop table sensor_log instead.
Given the type already exists, we can write a function which returns rows that masquerade as the
sensor_log table. Here's a very basic function that merely grabs the top ten readings for a particular location:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR) RETURNS SETOF sensor_log AS $$ DECLARE rReturn sensor_log; BEGIN FOR rReturn IN SELECT * FROM sensor_log WHERE location = nLocation ORDER BY reading DESC LIMIT 10 LOOP RETURN NEXT rReturn; END LOOP; END; $$ LANGUAGE plpgsql; SELECT * FROM sensor_location_top_ten('15'); sensor_log_id | location | reading | reading_date ---------------+----------+---------+--------------------- 1068015 | 15 | 100 | 2017-02-22 09:17:30 1255015 | 15 | 100 | 2017-01-31 17:50:50 497015 | 15 | 100 | 2017-04-29 11:24:10 894015 | 15 | 100 | 2017-03-14 12:37:30 248015 | 15 | 100 | 2017-05-28 07:04:10 1092015 | 15 | 100 | 2017-02-19 14:37:30 183015 | 15 | 100 | 2017-06-04 19:37:30 35015 | 15 | 100 | 2017-06-21 22:44:10 2245015 | 15 | 100 | 2016-10-09 03:50:50 1767015 | 15 | 100 | 2016-12-03 11:37:30 (10 rows)
Break it Down
But what if we want to permanently remove the location column from the results? We can't do that because there's no type with that definition. Of course, we can make one and then use it instead:
CREATE TYPE sensor_log_noloc AS ( sensor_log_id INT, reading BIGINT, reading_date TIMESTAMP ); CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR) RETURNS SETOF sensor_log_noloc AS $$ DECLARE rReturn sensor_log_noloc; BEGIN FOR rReturn IN SELECT sensor_log_id, reading, reading_date FROM sensor_log WHERE location = nLocation ORDER BY reading DESC LIMIT 10 LOOP RETURN NEXT rReturn; END LOOP; END; $$ LANGUAGE plpgsql;
Julio, Get the Stretch
Alternatively, Postgres functions can define specific
OUT variables. Declared in order, we can use them as a virtual row-set. The major caveat here is that parameter names must not conflict with table column names due to ambiguity concerns. This means we should either rename our parameters, or be particularly meticulous when using table column aliases.
Here's how the function would look if we took the latter approach:
CREATE OR REPLACE FUNCTION sensor_location_top_ten( nLocation VARCHAR, sensor_log_id OUT INT, reading OUT BIGINT, reading_date OUT TIMESTAMP ) RETURNS SETOF RECORD AS $$ BEGIN FOR sensor_log_id, reading, reading_date IN SELECT s.sensor_log_id, s.reading, s.reading_date FROM sensor_log s WHERE s.location = nLocation ORDER BY s.reading DESC LIMIT 10 LOOP RETURN NEXT; END LOOP; END; $$ LANGUAGE plpgsql;
It's a bit more work up front, but it removes the necessity of creating a corresponding type for every function that returns row sets.
Saturday Night, and We in the Spot
Of course, saving the best for last is a pretty classic move, and we're suckers for tradition. Beginning in Postgres 8.4, functions gained one final syntax that did two things that effectively deprecated the previous two approaches. First, we no longer have to declare the type before the function. Second, we can avoid polluting our argument list with return values.
Here's how it looks:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR) RETURNS TABLE( sensor_log_id INT, reading BIGINT, reading_date TIMESTAMP ) AS $$ BEGIN FOR sensor_log_id, reading, reading_date IN SELECT s.sensor_log_id, s.reading, s.reading_date FROM sensor_log s WHERE s.location = nLocation ORDER BY s.reading DESC LIMIT 10 LOOP RETURN NEXT; END LOOP; END; $$ LANGUAGE plpgsql;
What's interesting about this approach is that it's otherwise identical to using
OUT parameters. The principal differences are the previously mentioned reduction in parameter clutter, and that a table-returning function implicitly returns a set of records. In essence, it's basically just
OUT-parameter shorthand someone hacked into the Postgres engine.
Don't Believe Me, Just Watch
Postgres functions have a plethora of such conveniences, and like most maturing platforms, it accumulates more with every passing generation. Since Postgres 10 is shaping up to be a rather comprehensive amalgam of varied enhancements, it's exciting to see how functions might evolve. Even if they remain static until some mysterious future version, there's still a lot more for us to explore.
And we most definitely will.