PG Phriday: Uptown Func
One of the coolest things about Postgres functions is that they can return rows as if they were a table. Not only is it possible, but creating a set-returning function in Postgres is almost frighteningly trivial when compared to other database engines. In fact, some Oracle users are probably rolling their eyes and muttering “Just use a cursor!” already. Just hear us out!
Postgres functions can return cursors too, but using them afterwards isn’t exactly a friendly experience. Cursors are passed by reference, so the function must either accept a parameter to name the cursor, or it generates something wacky like “<unnamed portal 1>”. After that, cursors can only be used with FETCH
instead of SELECT
, greatly limiting their utility.
What about views, then? Obviously they can return rows like a table, so why not just write everything as a query and turn it into a view? It’s true that practically any function can be emulated with a sufficiently advanced query. This is especially true since Postgres 9 added CTEs, allowing it to generate step-by-step transformations. Some use cases aren’t well covered by that approach, however.
Stop, Wait a Minute
Consider the Fibonacci Sequence. Here’s a function that iteratively generates the whole sequence up to the Nth value:
CREATE OR REPLACE FUNCTION fibonacci(nLoops INT)
RETURNS SETOF NUMERIC AS
$$
DECLARE
i INT;
x NUMERIC := 0;
y NUMERIC := 1;
z NUMERIC := 1;
BEGIN
FOR i IN 1..nLoops LOOP
RETURN NEXT x;
x = y;
y = z;
z = x + y;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT max(fibonacci) FROM fibonacci(100);
max
-----------------------
218922995834555169026
Amusingly enough, this particular function works until the NUMERIC
datatype itself is exhausted, provided sufficient hardware. Our test VM was able to survive until the 400,000th result before subsequent attempts exhausted our meager 8GB of temporary disk space. We’d print it here, but sadly a number with 83,595 digits won’t fit in the article.
Take a Sip, Sign a Check
Now consider if we had implemented this as a CTE instead. That by itself isn’t difficult, we just need to rewrite things a bit:
WITH RECURSIVE fib(i, a, b) as (
SELECT 1, 0::NUMERIC, 1::NUMERIC
UNION ALL
SELECT i + 1, b, a + b
FROM fib
WHERE i <= 100
)
SELECT max(a)
FROM fib;
But how can this be turned into a general-purpose view? We would have to raise the limit of the recursive portion of the CTE to some comfortable (or uncomfortable) upper bound so the caller could limit the results with either a LIMIT
or WHERE
clause.
This has two problems. The first is that the maximum is arbitrary and may not correspond with desired use patterns. The second is that CTEs produce all results before filters are applied, meaning a lot of wasted effort. CTEs don't really work well in views since the inner components can't be tweaked for specific conditions, yet without them, SQL is stateless. There's also the matter of exception handling. So any time we need a state machine that returns rows, functions are really our only answer.
Don't Brag About it, Come Show Me
The primary limitation to Postgres functions is that the return type must exist before the function is declared. There are currently three ways to do this. Let's go back to our trusty sensor_log
table:
CREATE TABLE sensor_log (
sensor_log_id SERIAL PRIMARY KEY,
location VARCHAR NOT NULL,
reading BIGINT NOT NULL,
reading_date TIMESTAMP NOT NULL
);
INSERT INTO sensor_log (location, reading, reading_date)
SELECT s.id % 1000, round(random() * 100),
CURRENT_DATE + INTERVAL '1d' - ((s.id * 10)::TEXT || 's')::INTERVAL
FROM generate_series(1, 10000000) s(id);
CREATE INDEX idx_sensor_log_location ON sensor_log (location);
CREATE INDEX idx_sensor_log_date ON sensor_log (reading_date);
As usual, the table consists of 10-million rows, spread across 1000 locations with 100 possible reading values. In Postgres, every table also exists as a type which represents the table structure. We can see this by executing a completely non-recommended bit of SQL:
DROP TYPE sensor_log;
ERROR: cannot drop type sensor_log because table sensor_log requires it
HINT: You can drop table sensor_log instead.
Given the type already exists, we can write a function which returns rows that masquerade as the sensor_log
table. Here's a very basic function that merely grabs the top ten readings for a particular location:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR)
RETURNS SETOF sensor_log AS
$$
DECLARE
rReturn sensor_log;
BEGIN
FOR rReturn IN
SELECT *
FROM sensor_log
WHERE location = nLocation
ORDER BY reading DESC
LIMIT 10
LOOP
RETURN NEXT rReturn;
END LOOP;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM sensor_location_top_ten('15');
sensor_log_id | location | reading | reading_date
---------------+----------+---------+---------------------
1068015 | 15 | 100 | 2017-02-22 09:17:30
1255015 | 15 | 100 | 2017-01-31 17:50:50
497015 | 15 | 100 | 2017-04-29 11:24:10
894015 | 15 | 100 | 2017-03-14 12:37:30
248015 | 15 | 100 | 2017-05-28 07:04:10
1092015 | 15 | 100 | 2017-02-19 14:37:30
183015 | 15 | 100 | 2017-06-04 19:37:30
35015 | 15 | 100 | 2017-06-21 22:44:10
2245015 | 15 | 100 | 2016-10-09 03:50:50
1767015 | 15 | 100 | 2016-12-03 11:37:30
(10 rows)
Break it Down
But what if we want to permanently remove the location column from the results? We can't do that because there's no type with that definition. Of course, we can make one and then use it instead:
CREATE TYPE sensor_log_noloc AS (
sensor_log_id INT,
reading BIGINT,
reading_date TIMESTAMP
);
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR)
RETURNS SETOF sensor_log_noloc AS
$$
DECLARE
rReturn sensor_log_noloc;
BEGIN
FOR rReturn IN
SELECT sensor_log_id, reading, reading_date
FROM sensor_log
WHERE location = nLocation
ORDER BY reading DESC
LIMIT 10
LOOP
RETURN NEXT rReturn;
END LOOP;
END;
$$ LANGUAGE plpgsql;
Julio, Get the Stretch
Alternatively, Postgres functions can define specific OUT
variables. Declared in order, we can use them as a virtual row-set. The major caveat here is that parameter names must not conflict with table column names due to ambiguity concerns. This means we should either rename our parameters, or be particularly meticulous when using table column aliases.
Here's how the function would look if we took the latter approach:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(
nLocation VARCHAR,
sensor_log_id OUT INT,
reading OUT BIGINT,
reading_date OUT TIMESTAMP
)
RETURNS SETOF RECORD AS
$$
BEGIN
FOR sensor_log_id, reading, reading_date IN
SELECT s.sensor_log_id, s.reading, s.reading_date
FROM sensor_log s
WHERE s.location = nLocation
ORDER BY s.reading DESC
LIMIT 10
LOOP
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
It's a bit more work up front, but it removes the necessity of creating a corresponding type for every function that returns row sets.
Saturday Night, and We in the Spot
Of course, saving the best for last is a pretty classic move, and we're suckers for tradition. Beginning in Postgres 8.4, functions gained one final syntax that did two things that effectively deprecated the previous two approaches. First, we no longer have to declare the type before the function. Second, we can avoid polluting our argument list with return values.
Here's how it looks:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR)
RETURNS TABLE(
sensor_log_id INT,
reading BIGINT,
reading_date TIMESTAMP
) AS
$$
BEGIN
FOR sensor_log_id, reading, reading_date IN
SELECT s.sensor_log_id, s.reading, s.reading_date
FROM sensor_log s
WHERE s.location = nLocation
ORDER BY s.reading DESC
LIMIT 10
LOOP
RETURN NEXT;
END LOOP;
END;
$$ LANGUAGE plpgsql;
What's interesting about this approach is that it's otherwise identical to using OUT
parameters. The principal differences are the previously mentioned reduction in parameter clutter, and that a table-returning function implicitly returns a set of records. In essence, it's basically just OUT
-parameter shorthand someone hacked into the Postgres engine.
Don't Believe Me, Just Watch
Postgres functions have a plethora of such conveniences, and like most maturing platforms, it accumulates more with every passing generation. Since Postgres 10 is shaping up to be a rather comprehensive amalgam of varied enhancements, it's exciting to see how functions might evolve. Even if they remain static until some mysterious future version, there's still a lot more for us to explore.
And we most definitely will.
The stuff you’re doing with PL/pgsql really doesn’t require that language. SQL will do, as in:
CREATE OR REPLACE FUNCTION sensor_location_top_ten(nLocation VARCHAR)
RETURNS TABLE(
sensor_log_id INT,
reading BIGINT,
reading_date TIMESTAMP
)
LANGUAGE sql
AS $$
SELECT s.sensor_log_id, s.reading, s.reading_date
FROM sensor_log s
WHERE s.location = nLocation
ORDER BY s.reading DESC
LIMIT 10
$$;
Great example! Using SQL is a good way to simplify in some cases.
I still prefer to use PLPGSQL in most cases. Keep in mind that these examples are general purpose to illustrate the concept. For every function that is compatible with straight SQL, there is another that needs an exception block, or contains multiple phases that don’t really map well to a single query.
Oh, and if you insist on PL/pgsql for whatever reason, you can use RETURN QUERY.
Thanks for the article!
Small typo: “just useR a cursor”
Thanks for the catch, Nikolay! I swear I read that particular sentence ten times and somehow my eyes skipped right over that. Doh!