This article is the first in a series about migrating from Oracle to PostgreSQL. We will attempt to cover each migration topic in some depth and offer multiple solutions to the same issues and architectures.
For this first article, we’ll take a look at Oracle’s ROWNUM and ROWID pseudo columns.
In Oracle, the ROWNUM pseudo column is assigned to each row of a result set as the records are returned. It is an integer that starts at 1 and increases by one for each row returned.
This pseudo column applies to the entire row, regardless of the origin of the row. That is, multiple tables joined together in a single result set provide only a single ROWNUM per output row, not one per table. Because ROWNUM is assigned as rows are retrieved, before any ORDER BY in the same query is applied, changing the order of the data also changes which ROWNUM is associated with which row.
There are several cases where ROWNUM is used. Let’s start with a simple Oracle example.
SELECT * FROM mytable WHERE ROWNUM <= 5
This returns the first 5 rows Oracle happens to retrieve, in no guaranteed order. PostgreSQL can solve this problem in several ways.
Consider using the LIMIT clause like so:
SELECT * FROM mytable LIMIT 5
This has an identical effect and a very similar EXPLAIN plan to the Oracle offering. This solution can also be combined with the OFFSET keyword. Together, these two clauses provide simple pagination through the rows, although the rows skipped by a large OFFSET still have to be read.
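Here is a minimal pagination sketch. It assumes a hypothetical temporary table, page_demo, standing in for mytable, and fetches the second page of five rows. The ORDER BY is what makes the page repeatable: without it, PostgreSQL is free to return rows in any order, just as Oracle is with ROWNUM.

```sql
-- Hypothetical stand-in for mytable: 21 rows with x = 1000..1020.
CREATE TEMPORARY TABLE page_demo AS
SELECT generate_series(1000, 1020) AS x;

-- Second page with a page size of 5: skip the first 5 rows
-- (OFFSET 5) and return the next 5 (LIMIT 5).
-- ORDER BY makes the page contents repeatable between calls.
SELECT x
FROM page_demo
ORDER BY x
LIMIT 5 OFFSET 5;
```

With this sample data, the query returns x = 1005 through 1009. For page n (counting from 1), the offset is (n - 1) * 5.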
You could also use the SQL standard way of doing things.
SELECT * FROM mytable FETCH FIRST 5 ROWS ONLY
This statement works on any standards-compliant SQL engine, including PostgreSQL and Oracle 12c and later, and removes any implementation ambiguity.
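The standard syntax also covers pagination. This sketch, again using a hypothetical temporary table (fetch_demo) in place of mytable, returns the same page as LIMIT 5 OFFSET 5. Oracle 12c and later accept the same OFFSET ... FETCH form, which can make it a convenient common dialect during a migration.

```sql
-- Hypothetical stand-in for mytable: 21 rows with x = 1000..1020.
CREATE TEMPORARY TABLE fetch_demo AS
SELECT generate_series(1000, 1020) AS x;

-- Same page as "LIMIT 5 OFFSET 5", written with the SQL-standard
-- OFFSET ... FETCH clauses.
SELECT x
FROM fetch_demo
ORDER BY x
OFFSET 5 ROWS
FETCH NEXT 5 ROWS ONLY;
```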
That solved the trivial example, but let’s dig a bit deeper. Let’s assume that the ROWNUM value is actually meaningful, and find some ways of working with it.
We could add a rownum to the output by creating a sequence and then drawing a value from it for each row on the fly.
```sql
CREATE SEQUENCE monotonic_iterator;
SELECT nextval('monotonic_iterator') rownum, * FROM mytable;
DROP SEQUENCE monotonic_iterator;
```
However, we run into several problems when trying to use this sequence. For starters, the rownum alias is not accessible to the WHERE clause, because WHERE is evaluated before the select list that defines the alias:
```sql
SELECT nextval('monotonic_iterator') rownum, * FROM mytable WHERE rownum % 2 = 0;
ERROR: column "rownum" does not exist
```
If we call nextval() in the WHERE clause as well, we run into a strange anomaly.
```sql
SELECT nextval('monotonic_iterator') rownum, * FROM mytable
WHERE mod(nextval('monotonic_iterator'),1)=0;

 rownum |  x
--------+------
    127 | 1000
    129 | 1001
    131 | 1002
    133 | 1003
    135 | 1004
    137 | 1005
    139 | 1006
    141 | 1007
    143 | 1008
    145 | 1009
    147 | 1010
    149 | 1011
    151 | 1012
    153 | 1013
    155 | 1014
    157 | 1015
    159 | 1016
    161 | 1017
    163 | 1018
    165 | 1019
    167 | 1020
```
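So, when we call the nextval() function repeatedly, it does not reuse the value from the first invocation. Instead, each call advances the sequence again: the call in the WHERE clause and the call in the select list each consume a value, so the reported rownum skips every other number. (The filter itself, mod(n, 1) = 0, is always true and keeps every row.) This is obviously not what we intended.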
Also, a sequence with that name might already exist, so creating and dropping it on the fly would require some naming gymnastics.
In addition to that, the rownum can’t be used in a LIMIT clause.
So, this method does not prove to be very useful after all, or at least comes with some large caveats.
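If you really do need a sequence-based number, here is one possible workaround for the naming and double-evaluation caveats. It is our own sketch rather than a recommendation, and it assumes a hypothetical temporary table seq_demo and a hypothetical temporary sequence seq_rownum. In current PostgreSQL versions, the planner neither flattens a subquery whose select list calls a volatile function such as nextval() nor pushes outer conditions on that column into it, so each row draws exactly one value.

```sql
-- Hypothetical stand-in for mytable: 21 rows with x = 1000..1020.
CREATE TEMPORARY TABLE seq_demo AS
SELECT generate_series(1000, 1020) AS x;

-- Hypothetical sequence name. A temporary sequence is created in the
-- session's own temporary schema, so it does not clash with a permanent
-- sequence of the same name, and it is dropped when the session ends.
CREATE TEMPORARY SEQUENCE seq_rownum;

-- nextval() appears only in the subquery's select list, so each row
-- draws one value and the outer WHERE filters on that stored value.
-- The numbering still follows the (unspecified) scan order.
SELECT rownum, x
FROM (SELECT nextval('seq_rownum') AS rownum, x FROM seq_demo) s
WHERE rownum % 2 = 0;
```

The numbers are still only as meaningful as the scan order, which is why the window function approach is usually the simpler choice.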
We can add the row number to a result set with a window function like this:
SELECT row_number() OVER () ROWNUM, * FROM mytable
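This gives us a simple way to add the desired rownum without maintaining a counter over all the previous rows or drawing from a sequence. With an empty OVER (), the numbers follow whatever order the rows happen to arrive in, much like Oracle’s ROWNUM.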
This eliminates the problems with the sequence, but does not get rid of the issue that WHERE and LIMIT can’t make use of the rownum in the same query.
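An empty OVER () numbers rows in whatever order they arrive, which is as arbitrary as Oracle’s ROWNUM. If the number should carry meaning, put an ORDER BY inside the window definition. A minimal sketch, using a hypothetical temporary table rn_demo in place of mytable:

```sql
-- Hypothetical stand-in for mytable: 21 rows with x = 1000..1020.
CREATE TEMPORARY TABLE rn_demo AS
SELECT generate_series(1000, 1020) AS x;

-- ORDER BY inside OVER () ties each number to a sort key:
-- the smallest x gets rownum 1, the next gets 2, and so on,
-- regardless of the order in which rows are read from the table.
SELECT row_number() OVER (ORDER BY x) AS rownum, x
FROM rn_demo
ORDER BY rownum;
```

Here rownum 1 pairs with x = 1000 and rownum 21 with x = 1020.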
Suppose for a moment that you would like to retrieve the even-numbered rows for a sampling. This would give the ROWNUM a bit more meaning than a simple row limiter.
We can gain access to the rownum by wrapping the query in a subquery:
```sql
SELECT rownum, x
FROM (SELECT row_number() OVER () rownum, * FROM mytable) b
WHERE rownum % 2 = 0;

 rownum |  x
--------+------
      2 | 1001
      4 | 1003
      6 | 1005
      8 | 1007
     10 | 1009
     12 | 1011
     14 | 1013
     16 | 1015
     18 | 1017
     20 | 1019
(10 rows)
```
This gives the expected result. A LIMIT clause may then be placed in the subquery or in the outer query, depending on the desired results. The outer query can also filter on rownum directly, which mirrors Oracle’s WHERE ROWNUM < 5:
```sql
SELECT rownum, x
FROM (SELECT row_number() OVER () rownum, * FROM mytable) b
WHERE rownum < 5;

 rownum |  x
--------+------
      1 | 1000
      2 | 1001
      3 | 1002
      4 | 1003
```
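To make the difference between the two LIMIT placements concrete, here is a sketch against a hypothetical temporary table, limit_demo. Both queries keep even-numbered rows and stop at a limit of 6, but they answer different questions: the first samples within the first six rows, the second returns the first six samples.

```sql
-- Hypothetical stand-in for mytable: 21 rows with x = 1000..1020.
CREATE TEMPORARY TABLE limit_demo AS
SELECT generate_series(1000, 1020) AS x;

-- LIMIT in the subquery: number the rows, keep only the first 6,
-- then keep the even-numbered ones among them (rownum 2, 4, 6).
SELECT rownum, x
FROM (SELECT row_number() OVER (ORDER BY x) AS rownum, x
      FROM limit_demo
      ORDER BY x
      LIMIT 6) b
WHERE rownum % 2 = 0;

-- LIMIT in the outer query: keep the even-numbered rows first,
-- then return the first 6 of those (rownum 2, 4, ..., 12).
SELECT rownum, x
FROM (SELECT row_number() OVER (ORDER BY x) AS rownum, x
      FROM limit_demo) b
WHERE rownum % 2 = 0
ORDER BY rownum
LIMIT 6;
```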
In most cases, you will want to limit the results as early as possible. This has to do with PostgreSQL efficiency, but that, my friends, is a story for another column.
Of course, Oracle implements the same window (analytic) functions, such as ROW_NUMBER(), and you could do approximately the same thing there.
But “what about ROWID?” you ask. Well, that’s different.
In Oracle, ROWID is the physical address of a row: it encodes the data object, the data file, the block, and the row’s position within that block.
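PostgreSQL’s closest equivalent is a system column named “ctid”, a (block number, item offset) pair that locates a row version within its table. It is not included in SELECT *, but you can request it in a query explicitly by name, alongside related system columns such as xmin and xmax (the IDs of the transactions that created and deleted that row version).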
SELECT ctid, xmin, xmax, * FROM mytable
This gives you the physical location of the current version of each row. But be forewarned: because of PostgreSQL’s MVCC system, that location is only dependable within your own transaction context. If someone updates the row in a different transaction, PostgreSQL writes a new row version at a new ctid, and the ctid you received now points to an obsolete version. You should only rely on ctid locally, within a single transaction or statement, for example as a tie breaker in deduplication.
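As an illustration of the tie-breaker use, here is the common deduplication idiom, sketched against a hypothetical temporary table, dedup_demo, that has no primary key. The ctid comparison only has to stay valid for the duration of a single DELETE statement, which is exactly the scope in which it is safe to rely on.

```sql
-- Hypothetical table with duplicate rows and no primary key.
CREATE TEMPORARY TABLE dedup_demo (x integer);
INSERT INTO dedup_demo VALUES (1), (1), (2), (3), (3), (3);

-- Within this one statement, ctid uniquely identifies each physical
-- row, so it can break ties between otherwise identical rows: for
-- every pair with the same x, delete the copy with the larger ctid.
DELETE FROM dedup_demo a
USING dedup_demo b
WHERE a.x = b.x
  AND a.ctid > b.ctid;
```

Afterward, exactly one copy of each value (1, 2 and 3) remains.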
The ctid is not really useful for row ordering either. Every UPDATE writes the row to a new ctid, VACUUM FULL and CLUSTER rewrite the whole table, and after a VACUUM new rows may be placed in freed space on earlier pages. If your objective is to sort the rows into data entry order, this detail must be maintained separately in the table as user data.
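A minimal sketch of keeping entry order as user data, assuming a hypothetical temporary table, entry_demo. It uses an identity column (available since PostgreSQL 10) to record insertion order, and shows that an UPDATE changes a row’s ctid while its identity value stays put. An explicit sequence-backed column would serve the same purpose.

```sql
-- Hypothetical table that records entry order as ordinary user data.
-- The identity column assigns increasing values as rows are inserted.
CREATE TEMPORARY TABLE entry_demo (
    entry_id bigint GENERATED ALWAYS AS IDENTITY,
    x        integer
);
INSERT INTO entry_demo (x) VALUES (30), (10), (20);

-- An UPDATE writes a new row version, so the row's ctid changes,
-- while its entry_id stays the same. Compare the two results.
SELECT entry_id, ctid FROM entry_demo WHERE x = 30;
UPDATE entry_demo SET x = 31 WHERE x = 30;
SELECT entry_id, ctid FROM entry_demo WHERE x = 31;

-- Data entry order comes from entry_id, not from ctid.
SELECT entry_id, x
FROM entry_demo
ORDER BY entry_id;
```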
So, now you know the ins and outs of the ROWNUM and ROWID from the PostgreSQL perspective. Please ask any questions about it in the comment section below. The answers are likely to show up as a modification of the article in order for everyone to profit from the knowledge.