PostgreSQL anti-patterns: read-modify-write cycles
Shaun Thomas’s recent post about client-side loops as an SQL anti-pattern is well worth a read if you’re relatively new to SQL-based application development.
It’s reminded me of another SQL coding anti-pattern that I see quite a lot: the naïve read-modify-write cycle. Here I’ll explain what this common development mistake is, how to identify it, and options for how to fix it.
Imagine your code wants to look up a user’s balance, subtract 100 from it if doing so won’t make it negative, and save it.
It’s common to see this written as three steps:
SELECT balance FROM accounts WHERE user_id = 1;
-- in the application, subtract 100 from balance if it's above 100;
-- and, where ? is the new balance:
UPDATE accounts SET balance = ? WHERE user_id = 1;
and everything will appear to work fine to the developer. However, this code is critically wrong, and will malfunction as soon as the same user is updated by two different sessions at the same time.
Imagine two concurrent sessions, each subtracting 100 from the user’s balance, starting with an initial value of 300.
| Session 1 | Session 2 |
|---|---|
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) |
Whoops! The balance is 200, but you took out 200 from a starting point of 300, so 100 has just vanished.
Most testing and development is done on standalone servers running single sessions, so unless you’re doing rigorous testing this sort of thing often doesn’t get noticed until production, and can be painful to debug. It’s important to know about it so you can code defensively.
Don’t transactions prevent this?
Following along
If you want to try this or any of the other examples in this article, just run the following setup code:
CREATE TABLE accounts (user_id integer primary key, balance integer not null);
INSERT INTO accounts (user_id, balance) VALUES (1, 300);
then run the commands in each session column in separate psql sessions by using two terminal windows**.
I often have people on Stack Overflow ask things along the lines of "Don't transactions prevent this?". Unfortunately, while transactions are great, they aren't a magic secret sauce you can add for easy concurrency. The only way they let you completely ignore concurrency issues is to LOCK TABLE every table you might use before starting the transaction (and even then you have to always lock in the same order to prevent deadlocks).
In a transaction the outcome in this case is exactly the same:
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) |
| COMMIT; | COMMIT; |
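For completeness, the heavy-handed LOCK TABLE approach mentioned above does prevent the problem, at the cost of serializing every writer on the table. A rough sketch only:

BEGIN;
-- EXCLUSIVE mode still allows plain SELECTs from other sessions, but blocks
-- any other UPDATE or LOCK TABLE ... IN EXCLUSIVE MODE until this commits
LOCK TABLE accounts IN EXCLUSIVE MODE;
SELECT balance FROM accounts WHERE user_id = 1;
-- the application computes the new balance here
UPDATE accounts SET balance = 200 WHERE user_id = 1;
COMMIT;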
Solutions
Thankfully PostgreSQL (and SQL in general) has a few tools that will help you, and there are some application side options too. Some popular solutions to this problem are:
- Avoiding the read-modify-write with a calculated update
- Row level locking with SELECT ... FOR UPDATE
- Use of SERIALIZABLE transactions
- Optimistic concurrency control, otherwise known as optimistic locking
Avoiding the read-modify-write cycle
The best solution is often to just do the work in SQL, avoiding the read-modify-write cycle entirely.
Just write:
| Session 1 | Session 2 |
|---|---|
| UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; (sets balance = 200) | |
| | UPDATE accounts SET balance = balance - 100 WHERE user_id = 1; (sets balance = 100) |
This works even if the two statements are in concurrent transactions, because the first takes a lock on the row and the second waits on the lock until the first commits or rolls back. The transaction isolation documentation covers this in more detail under READ COMMITTED.
This option is only viable for simpler cases, though. It isn’t useful if the application needs to do some complex logic based on the current balance to determine if the update should proceed, for example.
This method is usually the simplest and fastest choice where it’s applicable.
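Simple conditions can often ride along in the same statement. A minimal sketch, folding the "don't let the balance go negative" rule from the example into the UPDATE itself:

-- One atomic statement: check and change together. RETURNING reports the
-- new balance; zero rows updated means there wasn't enough money (or no
-- such user).
UPDATE accounts
   SET balance = balance - 100
 WHERE user_id = 1
   AND balance >= 100
RETURNING balance;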
Note that under the non-default SERIALIZABLE isolation level the calculated update still prevents the error, but it does so differently. See the discussion of SERIALIZABLE below, and the documentation.
Row level locking
The simplest solution to fix an existing broken application with the fewest changes is generally to add row level locking.
Instead of SELECT balance FROM accounts WHERE user_id = 1 write SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE. This takes a row-level write lock. Any other transaction that tries to UPDATE the row, or to SELECT ... FOR UPDATE (or FOR SHARE) it, will pause until the transaction that holds the lock commits or rolls back.
In the example above, and at PostgreSQL's default transaction isolation level, the second SELECT would simply not return until the first transaction ran its UPDATE and then COMMITted. The second transaction would then continue, but its SELECT would return 200 instead of 300, so the correct value would be produced.
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE; (blocks, waiting for session 1) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) | |
| COMMIT; | |
| | (session 2's SELECT now returns 200) |
| | UPDATE accounts SET balance = 100 WHERE user_id = 1; (200 - 100 = 100) |
| | COMMIT; |
The explicit locking chapter in the documentation discusses this in more detail.
Note that this only works if the read-modify-write cycle is contained within a single transaction, as locks only exist for the lifetime of a transaction.
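Putting the pieces together, a minimal sketch of the locking version of the cycle (the 200 is illustrative; in practice the application computes the new value from the balance it just read):

BEGIN;
-- take a row-level write lock; concurrent writers queue up behind it
SELECT balance FROM accounts WHERE user_id = 1 FOR UPDATE;
-- application logic decides on the new balance here
UPDATE accounts SET balance = 200 WHERE user_id = 1;
COMMIT;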
SERIALIZABLE transactions
If the read-modify-write cycle is always contained within a single transaction and if you’re running PostgreSQL 9.1 or newer, you can use a SERIALIZABLE transaction instead of explicit SELECT ... FOR UPDATE.
In this case both SELECTs would proceed normally, as would both UPDATEs. The first transaction would COMMIT fine. However, when you went to COMMIT the second transaction it would instead abort with a serialization error. You would then re-run the failed transaction from the beginning*:
| Session 1 | Session 2 |
|---|---|
| BEGIN ISOLATION LEVEL SERIALIZABLE; | BEGIN ISOLATION LEVEL SERIALIZABLE; |
| SELECT balance FROM accounts WHERE user_id = 1; (returns 300) | |
| | SELECT balance FROM accounts WHERE user_id = 1; (also returns 300) |
| UPDATE accounts SET balance = 200 WHERE user_id = 1; (300 - 100 = 200) | |
| | UPDATE accounts SET balance = 200 WHERE user_id = 1; (blocks on session 1's lock and doesn't proceed) |
| COMMIT; (succeeds, setting balance = 200) | |
| | (the UPDATE resumes, sees that the row has changed, and aborts with a "could not serialize access due to concurrent update" error) |
| | COMMIT; (converted into a forced ROLLBACK, leaving the balance unchanged) |
SERIALIZABLE isolation can force applications to repeat a lot of work if a big transaction aborts or if conflicts are common. It’s very useful for complex cases where attempting to use row locking might just cause deadlocks, though.
The documentation on concurrency control and transaction isolation is the best resource for information on serializable isolation. If you’re not used to thinking about concurrency issues be prepared to read it a few times and be sure to try out the examples.
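As a rough sketch, each session runs something like the following, and the application must be prepared to re-run the whole transaction when it fails with SQLSTATE 40001 (serialization_failure), which can surface on a statement or at COMMIT:

BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE user_id = 1;
-- the application checks the balance and computes the new value
UPDATE accounts SET balance = 200 WHERE user_id = 1;  -- 200 is illustrative
COMMIT;
-- on a serialization failure: ROLLBACK if needed, then retry from BEGIN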
Optimistic concurrency control
Optimistic concurrency control (or “optimistic locking”) is usually implemented as an application-side method for handling concurrency, often by object relational mapping tools like Hibernate.
In this scheme, all tables have a version column or last-updated timestamp, and all updates have an extra WHERE clause entry that checks to make sure the version column hasn’t changed since the row was read. The application checks to see if any rows were affected by the UPDATE and if none were affected, treats it as an error and aborts the transaction.
For this demo you need to add a new column:
ALTER TABLE accounts ADD COLUMN version integer NOT NULL DEFAULT 1;
Then the example above becomes:
| Session 1 | Session 2 |
|---|---|
| BEGIN; | BEGIN; |
| SELECT balance, version FROM accounts WHERE user_id = 1; (returns balance=300, version=1) | |
| | SELECT balance, version FROM accounts WHERE user_id = 1; (also returns balance=300, version=1) |
| COMMIT; | COMMIT; |
| BEGIN; | BEGIN; |
| UPDATE accounts SET balance = 200, version = 2 WHERE user_id = 1 AND version = 1; (300 - 100 = 200; succeeds, reporting 1 row changed) | |
| | UPDATE accounts SET balance = 200, version = 2 WHERE user_id = 1 AND version = 1; (blocks on session 1's lock) |
| COMMIT; | |
| | (the UPDATE returns, matching zero rows, because the row now has version = 2) |
| | ROLLBACK; (the application treats zero rows updated as an error and aborts) |
Because it’s fiddly to code, optimistic concurrency control is usually used via an ORM or query building tool.
Unlike SERIALIZABLE isolation, it works even in autocommit mode or if the statements are in separate transactions. For this reason it’s often a good choice for web applications that might have very long user “think time” pauses or where clients might just vanish mid-session, as it doesn’t need long-running transactions that can cause performance problems.
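A minimal sketch of that pattern with plain autocommit statements, using the version column added above; remembering the values between the two statements is the application's job:

-- read, remembering both the balance and the version that was seen
SELECT balance, version FROM accounts WHERE user_id = 1;

-- possibly much later, on a different connection: write, guarded by the
-- remembered version. If this reports "UPDATE 0", someone else changed the
-- row first; re-read and retry.
UPDATE accounts
   SET balance = 200, version = version + 1
 WHERE user_id = 1
   AND version = 1;  -- the version remembered from the earlier read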
Optimistic concurrency control can co-exist with traditional locking-based approaches if you use triggers to enforce the optimistic concurrency rules. Nonetheless, it's generally used instead of other explicit concurrency control methods.
Which to use?
The appropriate choice depends on what you’re doing, your scalability requirements, how easy it is to write retry loops to re-run failed transactions, etc.
There is no right answer for everybody. If there was, there’d only be one way to do it, not several.
There is, however, a definitively wrong way, and that’s to ignore concurrency issues and expect the database to just take care of them for you.
Future topics
There’s lots more to watch out for.
- “Testing” connections to avoid adding error handling and retry code
- Home brew queueing implementations
- Sending email / calling web services etc from the DB
- The many ways to get upsert (update-if-exists, otherwise insert) or insert-if-not-exists wrong
- … and plenty more
I’ll discuss some of them in later weeks.
* The topic of transaction re-tries brings me to an anti-pattern I want to discuss later: testing connections and writing cannot-afford-to-fail transactions.
** Or if you’re really keen you can turn them into automated tests using PostgreSQL’s isolationtester, in src/test/isolation
Some SQL queries are incorrect: "UPDATE balance" should be "UPDATE accounts". Please fix it.
Thanks for checking. I’d done so by the time I saw your comment, but I appreciate your taking a look.
A small omission in one of the queries: Under “Row level locking” the second session’s select needs FOR UPDATE as well, otherwise it’ll go right ahead and return the value.
Thanks. I knew I should’ve written isolationtester scripts first, then converted them.
Writing SQL in HTML tables is a special experience I’d prefer not to repeat. Ever. Again.
Very nice, Craig. This is definitely something I don’t see addressed often.
This can also be addressed particularly through stored procedures, but that would take a much longer post. 🙂
What's wrong with using the following?
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
If you wanted the balance returned after this, you can add 'RETURNING balance' on the end (obviously the 100 would be a parameter and not a static number).
As covered under “Avoiding the read-modify-write cycle”, in simple cases that’s just fine, and usually the best choice where it’s possible.
If your predicate logic is more complex and requires application interaction (say “update the balance only if they have enough to cover the request in their account and they send a correct SMS verification token”), not so much.
That’s why it’s important to understand all the options. Different concurrency tools are suitable for different problems.
These are all contrived examples because a real world example would be complex enough that most of the energy would be spent explaining the example, not the problem and solutions.
There is also the option of adding a condition on the UPDATE statement to verify the original known value of the balance so that the record is only modified if the original value has not changed:
-- e.g.
UPDATE accounts SET balance = 200 WHERE user_id = 1 AND balance = 300;
-- or
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1 AND balance = 300;
The query will return a count of either 0 or 1 records updated so it is simple to confirm whether the update has been executed and (possibly) refresh the client-side copy of the record and retry if no update took place.
This solution requires no special transaction blocks, isolation level or row-level locking and allows for graceful recovery.
Yep, that’s essentially a special case of row versioning where you test every field instead of just a dedicated version column. It’s supported natively by psqlODBC for example, and works OK for small rows.
It tends to fall down a bit with bigger rows. It can also be problematic in cases where you care about any change to the row, even if a sequence of changes might return it to the original state you first saw.
If you use JDBC connection pooling, I guess you have to follow the "Optimistic concurrency control" way as you cannot guarantee to get the same connection for SELECT and UPDATE. Correct?
With JDBC you usually use a connection pool at a layer above the driver, like DBCP or C3P0, or an app server’s built in pool.
These give you a connection as a wrapper object until you return it to the pool. You open a transaction, do the work, and close the connection wrapper, which returns the underlying connection to the pool.
So long as you keep the connection object (and keep the transaction open) across the whole operation then that will be fine.
If your app returns connections to the pool between a SELECT … FOR UPDATE and subsequent UPDATE then yes, that’s useless, because it’ll do the work in two separate transactions; the row lock taken by the FOR UPDATE is cleared when the transaction is rolled back/committed upon returning the connection to the pool (if you don’t do this yourself). If it wasn’t you’d get stuck when you tried to UPDATE from a different connection, so either way it can’t work.
Just keep the connection. If you can’t do that – say, if it’s over user think-time or between separate HTTP requests – then yes, you’ll want to use optimistic concurrency control. That’s why it’s so popular with Java ORMs like Hibernate that’re heavily used with web apps.
Very good article. One of the major benefits of using an ORM tool is probably the optimistic locking mechanism, with the "transactional write-behind" flushing mechanism coming in a close second.
Like you said, SERIALIZABLE gives you the ultimate “pessimistic locking” “lost update” collision detection, but according to Amdahl’s Law the more serial the code the less you can benefit from parallelization.
My all time favorite is Optimistic Locking with an automatic retry mechanism (http://vladmihalcea.com/2013/11/15/optimistic-locking-retry-with-jpa/).
Automatic retry is great when used carefully, but a great way to achieve silent lost-updates when used carelessly. I’ve seen people who don’t reload the original object, or who do so but blindly set its attributes to the modified ones from the last attempt. That’s a great way to silently overwrite changes, completely defeating the purpose of optimistic concurrency control.
Thanks for a clear explanation on these issues. I’m building a web application and this article has helped me to better understand my options.
A better version of Lachlan’s idea (idempotent transaction) would be this:
UPDATE accounts SET balance = balance - 100 WHERE user_id = 1 AND balance >= 100;
This satisfies the constraint part of the update (there has to be money there) as well as updates the balance without actually having to know it. If the update fails, it is because there isn’t enough money left.
Yes I think this solves the problem simply and elegantly IMO.
Great article!
When we do row level locking, and do an SQL update, can we get PostgreSQL to replace the value on disk within the existing written row? So that it doesn't write a new row and cause the need for future vacuuming? I've got fixed-width, non-nullable-field fact tables, and would like to be able to support updating some values… I don't care about MVCC, and only have a single writer of data, which is the ETL process. If the readers get some stale data for something recent for a short time, I'm OK with that… I'm more concerned about handling scale without having to set page fill percentages that waste space in large fact tables… Any thoughts about that?
No, you can’t make PostgreSQL replace the row in-place, because a row locked FOR UPDATE can still be read by concurrent SELECTs. The only time PostgreSQL could safely do this would be if there were no concurrent transactions at all. As far as I know it doesn’t perform that optimisation.
It sounds like you want in-place updates and dirty reads (where concurrent transactions can read not-yet-committed data). PostgreSQL just doesn’t support that. If you’re doing batch updates you could write a new copy of the table then swap it into place, instead, but that’s not much better.
I believe the 3rd method should be not SERIALIZABLE transactions but REPEATABLE READ. Per this SO discussion http://stackoverflow.com/a/8813018/2073130, your 1st method avoids the default "lost update", which can be done at all isolation levels. But for avoiding the other type of "lost update", REPEATABLE READ is sufficient, as by definition you can insert another SELECT in the transaction and shouldn't be able to see changed values from others (see the same SO thread or http://vladmihalcea.com/2014/09/14/a-beginners-guide-to-database-locking-and-the-lost-update-phenomena/).
Very nice article, man! So many devs don't care about race conditions/concurrency…
Can you please tell me if I am right about the following:
I have a 'users' table with a primary key on the 'id' field and no unique indexes on the 'email' and 'name' fields (they must be unique, but I don't want unique indexes for the purpose of this example).
Now, I must insert a new user, but only if his email and name are not already in use by another user!
Is the following transaction OK? (It's NOT at the SERIALIZABLE isolation level.)
BEGIN;
INSERT INTO users VALUES ('[email protected]', 'steve');
nb = SELECT COUNT(*) FROM users WHERE name = 'steve' OR email = '[email protected]';
if (nb > 1) -- a concurrent thread inserted the same email or name in the meantime
    ROLLBACK;
COMMIT;
Can you confirm whether this example prevents race problems?
Thanks in advance.
No, it’s still prone to a race where another thread can insert after you check with the SELECT count(*) and before you commit.
Use PostgreSQL 9.5’s INSERT … ON CONFLICT … feature, or use a plpgsql procedure with a BEGIN … EXCEPTION block, per http://stackoverflow.com/q/1109061/398670 and http://stackoverflow.com/q/17267417/398670
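For example, a minimal sketch of the ON CONFLICT route; it assumes PostgreSQL 9.5+ and that you do add the unique constraints the feature needs (the column list and values here are illustrative):

-- assumed constraints, since ON CONFLICT needs them to detect duplicates:
-- ALTER TABLE users ADD CONSTRAINT users_email_key UNIQUE (email);
-- ALTER TABLE users ADD CONSTRAINT users_name_key UNIQUE (name);
INSERT INTO users (email, name)
VALUES ('[email protected]', 'steve')
ON CONFLICT DO NOTHING;  -- with no conflict target, any unique violation is handled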
Hmm, I think I missed a crucial point while writing this code snippet:
Other transactions will not see any changes from a transaction before it commits.
Thanks a lot for your answer.
Cheers!
Definitely unsafe. Some other thread could insert after your SELECT but before your COMMIT or ROLLBACK.
Another solution would be to use an append only design where the balance would always be calculated from a set of debit and credit rows.
I always favour immutable designs for important data. Updates destroy data; an immutable design would only ever permit inserts.
Yes, that’s the preferred way to do a real ledger – base it on an insert-only journal. The ledger here is a simple illustration of the problem, not a practical implementation. Just some convenient table names to illustrate a general problem.
Yes, that's the ideal design for a ledger, but on its own does it solve the problem raised here? For a withdrawal transaction, for example, the balance must be calculated and checked before proceeding with the inserts, which raises the possibility of concurrency issues that need to be accounted for.
Readers may also want to look at the discussion here: https://news.ycombinator.com/item?id=13227078
Thank you for your great article Craig (y)