Transactions Are Not Locks
One thing I wish I had understood better earlier on in my experience with PostgreSQL is how transactions and locks can be used together to provide serializable logic.
An easy way to illustrate this is with a simple bank account system. Suppose we create an accounts table and populate it like this:
create table accounts (
  name text primary key,
  balance int not null
);
insert into accounts (name, balance) values ('A', 10), ('B', 0);
Now we have two bank accounts, A with a balance of $10, and B with a balance of $0.
In order to be a useful bank, we want to be able to move money from one account to another. In pseudocode, the way to move money from one account to another might look something like:
function moveMoney(from, to, amount):
    # Start a transaction.
    txn = db.begin()
    # Update the balances.
    txn.execute('update accounts set balance = balance - $amount where name = $from')
    txn.execute('update accounts set balance = balance + $amount where name = $to')
    # Commit the transaction.
    txn.commit()
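As a concrete sketch of the same idea in Python, here is one way moveMoney might look. I'm using the standard library's sqlite3 module as a stand-in for a PostgreSQL driver (the account names and schema are from the example above), so the transaction mechanics are the same in spirit, not a PostgreSQL-specific implementation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table accounts (name text primary key, balance int not null)")
conn.executemany("insert into accounts (name, balance) values (?, ?)",
                 [("A", 10), ("B", 0)])
conn.commit()

def move_money(conn, frm, to, amount):
    # Both updates run inside a single transaction: if anything fails,
    # roll back so that neither update takes effect.
    try:
        conn.execute(
            "update accounts set balance = balance - ? where name = ?", (amount, frm))
        conn.execute(
            "update accounts set balance = balance + ? where name = ?", (amount, to))
        conn.commit()
    except Exception:
        conn.rollback()
        raise

move_money(conn, "A", "B", 10)
balances = dict(conn.execute("select name, balance from accounts"))
```

After the transfer, balances reflects both updates together: A is down $10 and B is up $10.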
We use a transaction here to make sure that either both updates succeed, or both updates fail. In other words, we want to avoid the situation where money is deducted from A but never deposited to B.
There’s another situation that we might want to avoid in our bank too: we might want a rule that account balances can never be negative. To enforce this rule, we can update our moveMoney function:
function moveMoney(from, to, amount):
    # Moving a negative amount of money from A to B is equivalent to moving the
    # corresponding positive amount from B to A.
    if amount < 0:
        moveMoney(to, from, -1*amount)
        return
    # Start a transaction so that all of our queries/updates succeed or fail as a
    # unit.
    txn = db.begin()
    # Make sure the $from account has a balance of at least $amount.
    currBalance = txn.query('select balance from accounts where name = $from')
    if currBalance < amount:
        txn.rollback()
        throw exception
    # Move the money as before.
    txn.execute('update accounts set balance = balance - $amount where name = $from')
    txn.execute('update accounts set balance = balance + $amount where name = $to')
    # Commit the transaction.
    txn.commit()
But there’s a problem with this! Using a transaction only ensures that all of the writes succeed or fail together; it does not provide any guarantees that all of the statements in the transaction execute “at the same time” (i.e. the transactions are not serializable).
Preventing concurrency bugs
Let’s simulate two different actors calling moveMoney('A', 'B', 10) concurrently, again with A having an initial balance of $10 and B having $0:
| Actor 1 | Actor 2 |
|---|---|
| begin | |
| select balance from accounts where name = 'A' | |
| | begin |
| | select balance from accounts where name = 'A' |
| update accounts set balance = balance - 10 where name = 'A' | |
| update accounts set balance = balance + 10 where name = 'B' | |
| commit | |
| | update accounts set balance = balance - 10 where name = 'A' |
| | update accounts set balance = balance + 10 where name = 'B' |
| | commit |
Now, if we check the account balances, we can see a problem:
postgres=# select * from accounts ;
name | balance
------+---------
A | -10
B | 20
Both actors read the initial balance as $10, and therefore allowed the operations to proceed. The transaction is ensuring that $10 is deducted from A if and only if $10 is deposited into B, but two transactions can still be reading and making decisions based on the same data concurrently. (PostgreSQL by default does not allow two transactions to write the same data concurrently; after Actor 1 updates A’s balance, Actor 2 isn’t able to update A’s balance until after the first transaction is committed or rolled back.)
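The interleaving above can be replayed in a few lines of plain Python, with no database at all, to see why the balance check is ineffective. This is just an illustration of the read-then-write race, not real transaction code:

```python
accounts = {"A": 10, "B": 0}

# Both actors read A's balance before either one writes.
actor1_sees = accounts["A"]  # 10
actor2_sees = accounts["A"]  # 10

# Both checks pass, so both transfers go ahead.
if actor1_sees >= 10:
    accounts["A"] -= 10
    accounts["B"] += 10
if actor2_sees >= 10:
    accounts["A"] -= 10
    accounts["B"] += 10
```

Each actor's check was correct against the balance it saw; the problem is that both saw the same balance, leaving A at -10 and B at 20.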
Check constraints
There are a few ways we can fix this. One way would be to add a check constraint:
alter table accounts add constraint nonnegative_balance check (balance >= 0);
With this constraint, Actor 2’s update will fail because the constraint would be violated. In fact, we would no longer even need to check the previous balance in our application code at all, because the database itself would ensure no account’s balance ever goes below zero.
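We can see the constraint doing its job with a small sketch; here I'm again using sqlite3 as a stand-in, since it supports check constraints with the same syntax:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table accounts (
        name text primary key,
        balance int not null check (balance >= 0)
    )
""")
conn.execute("insert into accounts (name, balance) values ('A', 10)")
conn.commit()

overdraw_failed = False
try:
    # Deducting $20 would leave a balance of -10, violating the constraint.
    conn.execute("update accounts set balance = balance - 20 where name = 'A'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()
    overdraw_failed = True

(balance,) = conn.execute("select balance from accounts where name = 'A'").fetchone()
```

The overdrawing update is rejected by the database itself, and A's balance is still $10 afterward.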
Table locks
Another approach would be to use a lock. Before we start reading or writing data from the accounts table, we can use a lock to ensure that our transaction has exclusive access to that table until we roll back or commit:
begin;
lock table accounts;
select balance from accounts where name = 'A';
update accounts set balance = balance - 10 where name = 'A';
update accounts set balance = balance + 10 where name = 'B';
commit;
The lock table accounts statement will not finish until no other transactions have any locks on the accounts table, and will prevent all other transactions from accessing the accounts table until our transaction is committed or rolled back.
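A rough in-memory analogy for this is a single lock guarding the whole "table": every transfer, no matter which accounts it touches, takes the same lock around its read-check-write sequence. Here's a sketch using Python threads (the accounts dict and lock are my stand-ins, not database objects):

```python
import threading

accounts = {"A": 10, "B": 0}
table_lock = threading.Lock()  # plays the role of `lock table accounts`

def move_money(frm, to, amount):
    # Holding the lock for the whole read-check-write sequence means no
    # other transfer can observe or modify any balance mid-transfer.
    with table_lock:
        if accounts[frm] < amount:
            return False
        accounts[frm] -= amount
        accounts[to] += amount
        return True

# Two concurrent attempts to move $10 out of A: only one can succeed.
threads = [threading.Thread(target=move_money, args=("A", "B", 10)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread acquires the lock first performs the transfer; the other then sees a $0 balance and bails out, so A never goes negative.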
Row locks
Locking the entire accounts table is an effective way to prevent overdrawing an account, but it also needlessly slows down our banking program. If someone is trying to move money from A to B while someone else is trying to move money from B to C, the second person’s transaction won’t be able to start until the first transaction completes, even though they’re touching different accounts.
Luckily, rather than acquiring a lock on the entire table, we can just acquire a lock on the row that we’re deducting money from. To do this, we can use for update at the end of our select statement:
select balance from accounts where name = 'A' for update;
Now, other transactions won’t be able to lock or update this row until our transaction is committed or rolled back, though plain selects can still read it. (for update can only be used inside a transaction.)
Transaction isolation levels
One other way to ensure that we don’t overdraw an account is to change the isolation level of the transaction:
begin transaction isolation level serializable;
select balance from accounts where name = 'A';
update accounts set balance = balance - 10 where name = 'A';
update accounts set balance = balance + 10 where name = 'B';
commit;
The PostgreSQL manual has a good description of serializable:
If a pattern of reads and writes among concurrent serializable transactions would create a situation which could not have occurred for any serial (one-at-a-time) execution of those transactions, one of them will be rolled back with a serialization_failure error.
Where a row or table lock would prevent a second transaction from reading the balance until the previous transaction committed, with isolation level serializable the second transaction would immediately fail with an error message: “could not serialize access due to concurrent update.”
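Because a failed serializable transaction made no committed changes, the standard remedy is to simply retry it from the top. A minimal sketch of that retry pattern, where SerializationFailure and flaky_transfer are hypothetical stand-ins for the driver's error and a transaction that loses a conflict twice:

```python
class SerializationFailure(Exception):
    """Stand-in for PostgreSQL's serialization_failure error (SQLSTATE 40001)."""

def run_with_retry(txn_fn, max_attempts=5):
    # A serializable transaction that loses a conflict is safe to rerun,
    # because none of its effects were committed.
    for attempt in range(1, max_attempts + 1):
        try:
            return txn_fn()
        except SerializationFailure:
            if attempt == max_attempts:
                raise

attempts = 0
def flaky_transfer():
    global attempts
    attempts += 1
    if attempts < 3:
        raise SerializationFailure()  # simulate losing a conflict twice
    return "committed"

result = run_with_retry(flaky_transfer)
```

The transfer fails twice and succeeds on the third attempt; the caller never sees the intermediate failures.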
There’s a good explanation of the “serializable” consistency model—and how it differs from other models—on the Jepsen site.