Here’s a surprising behavior of PostgreSQL you might not know about: under READ COMMITTED isolation, PostgreSQL can sometimes miss rows when performing UPDATE, DELETE, SELECT FOR UPDATE, or SELECT FOR SHARE statements. This is due to the EvalPlanQual recheck PostgreSQL adds to these statements to prevent lost-update anomalies.
For CockroachDB’s new implementation of READ COMMITTED isolation, we considered building our own version of EvalPlanQual, but decided to use a different technique instead which doesn’t miss rows. By not missing rows, CockroachDB alleviates the need for application-level retries.
Under READ COMMITTED isolation, PostgreSQL adds a special recheck step to UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE statements. This recheck is known as EvalPlanQual, and consists of a re-evaluation of part of the statement after all qualifying rows are locked. PostgreSQL uses EvalPlanQual to prevent lost updates under READ COMMITTED isolation.
To make this concrete, let’s walk through an example.
Suppose we’re using SQL to organize a 3-on-3 basketball league. One of the tables in our schema holds information about every player in the league, including (a) their ID, (b) their name, (c) their skill level, and (d) their team assignment if they have one.
CREATE TABLE player (
id integer NOT NULL,
name text NOT NULL,
level text NOT NULL,
team text,
PRIMARY KEY (id)
);
INSERT INTO player VALUES
(1, 'Gray', 'A', 'Dolphins'),
(2, 'Mohan', 'A', 'Dolphins'),
(3, 'Stonebraker', 'A', 'Dolphins'),
(4, 'Lamport', 'A', 'Gophers'),
(5, 'Ullman', 'A', 'Gophers'),
(6, 'Lynch', 'A', 'Gophers'),
(7, 'Bernstein', 'AA', 'Elephants'),
(8, 'Liskov', 'AA', 'Elephants'),
(9, 'Codd', 'AA', 'Elephants');
When modifying this table, we’ll try to maintain two invariants:
After a wildly successful season, we decide to move all of the Gophers up to skill level AA. We use a single UPDATE statement in an implicit transaction, which should atomically modify all three players and thus maintain our invariants.
UPDATE player SET level = 'AA' WHERE team = 'Gophers';
(But to more easily demonstrate, let’s sleep for 5 seconds at the beginning of the update.)
WITH sleep AS (SELECT pg_sleep(5))
UPDATE player SET level = 'AA' FROM sleep WHERE team = 'Gophers';
While that update is running, the Gophers and Dolphins decide to exchange players 3 and 4 in a trade. Again, we use a single UPDATE statement in an implicit transaction. This should atomically swap the skill level and team of the two players and thus, again, maintain our invariants.
UPDATE player p1 SET (level, team) =
(SELECT level, team FROM player p2
WHERE p2.id IN (3, 4) AND p2.id != p1.id)
WHERE p1.id IN (3, 4);
The second update succeeds immediately. What happens to the first update? Let’s consider the result by both database and isolation level.
In PostgreSQL, under SERIALIZABLE isolation, the first update fails with a “could not serialize” error. During execution, PostgreSQL detects that another transaction has concurrently modified players 3 and 4, and aborts.
Under READ COMMITTED isolation, the first update succeeds, but the result is anomalous: player 3 is now on the Gophers but is still at level A. Only two of the Gophers were moved up to level AA.
michae2=# SELECT * FROM player ORDER BY id;
id | name | level | team
----+--------------+-------+-----------
1 | Gray | A | Dolphins
2 | Mohan | A | Dolphins
3 | Stonebraker | A | Gophers <- still at level A
4 | Lamport | A | Dolphins
5 | Ullman | AA | Gophers
6 | Lynch | AA | Gophers
7 | Bernstein | AA | Elephants
8 | Liskov | AA | Elephants
9 | Codd | AA | Elephants
(9 rows)
This violates invariant 2, which is surprising, because when considered individually both updates should have maintained our invariants.
In CockroachDB, under both SERIALIZABLE and READ COMMITTED isolation the result is the same: the first update succeeds, player 3 is now on the Gophers, and is correctly at level AA.
demo@127.0.0.1:26257/demoapp/defaultdb> SELECT * FROM player ORDER BY id;
id | name | level | team
-----+--------------+-------+------------
1 | Gray | A | Dolphins
2 | Mohan | A | Dolphins
3 | Stonebraker | AA | Gophers <- correctly at level AA
4 | Lamport | A | Dolphins
5 | Ullman | AA | Gophers
6 | Lynch | AA | Gophers
7 | Bernstein | AA | Elephants
8 | Liskov | AA | Elephants
9 | Codd | AA | Elephants
(9 rows)
How does PostgreSQL arrive at its anomalous result, and how does CockroachDB avoid the anomaly? To answer these questions let’s examine execution of the first UPDATE statement under READ COMMITTED isolation in both databases.
We’ll start by looking at EXPLAIN in PostgreSQL. The plan for the first UPDATE statement seems quite simple: a scan of the table followed by a filter.
michae2=# EXPLAIN UPDATE player SET level = 'AA' WHERE team = 'Gophers';
QUERY PLAN
-------------------------------------------------------------
Update on player (cost=0.00..1.11 rows=0 width=0)
-> Seq Scan on player (cost=0.00..1.11 rows=3 width=38)
Filter: (team = 'Gophers'::text)
But that Update on player
operation hides a lot of complexity. We can think
of PostgreSQL executing the update under READ COMMITTED isolation in roughly
seven steps.
player
at the read snapshot. (Seq Scan
in the plan.)team = 'Gophers'
. (Filter
in the plan.) Players 4, 5, and 6
qualify since the read snapshot is before the second update.Update
in the plan.)Seq Scan
in
the plan again.) This picks up the change to player 4.Filter
in the plan
again.) Now only players 5 and 6 qualify.Update
in the plan.)Steps 5 and 6 are PostgreSQL’s strategy for preventing lost updates under READ COMMITTED isolation. After locking all qualifying rows, PostgreSQL re-evaluates the query steps a second time on the latest version of each locked row. This is the EvalPlanQual recheck. By doing this recheck, PostgreSQL picks up any changes made to those rows between reading them and locking them, which in this case correctly prevents modifying the skill level of player 4. Unfortunately, because PostgreSQL only rechecks the locked rows, it misses the change to player 3. We were expecting at least one of those two players to move up to skill level AA, but neither did.
This behavior is documented, but it can still cause surprises.
Next, let’s look at EXPLAIN in CockroachDB. Again, the plan for the UPDATE statement seems quite simple: a scan of the table followed by a filter.
demo@127.0.0.1:26257/demoapp/defaultdb> EXPLAIN UPDATE player SET level = 'AA' WHERE team = 'Gophers';
info
-------------------------------------------------------------------------------------------
distribution: local
vectorized: true
• update
│ table: player
│ set: level
│ auto commit
│
└── • render
│
└── • filter
│ estimated row count: 3
│ filter: team = 'Gophers'
│
└── • scan
estimated row count: 9 (100% of the table; stats collected 8 seconds ago)
table: player@player_pkey
spans: FULL SCAN
Again, that update
operation hides a lot of complexity. We can think of
CockroachDB executing the update under READ COMMITTED isolation in roughly six
steps.
player
at the read snapshot. (scan
in the plan.)team = 'Gophers'
. (filter
in the plan.) Players 4, 5, and 6
qualify since the read snapshot is before the second update.update
in the plan.)Step 6 is CockroachDB’s strategy for preventing lost updates under READ COMMITTED isolation. CockroachDB retries the entire UPDATE statement with a new read snapshot if it encounters a newer version of a row while writing an intent.2 By executing the entire statement again with a new read snapshot, CockroachDB can pick up any changes made after the previous read snapshot, which in this case includes changes to both players 3 and 4. Crucially, CockroachDB holds onto locks and intents across retries, which helps it make progress even during periods of heavy contention.
At first glance these two different techniques for preventing lost updates seem to embody a tradeoff: on the one hand, PostgreSQL avoids retries, on the other hand, CockroachDB avoids anomalies. But if we consider things from the application’s point of view, this is less of a tradeoff than it first appears. What would an application need to do to be sure that all Gophers were updated to skill level AA?
When using PostgreSQL, the application itself would need to retry the first UPDATE statement until all Gophers were at level AA. Application-level retries are more expensive than database-level retries, due to network latency, so this is strictly worse than retrying within the database. There’s no way to avoid retrying the first UPDATE statement in this scenario, but CockroachDB is able to hide the retry from the application.
PostgreSQL adds EvalPlanQual rechecks to UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE statements under READ COMMITTED isolation to prevent lost-update anomalies. EvalPlanQual lets PostgreSQL avoid internal retries of these statements, but it can cause PostgreSQL to miss rows which will in turn require application-level retries to handle.3
In CockroachDB’s new implementation of READ COMMITTED isolation, instead of EvalPlanQual we built an internal statement-retry mechanism that does not miss rows and thus alleviates the need for application-level retries.
1 Intents are new versions of a row that also act as exclusive locks.
2 Under SERIALIZABLE isolation, CockroachDB retries entire transactions when a serializable history cannot be achieved through other means. In this case we’re using a single-statement implicit transaction, so CockroachDB appears to have the same behavior under both SERIALIZABLE and READ COMMITTED isolation. But note that if the scenario were slightly different, say a move of a player to the Gophers rather than a swap, the two isolation levels would show different behavior.
3 This article focuses on PostgreSQL, but it’s worth noting that MySQL (using InnoDB) can also miss rows with some tweaks to the scenario. Of the databases we’ve studied, so far Oracle has the most similar UPDATE behavior under READ COMMITTED isolation to CockroachDB.
Syndication from What Does Write Skew Look Like by Justin Jaffray
This post is about gaining intuition for Write Skew, …
Read more
Editor's Note: This post was originally authored when CockroachDB was pre-1.0. CockroachDB's architecture has undergone …
Read moreIf you’ve ever read the section on transaction isolation in your database documentation, you’ve probably seen some …
Read more