Repeatable read isolation level in SQL

Solves non repeatable read problem from read committed isolation level

Aug 30, 2023

In a previous blog post on isolation levels in SQL, we had discussed that the read committed isolation level suffers from non-repeatable reads problem. In this short post, we’ll discuss how repeatable read isolation level solves this problem to guarantee consistency across different read operations in a single transaction.

Why does this occur and potential solutions?

Let’s take the same example from the previous post. We have two SQL sessions - S1 and S2.

S1> start transaction;

S1> select * from test;
+------+------+
| col1 | col2 |
+------+------+
|    1 |   12 |
|    2 |  100 |
|    3 |  999 |
+------+------+

S2> update test set col2 = 10 where col1 = 1;

S1> select * from test;
+------+------+
| col1 | col2 |
+------+------+
|    1 |   10 |
|    2 |  100 |
|    3 |  999 |
+------+------+

Once the second session(S2) updates the value, there’s no way for the first session(S1) to know what value it read previously, and it reads the value that S2 had committed.

So when S1 started a transaction if it could save a snapshot of all the values that it wants to read, we could read the same value multiple times consistently. However, we do not always know what rows we will need to read while starting a transaction.

If we could prevent S2 from updating the rows that S1 has read until S1 has committed the transaction, we can guarantee that S1 will always read the same value for the rows that it wants to read.

Solution

Repeatable read isolation level uses the later solution to solve the non-repeatable read problem. When S1 reads the first row, it acquires a read/shared lock on that row. This type of lock can be shared between multiple sessions to ONLY read the values of the same row.

If a session wants to update the row, it needs to acquire a write/exclusive lock on that row. Multiple write locks can not be taken on the same row. This lock can not be acquired when a read lock is already acquired on the desired row and vice-versa.

Since S1 has acquired a read lock on the first row, when S2 tries to update the same row it will be blocked until either S1 releases the lock or the request times out.

S1> start transaction;

// Acquires a read lock on col1 = 1
S1> select * from test where col1 = 1;
+------+------+
| col1 | col2 |
+------+------+
|    1 |   12 |
+------+------+

// Will be blocked - can't acquire a write lock since S1 has a read lock on the same row.
S2> update test set col2 = 10 where col1 = 1;

S1> select * from test where col1 = 1;
+------+------+
| col1 | col2 |
+------+------+
|    1 |   12 |
+------+------+

S1> commit;

// Will update the matching rows since the read lock is now released by S1.
S2> update test set col2 = 10 where col1 = 1;

This solves the repeatable read problem but still suffers from phantom read issues and potential deadlocks.

Phantom reads problem

Suppose we have 2 SQL sessions - S1 and S2 with the following statements.

S1> start transaction;

S1> select * from test where col1 = 1;

S1> select count(*) from test;
+----------+
| count(*) |
+----------+
|       3  |
+----------+

S2> insert into test values (4, 50);

S1> update test set col2 = 10 where col1 = 1;

S1> select count(*) from test;
+----------+
| count(*) |
+----------+
|       4  |
+----------+

Since we are locking only the rows that are read or updated, another session S2 can still insert rows that will affect queries like count(*) in some range. This makes the same query in a session produce inconsistent results. We may get extra rows while querying in a range or get different counts as shown in the example above. This issue is known as the phantom reads problem.

Potential deadlocks

Repeatable read isolation level also suffers from deadlock conditions. One such condition arises when two transactions read the same row and acquire the read lock on the same resource. Now when both of the transactions try to update the row, they will not be able to acquire a write lock since a read lock is been acquired by the other transaction, resulting in a deadlock.

S1> start transaction;
  > select * from test where col1 = 1;  // read lock

S2> start transaction;
  > select * from test where col1 = 1;  // read lock

S1> update test set col2 = 10 where col1 = 1;  // waiting for S2 to release the read lock

S2> update test set col2 = 10 where col1 = 1;  // waiting for S1 to release the read lock

...

Both transactions are waiting for the other one to release the read lock on col1 = 1, resulting in a deadlock. These queries will eventually timeout as defined by the underlying SQL engine.

We discussed that the repeatable read isolation level still suffers from phantom read problems and trivial deadlock conditions. However, in the modern MySQL engine, this is not the case. MySQL uses InnoDB as the storage engine which doesn’t simply use the row-locking mechanism, but other techniques to avoid such problems and increase concurrency.

Thank you for reading Rediblog. This post is public so feel free to share it.

Rediblog