DB Isolation Levels Part 2 (Repeatable Read and Serializable)

In the last article, we learned about the Read Uncommitted and Read Committed isolation levels and the problems linked with them. In this one, we will learn about the Repeatable Read and Serializable isolation levels.

Repeatable Read / Snapshot isolation

Repeatable Read is the most common solution to the problems we faced at the Read Committed isolation level. It is the default choice for MySQL (InnoDB).

At this level, each transaction reads from a snapshot of the database taken at the start of the transaction, i.e. it sees all the transactions that were committed up to that point in time. After that, any changes made by other transaction(s) are not visible to it.

To support this, each modification creates a new version of the data rather than overwriting it in place, which is also helpful when a transaction is rolled back. This technique is called multi-version concurrency control (MVCC). Let's start with 2 different transactions:

T1

mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ         |
+-------------------------+
1 row in set (0.00 sec)

mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

T2

mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ         |
+-------------------------+
1 row in set (0.00 sec)

mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

Let’s update one row from T1:

mysql> update account set opening_balance = 200 where acnumber='A0004';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from account;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0002    | C0002  | B0002 |             400 | 2012-06-12 | Saving | Active  |
| A0004    | C0004  | B0004 |             200 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
2 rows in set (0.01 sec)

T2

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

You can see that even though the row was updated in T1, T2 still sees a consistent view of the row; there are no non-repeatable reads. That's how Repeatable Read maintains read consistency. Now let's try to update some rows from the T2 transaction:

mysql> update account set opening_balance = 200 where acnumber='A0004';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

You can see this statement timed out because T1 holds a lock on that row. Even if we try some other row, the result is the same:

mysql> update account set opening_balance = 700 where acnumber='A0002';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

mysql> SHOW ENGINE INNODB STATUS\G
------------
TRANSACTIONS
------------
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 8, OS thread handle 6164082688, query id 64 localhost root updating
update account set opening_balance = 700 where acnumber='A0002'
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2 page no 4 n bits 80 index GEN_CLUST_INDEX of table `test`.`account` trx id 2832 lock_mode X waiting

That's how writes are protected at this isolation level.

How it works:

In general, when a transaction starts, it is given a unique, continually increasing 64-bit transaction id (txid). Whenever a transaction writes anything, the data it modifies is tagged with the writer's txid.

Each row in a table has a created_by field that stores the txid of the transaction that inserted the row. Each row also has a deleted_by field, which is initially empty. If a transaction deletes a row, it does not actually remove it immediately; instead, it marks the deleted_by column with its txid. Later, when garbage collection happens in the DB, any row marked in the deleted_by column is removed and its space freed.

When a transaction starts, it keeps a list of all the transactions that are in progress at that moment (not yet committed or aborted), plus any younger transactions (based on txid). InnoDB makes sure that any changes made by these transactions (insert, update, or delete) are ignored. The transaction reads only data committed by transactions that finished before it started.
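To make this rule concrete, here is a minimal sketch of the visibility check written as a query. The row_versions and snapshot_active tables and the @my_txid variable are hypothetical names used purely for illustration; real engines run this check internally, not in SQL:

-- Hypothetical sketch of the MVCC visibility check for one transaction.
-- @my_txid        : this transaction's id
-- snapshot_active : txids that were still in progress when our snapshot was taken
SELECT v.*
FROM row_versions v
WHERE v.created_by <= @my_txid                                 -- not created by a younger transaction
  AND v.created_by NOT IN (SELECT txid FROM snapshot_active)   -- creator had committed before we started
  AND (v.deleted_by IS NULL                                    -- and the row is not deleted,
       OR v.deleted_by > @my_txid                              -- or was deleted by a younger transaction,
       OR v.deleted_by IN (SELECT txid FROM snapshot_active)); -- or by one not yet committed at snapshot time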

But this isolation level has some issues as well, like:

  1. Lost update
  2. Phantom row and Write skew

1. Lost Update

So far we have seen that for long-running read queries, this isolation level provides consistent reads. But what happens when 2 writes happen on the same row in different transactions?

Both transactions have their own snapshot, so when they commit, each of them may overwrite the other's data. We can't guarantee the order in which they will commit, so an update done by one transaction can be overwritten by the other. This is a problem in many cases, like counter updates, field updates, etc.
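For example, here is one possible schedule against our account table that loses an update (a sketch; both sessions run at REPEATABLE READ with autocommit off):

-- T1
mysql> begin;
mysql> select opening_balance from account where acnumber='A0004';  -- reads 2000
-- T2
mysql> begin;
mysql> select opening_balance from account where acnumber='A0004';  -- also reads 2000
-- T1 adds 100 in application code and writes the result back
mysql> update account set opening_balance = 2100 where acnumber='A0004';
mysql> commit;
-- T2 adds 50 to the value it read earlier and writes that back
mysql> update account set opening_balance = 2050 where acnumber='A0004';
mysql> commit;  -- T1's +100 is silently lost; the final balance is 2050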

To solve this we can use the following strategies:

1.1 Atomic write operations

Generally, when we have to update a column, we first read it, modify it, and then write it back. The problem with this approach is that just after our read, someone else may modify the value, causing a lost update. To avoid such scenarios, many DBs provide atomic write operations: the DB locks the particular row, preventing any other transaction from updating it until the lock is released. For example, instead of reading the balance and writing it back, we let the DB do the arithmetic atomically in a single statement:

mysql> update account set opening_balance = opening_balance - 100 where acnumber='A0004';

1.2 Explicit locking

In some cases where an atomic write is not suitable, we can use explicit locking instead. Here, we lock the rows we expect to modify so that no other transaction can update them.

T1

mysql> begin;
mysql> select * from account where opening_balance < 1000 FOR UPDATE;
mysql> update account set astatus='Block' where opening_balance < 1000;

T2

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where astatus='Active';
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0002    | C0002  | B0002 |             700 | 2012-06-12 | Saving | Active  |
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
2 rows in set (0.00 sec)

mysql> update account set atype='saving' where astatus='Active';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

As we can see, we have not used a primary key in T1's query, so it locked every row it scanned, effectively blocking the whole table; no data can be updated or inserted at this point. To improve concurrency, we should lock by a primary key or unique key in the query.
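For example, locking by a unique key keeps the lock narrow (a sketch; it assumes acnumber is declared as a primary or unique key, which the table above is not, as the GEN_CLUST_INDEX in the lock output shows):

mysql> begin;
mysql> select * from account where acnumber='A0004' FOR UPDATE;  -- locks only this row
mysql> update account set astatus='Block' where acnumber='A0004';
mysql> commit;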

1.3 Automatically detecting lost updates

In this approach, the transaction manager detects a lost update, aborts the offending transaction, and forces it to retry its read-modify-write cycle. The DB can perform this check efficiently together with snapshot isolation. Some DBs, like PostgreSQL at the Repeatable Read level, support this feature, but MySQL InnoDB doesn't.
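To illustrate, here is roughly what this looks like in PostgreSQL at REPEATABLE READ (a sketch using our account table; PostgreSQL takes the snapshot at the first statement):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT opening_balance FROM account WHERE acnumber = 'A0004';  -- snapshot taken here
-- ... meanwhile another session updates this row and commits ...
UPDATE account SET opening_balance = 200 WHERE acnumber = 'A0004';
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;  -- the application retries the whole read-modify-write cycle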

1.4 Compare-and-set

In this approach, the DB allows an update only when the value of a field hasn't changed since the last read. If the value doesn't match, the update has no effect, and the read-modify-write cycle must be retried. Consider an example where a field is modified by 2 users concurrently. We can try something like this:

mysql> update account set astatus='Block' where acnumber='A0004' and astatus='Active';
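If another transaction has already changed astatus, the WHERE clause matches nothing and MySQL reports Rows matched: 0. The application checks this count and, on 0, re-reads the row and retries the cycle.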

2. Phantom row and Write Skew

In simple terms, a phantom read happens when the same query, run twice within a transaction, sees different results. MySQL avoids phantom rows at the Repeatable Read isolation level; how it does so, I will explain in a different article.
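For reference, at a weaker level a phantom looks roughly like this (a sketch at READ COMMITTED, where MySQL does not prevent it):

-- T1 (READ COMMITTED)
mysql> begin;
mysql> select * from account where opening_balance > 400;  -- returns 1 row
-- T2 inserts a new row with opening_balance = 900 and commits
mysql> select * from account where opening_balance > 400;  -- now returns 2 rows: a phantom

Now, let's understand write skew.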

Let's understand this with an example of user signup. Two users try to sign up at the same time on example-site.com, and both use the same user-id for registration.

Now, 2 different transactions are started, each taking a snapshot at that time. Both see that the particular user-id doesn't exist, and both then create the same user-id without any error. We end up with the same id for 2 different users.

If we look at it closely, the DB is not wrong; it followed its own rules correctly. But for our use case, the result is wrong.

Another example is booking a meeting room at the same time: the same room gets booked for 2 different people.
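A sketch of the check-then-act pattern that goes wrong here (it assumes a hypothetical users table with no unique constraint on user_id; both transactions run concurrently, each against its own snapshot):

-- T1 and T2 both run this at the same time:
mysql> begin;
mysql> select count(*) from users where user_id = 'tushar';  -- both see 0
mysql> insert into users (user_id, name) values ('tushar', 'User One');
mysql> commit;  -- both commits succeed: two rows now share user_id 'tushar'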

Such anomalies can't be prevented at the REPEATABLE READ isolation level. We need a stronger isolation level for such cases.

Serializability

Serializable isolation is the strongest isolation level. It guarantees that even though transactions run in parallel, the end result is the same as if they had been executed one at a time, serially, without any concurrency. Databases provide it using one of the 3 techniques mentioned below:

  1. Literally executing transactions in a serial order
  2. Two-phase locking (2PL)
  3. Serializable snapshot isolation (SSI)

Let’s understand more about 2PL and SSI.

Two-Phase Locking (2PL)

To prevent dirty writes, DBs generally use locks when 2 transactions concurrently try to write the same object. The lock ensures that the second writer must wait until the first transaction has finished (committed or aborted).

Two-phase locking works on similar lines but has stronger lock requirements. It allows many transactions to read the same object simultaneously as long as nobody is writing to it. To write an object, an exclusive lock is required.

2PL uses 2 types of locks: shared locks and exclusive locks.

Whenever a transaction wants to read an object, it must first acquire a shared lock. Several transactions are allowed to hold the shared lock simultaneously, as long as no exclusive lock has already been taken.

Whenever a transaction (T2) wants to write an object, it has to take an exclusive lock. It first checks whether any other transaction (T1) holds a shared or exclusive lock on that object. If T1 holds a lock, then T2 has to wait until T1 releases it.

If a transaction first reads and then writes an object, it may upgrade its shared lock to an exclusive lock.
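In MySQL these two lock modes can be taken explicitly (a sketch; FOR SHARE is the MySQL 8.0 spelling of the older LOCK IN SHARE MODE):

mysql> begin;
mysql> select * from account where acnumber='A0004' FOR SHARE;   -- shared lock: other readers proceed
mysql> select * from account where acnumber='A0004' FOR UPDATE;  -- upgraded to exclusive: writers wait
mysql> update account set astatus='Block' where acnumber='A0004';
mysql> commit;  -- locks are held until commit/rollback, as 2PL requires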

Serializable snapshot isolation (SSI)

So far we have seen that if we use serializable isolation, we either don't get good performance (due to 2PL) or don't get scalability (due to serial execution). On the other hand, weak isolation performs well, but we face issues like non-repeatable reads, lost updates, phantom reads, and write skew.

So, is there a way to get the best of both worlds? Serializable Snapshot Isolation comes closest. It was first described in 2008 and is now used to implement PostgreSQL's SERIALIZABLE isolation level.

In general, 2PL is pessimistic concurrency control: it is based on the principle that if anything could possibly go wrong, it is better to wait until the situation is safe again.

SSI is optimistic concurrency control. It doesn't block when something could possibly go wrong; instead, it lets transactions continue anyway. When a transaction wants to commit, the DB checks whether anything bad happened and only then decides whether to commit the transaction or roll it back.

As with the snapshot isolation we saw earlier, all reads within a transaction are made from a consistent snapshot of the database. On top of that, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transaction to abort.

Following are the cases the database needs to consider to detect outdated data:

Uncommitted write happened before read:

When a transaction starts, it maintains a list of (uncommitted) transactions whose writes it needs to ignore. When the main transaction is about to commit, it checks whether any of those ignored transactions have since committed. If yes, the main transaction is aborted.

Write occurs after the read

When a transaction reads a value via an index, the DB records that this transaction read that value. When a writing transaction commits, it checks which other transactions read the rows it wrote and informs them that what they read is now outdated (if those readers have not yet committed). If a reader has already committed, then the writer has a conflicting write and must abort.
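In practice this surfaces just like the earlier retry case. A sketch of PostgreSQL's SERIALIZABLE behavior with the meeting-room example from above (the bookings table and its columns are illustrative):

-- Both sessions run this check-then-book for the same room and slot:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM bookings WHERE room = 'R1' AND slot = '10:00';  -- both see 0
INSERT INTO bookings (room, slot, booked_by) VALUES ('R1', '10:00', 'user1');
COMMIT;
-- One COMMIT succeeds; the other fails with:
-- ERROR:  could not serialize access due to read/write dependencies among transactions
-- and the application retries, so only one booking survives.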

That's all for isolation levels. I hope this explains each of them. It is very important to understand our requirements before deciding which isolation level to choose. Let me know if you have any questions or improvements.

Reference:

Designing Data-Intensive Applications
