DB Isolation Levels Part 2 (Repeatable Read and Serializable)

In the last article, we learned about the Read Uncommitted and Read Committed isolation levels and the problems linked with them. In this one, we will learn about the Repeatable Read and Serializable isolation levels.

Repeatable Read / Snapshot isolation

Repeatable Read is the most common solution to the problems we faced at the Read Committed isolation level. It is the default choice for MySQL (InnoDB).

At this level, each transaction reads from a snapshot of the database taken at the start of the transaction, i.e. it sees all the transactions that were committed up to that point in time. After that, any changes made by other transaction(s) are not visible to it.

To support this, each modification creates a new version of the data rather than overwriting it in place, which is also helpful when a transaction is rolled back. This technique is called multi-version concurrency control (MVCC). Let's start with 2 different transactions:

T1

mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ         |
+-------------------------+
1 row in set (0.00 sec)

mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

T2

mysql> select @@transaction_isolation;
+-------------------------+
| @@transaction_isolation |
+-------------------------+
| REPEATABLE-READ         |
+-------------------------+
1 row in set (0.00 sec)

mysql> select @@autocommit;
+--------------+
| @@autocommit |
+--------------+
|            0 |
+--------------+
1 row in set (0.00 sec)

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

Let’s update one row from T1:

mysql> update account set opening_balance = 200 where acnumber='A0004';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from account;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0002    | C0002  | B0002 |             400 | 2012-06-12 | Saving | Active  |
| A0004    | C0004  | B0004 |             200 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
2 rows in set (0.01 sec)

T2

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where opening_balance > 400;
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
1 row in set (0.01 sec)

You can see that even though the row was updated in T1, T2 still sees a consistent view of the row; there are no non-repeatable reads. That's how Repeatable Read maintains read consistency. Now let's try to update some rows from the T2 transaction:

mysql> update account set opening_balance = 200 where acnumber='A0004';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

You can see this statement timed out because T1 holds a lock on that row. Even if we try some other row, the result is the same:

mysql> update account set opening_balance = 700 where acnumber='A0002';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

mysql> SHOW ENGINE INNODB STATUS\G
------------
TRANSACTIONS
------------
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 8, OS thread handle 6164082688, query id 64 localhost root updating
update account set opening_balance = 700 where acnumber='A0002'
------- TRX HAS BEEN WAITING 29 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 2 page no 4 n bits 80 index GEN_CLUST_INDEX of table `test`.`account` trx id 2832 lock_mode X waiting

That's how writes are protected at this isolation level.

How it works:

In general, when a transaction starts, it is given a unique, continually increasing 64-bit transaction id (txid). Whenever a transaction writes anything, the data it modifies is tagged with the writer's txid.

Each row in a table has a created_by field that stores the txid of the transaction that inserted the row. Each row also has a deleted_by field, which is initially empty. If a transaction deletes a row, it does not actually remove it immediately; instead, it marks the deleted_by column with its txid. Later, when garbage collection happens in the DB, any row marked in the deleted_by column is removed and its space freed.

When a transaction starts, it keeps a list of all the transactions that are in progress at that moment (not yet committed or aborted), plus any younger transactions (based on txid). InnoDB makes sure that any changes made by these transactions (insert, update, or delete) are ignored. The transaction reads only data committed by transactions that finished before it started.
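To make this rule concrete, here is a minimal sketch of the visibility check written as a query. The row_versions and snapshot_active tables and the @my_txid variable are hypothetical names used purely for illustration; real engines run this check internally, not in SQL:

-- Hypothetical sketch of the MVCC visibility check for one transaction.
-- @my_txid        : this transaction's id
-- snapshot_active : txids that were still in progress when our snapshot was taken
SELECT v.*
FROM row_versions v
WHERE v.created_by <= @my_txid                                 -- not created by a younger transaction
  AND v.created_by NOT IN (SELECT txid FROM snapshot_active)   -- creator had committed before we started
  AND (v.deleted_by IS NULL                                    -- and the row is not deleted,
       OR v.deleted_by > @my_txid                              -- or was deleted by a younger transaction,
       OR v.deleted_by IN (SELECT txid FROM snapshot_active)); -- or by one not yet committed at snapshot time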

But this isolation level has some issues as well, like:

  1. Lost update
  2. Phantom row and Write skew

1. Lost Update

So far we have seen that for long-running read queries, this isolation level provides consistent reads. But what happens when 2 writes happen on the same row in different transactions?

Both transactions have their own snapshot, so when they commit, each of them may overwrite the other's data. We can't guarantee the order in which they will commit, so an update done by one transaction can be overwritten by the other. This is a problem in many cases, like counter updates, field updates, etc.
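For example, here is one possible schedule against our account table that loses an update (a sketch; both sessions run at REPEATABLE READ with autocommit off):

-- T1
mysql> begin;
mysql> select opening_balance from account where acnumber='A0004';  -- reads 2000
-- T2
mysql> begin;
mysql> select opening_balance from account where acnumber='A0004';  -- also reads 2000
-- T1 adds 100 in application code and writes the result back
mysql> update account set opening_balance = 2100 where acnumber='A0004';
mysql> commit;
-- T2 adds 50 to the value it read earlier and writes that back
mysql> update account set opening_balance = 2050 where acnumber='A0004';
mysql> commit;  -- T1's +100 is silently lost; the final balance is 2050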

To solve this we can use the following strategies:

1.1 Atomic write operations

Generally, when we have to update a column, we first read it, modify it, and then write it back. The problem with this approach is that just after our read, someone else may modify the value, causing a lost update. To avoid such scenarios, many DBs provide atomic write operations: the DB locks the particular row, preventing any other transaction from updating it until the lock is released. For example, instead of reading the balance and writing it back, we let the DB do the arithmetic atomically in a single statement:

mysql> update account set opening_balance = opening_balance - 100 where acnumber='A0004';

1.2 Explicit locking

In some cases where an atomic write is not suitable, we can use explicit locking instead. Here, we lock the rows we expect to modify so that no other transaction can update them.

T1

mysql> begin;
mysql> select * from account where opening_balance < 1000 FOR UPDATE;
mysql> update account set astatus='Block' where opening_balance < 1000;

T2

mysql> begin;
Query OK, 0 rows affected (0.00 sec)

mysql> select * from account where astatus='Active';
+----------+--------+-------+-----------------+------------+--------+---------+
| acnumber | custid | bid   | opening_balance | aod        | atype  | astatus |
+----------+--------+-------+-----------------+------------+--------+---------+
| A0002    | C0002  | B0002 |             700 | 2012-06-12 | Saving | Active  |
| A0004    | C0004  | B0004 |            2000 | 2020-09-11 | Saving | Active  |
+----------+--------+-------+-----------------+------------+--------+---------+
2 rows in set (0.00 sec)

mysql> update account set atype='saving' where astatus='Active';
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

As we can see, we have not used a primary key in T1's query, so it locked every row it scanned, effectively blocking the whole table; no data can be updated or inserted at this point. To improve concurrency, we should lock by a primary key or unique key in the query.
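For example, locking by a unique key keeps the lock narrow (a sketch; it assumes acnumber is declared as a primary or unique key, which the table above is not, as the GEN_CLUST_INDEX in the lock output shows):

mysql> begin;
mysql> select * from account where acnumber='A0004' FOR UPDATE;  -- locks only this row
mysql> update account set astatus='Block' where acnumber='A0004';
mysql> commit;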

1.3 Automatically detecting lost updates

In this approach, the transaction manager detects a lost update, aborts the offending transaction, and forces it to retry its read-modify-write cycle. The DB can perform this check efficiently together with snapshot isolation. Some DBs, like PostgreSQL at the Repeatable Read level, support this feature, but MySQL InnoDB doesn't.
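To illustrate, here is roughly what this looks like in PostgreSQL at REPEATABLE READ (a sketch using our account table; PostgreSQL takes the snapshot at the first statement):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT opening_balance FROM account WHERE acnumber = 'A0004';  -- snapshot taken here
-- ... meanwhile another session updates this row and commits ...
UPDATE account SET opening_balance = 200 WHERE acnumber = 'A0004';
-- ERROR:  could not serialize access due to concurrent update
ROLLBACK;  -- the application retries the whole read-modify-write cycle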

1.4 Compare-and-set

In this approach, the DB allows an update only when the value of a field hasn't changed since the last read. If the value doesn't match, the update has no effect, and the read-modify-write cycle must be retried. Consider an example where a field is modified by 2 users concurrently. We can try something like this:

mysql> update account set astatus='Block' where acnumber='A0004' and astatus='Active';
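If another transaction has already changed astatus, the WHERE clause matches nothing and MySQL reports Rows matched: 0. The application checks this count and, on 0, re-reads the row and retries the cycle.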

2. Phantom row and Write Skew

In simple terms, a phantom read happens when the same query, run twice within a transaction, sees different results. MySQL avoids phantom rows at the Repeatable Read isolation level; how it does so, I will explain in a different article.
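For reference, at a weaker level a phantom looks roughly like this (a sketch at READ COMMITTED, where MySQL does not prevent it):

-- T1 (READ COMMITTED)
mysql> begin;
mysql> select * from account where opening_balance > 400;  -- returns 1 row
-- T2 inserts a new row with opening_balance = 900 and commits
mysql> select * from account where opening_balance > 400;  -- now returns 2 rows: a phantom

Now, let's understand write skew.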

Let's understand this with an example of user signup. Two users try to sign up at the same time on example-site.com, and both use the same user-id for registration.

Now, 2 different transactions are started, each taking a snapshot at that time. Both see that the particular user-id doesn't exist, and both then create the same user-id without any error. We end up with the same id for 2 different users.

If we look at it closely, the DB is not wrong; it followed its own rules correctly. But for our use case, the result is wrong.

Another example is booking a meeting room at the same time: the same room gets booked for 2 different people.
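A sketch of the check-then-act pattern that goes wrong here (it assumes a hypothetical users table with no unique constraint on user_id; both transactions run concurrently, each against its own snapshot):

-- T1 and T2 both run this at the same time:
mysql> begin;
mysql> select count(*) from users where user_id = 'tushar';  -- both see 0
mysql> insert into users (user_id, name) values ('tushar', 'User One');
mysql> commit;  -- both commits succeed: two rows now share user_id 'tushar'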

Such anomalies can't be prevented at the REPEATABLE READ isolation level. We need a stronger isolation level for such cases.

Serializability

Serializable isolation is the strongest isolation level. It guarantees that even though transactions run in parallel, the end result is the same as if they had been executed one at a time, serially, without any concurrency. Databases provide it using one of the 3 techniques mentioned below:

  1. Literally executing transactions in a serial order
  2. Two-phase locking (2PL)
  3. Serializable snapshot isolation (SSI)

Let’s understand more about 2PL and SSI.

Two-Phase Locking (2PL)

To prevent dirty writes, DBs generally use locks when 2 transactions concurrently try to write the same object. The lock ensures that the second writer must wait until the first transaction has finished (committed or aborted).

Two-phase locking works on similar lines but has stronger lock requirements. It allows many transactions to read the same object simultaneously as long as nobody is writing to it. To write an object, an exclusive lock is required.

2PL uses 2 types of locks: shared locks and exclusive locks.

Whenever a transaction wants to read an object, it must first acquire a shared lock. Several transactions are allowed to hold the shared lock simultaneously, as long as no exclusive lock has already been taken.

Whenever a transaction (T2) wants to write an object, it has to take an exclusive lock. It first checks whether any other transaction (T1) holds a shared or exclusive lock on that object. If T1 holds a lock, then T2 has to wait until T1 releases it.

If a transaction first reads and then writes an object, it may upgrade its shared lock to an exclusive lock.
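In MySQL these two lock modes can be taken explicitly (a sketch; FOR SHARE is the MySQL 8.0 spelling of the older LOCK IN SHARE MODE):

mysql> begin;
mysql> select * from account where acnumber='A0004' FOR SHARE;   -- shared lock: other readers proceed
mysql> select * from account where acnumber='A0004' FOR UPDATE;  -- upgraded to exclusive: writers wait
mysql> update account set astatus='Block' where acnumber='A0004';
mysql> commit;  -- locks are held until commit/rollback, as 2PL requires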

Serializable snapshot isolation (SSI)

So far we have seen that if we use serializable isolation, we either don't get good performance (due to 2PL) or don't get scalability (due to serial execution). On the other hand, weak isolation performs well, but we face issues like non-repeatable reads, lost updates, phantom reads, and write skew.

So, is there a way to get the best of both worlds? Serializable Snapshot Isolation comes closest. It was first described in 2008 and is now used to implement PostgreSQL's SERIALIZABLE isolation level.

In general, 2PL is pessimistic concurrency control: it is based on the principle that if anything could possibly go wrong, it is better to wait until the situation is safe again.

SSI is optimistic concurrency control. It doesn't block when something could possibly go wrong; instead, it lets transactions continue anyway. When a transaction wants to commit, the DB checks whether anything bad happened and only then decides whether to commit the transaction or roll it back.

As with the snapshot isolation we saw earlier, all reads within a transaction are made from a consistent snapshot of the database. On top of that, SSI adds an algorithm for detecting serialization conflicts among writes and determining which transaction to abort.

Following are the cases the database needs to consider to detect outdated data:

Uncommitted write happened before read:

When a transaction starts, it maintains a list of (uncommitted) transactions whose writes it needs to ignore. When the main transaction is about to commit, it checks whether any of those ignored transactions have since committed. If yes, the main transaction is aborted.

Write occurs after the read

When a transaction reads a value via an index, the DB records that this transaction read that value. When a writing transaction commits, it checks which other transactions read the rows it wrote and informs them that what they read is now outdated (if those readers have not yet committed). If a reader has already committed, then the writer has a conflicting write and must abort.
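In practice this surfaces just like the earlier retry case. A sketch of PostgreSQL's SERIALIZABLE behavior with the meeting-room example from above (the bookings table and its columns are illustrative):

-- Both sessions run this check-then-book for the same room and slot:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM bookings WHERE room = 'R1' AND slot = '10:00';  -- both see 0
INSERT INTO bookings (room, slot, booked_by) VALUES ('R1', '10:00', 'user1');
COMMIT;
-- One COMMIT succeeds; the other fails with:
-- ERROR:  could not serialize access due to read/write dependencies among transactions
-- and the application retries, so only one booking survives.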

That's all for isolation levels. I hope this explains each of them. It is very important to understand our requirements before deciding which isolation level to choose. Let me know if you have any questions or improvements.

Reference:

Designing Data-Intensive Applications
