Transactions | Weak Isolation Levels & Multi-version Databases

Introduction

In our previous edition we discussed Strong Isolation Levels in detail. We understood the performance overheads involved and also discussed the Read Committed scheme in detail.

In this edition we will discuss an interesting Weak Isolation scheme called Snapshot Isolation and the problem it solves.


Non Repeatable Read

Non Repeatable Read is yet another problem that can make our Database appear to be in an inconsistent state. The Read Committed scheme does not prevent it. Let’s understand this issue with an example.

Suppose a person Alex has his money split into two accounts X and Y. Initially, Alex has an amount of Rs. 2000 which is equally divided into two accounts X and Y having a balance of Rs. 1000 each.

Now, Alex decides to transfer Rs. 200 from Account Y to X. A Transaction T1 is initiated to carry out the transfer, and at the same time Alex reads the balance of both accounts.

[Image: timeline of Alex's balance reads interleaved with the transfer Transaction T1]

The above diagram explains the entire procedure. Due to bad timing, Alex saw an incorrect total across his two accounts (X & Y). This happened because he was reading his balances at the same time as the transfer from Account Y to X was happening.

Alex read the balance of Account X before the amount was transferred from Account Y, and hence read Rs. 1000. He then read the balance of Account Y after the transfer had completed, and hence read Rs. 800.

Summing up the balances read from both Accounts gives a total of Rs. 1800, whereas the total was supposed to be Rs. 2000. Alex might be wondering where his Rs. 200 went. This is a temporary glitch, though: if Alex fetches the balance of Account X again, he will read Rs. 1200 this time and the total will add up.
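The interleaving described above can be reproduced with a minimal sketch. The `accounts` dictionary and the `transfer` helper are hypothetical stand-ins for the database and Transaction T1, and the transfer is assumed to be committed as soon as the function returns.

```python
# Hypothetical in-memory accounts; names and amounts follow Alex's example.
accounts = {"X": 1000, "Y": 1000}

def transfer(src, dst, amount):
    """T1: move `amount` from src to dst (treated as committed on return)."""
    accounts[src] -= amount
    accounts[dst] += amount

# Alex's two reads are interleaved with T1. Both reads return committed data.
balance_x = accounts["X"]     # 1000, read before the transfer commits
transfer("Y", "X", 200)       # T1 commits: X = 1200, Y = 800
balance_y = accounts["Y"]     # 800, read after the transfer commits

observed_total = balance_x + balance_y          # 1800, what Alex sees
actual_total = accounts["X"] + accounts["Y"]    # 2000, what is really there
```

Each individual read is valid on its own; the inconsistency only shows up when the two reads are combined.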

The scenario above is an example of the Non Repeatable Read problem. It is also known as the Read Skew problem.

Note: The Read Committed scheme does not prevent Read Skew, because every read in this scenario returned committed data. Read Committed only prevents reads of uncommitted data (dirty reads).

In the previous example the problem faced by Alex was temporary, but Read Skew can also lead to permanent issues, as in the following scenario.

Backups: While taking a backup, we copy all the values from a database. Writes are still allowed on the database during the backup process. As a result, the backup may contain older versions of some data-items combined with newer versions of others. If we later restore the database from that backup, it might be left in an inconsistent state. The problem Alex saw temporarily becomes permanent once the backup is applied.

A common solution to this problem is Snapshot Isolation, a weak isolation scheme that is nonetheless stronger than Read Committed. Let’s understand this in our next section.


Snapshot Isolation Scheme

Snapshot Isolation scheme works on the principle:

“Every transaction reads from a consistent snapshot of the database.”

Every transaction sees only the data that was already committed when the transaction started. If a data-item is updated after the transaction starts, the transaction will not see that update. This prevents Non-Repeatable Reads from happening in the first place.
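As a rough illustration of this principle, the sketch below lets Alex read from a copy of the accounts taken when his transaction starts. Copying the entire data set is an assumption made purely to keep the example short and is not how a real database would implement snapshots; the `accounts` dictionary and `transfer` helper are hypothetical.

```python
# Hypothetical in-memory accounts, as in Alex's example.
accounts = {"X": 1000, "Y": 1000}

def transfer(src, dst, amount):
    """T1: move `amount` from src to dst (treated as committed on return)."""
    accounts[src] -= amount
    accounts[dst] += amount

# Alex's transaction starts: it fixes the committed state it will read from.
snapshot = dict(accounts)

balance_x = snapshot["X"]     # 1000
transfer("Y", "X", 200)       # T1 commits while Alex is still reading
balance_y = snapshot["Y"]     # still 1000, T1's write is not in the snapshot

snapshot_total = balance_x + balance_y   # 2000, consistent with the snapshot
```

Alex sees the state from before the transfer, which is slightly stale but internally consistent.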


Multi-Version Database

In order to understand the Snapshot Isolation scheme, we first need to understand the concept of Multi-version Databases. A multi-version database maintains several different committed versions of each data-item. This is what makes Snapshot Isolation possible, since multiple transactions might need to see the database as it was at different points in time.

Since the database maintains multiple versions of data-items in parallel, this technique is known as Multi-Version Concurrency Control (MVCC).


Implementing Snapshot Isolation scheme

In this scheme the database preserves multiple copies of the same data-item, each recording its value at a different point in time. Let’s understand what a single data-item looks like.

[Image: a single data-item version with its created_by and deleted_by fields]

The created_by field in the above data-item object stores the Transaction ID that created the version of the data-item.

The deleted_by field in the above data-item object stores the Transaction ID that deleted the version of the data-item. When a data-item is deleted, it is not removed from the data-store immediately. Instead, the deleted_by field is set, and the version is removed at some later point when it is safe to do so (when it is no longer required by any active transaction).

In this scheme an Update operation is translated into a Delete followed by a Create operation. When the value of a data-item is updated from X to Y, the previous version holding X is marked as deleted and a newer version holding Y is created.
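A minimal sketch of this versioning scheme might look as follows. The `Version` class and `update` function are hypothetical names; only the `created_by` and `deleted_by` fields come from the description above. The creating transaction ID 1 is an assumed value for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Version:
    value: int
    created_by: int                   # ID of the transaction that created this version
    deleted_by: Optional[int] = None  # ID of the transaction that deleted it, if any

def update(versions: List[Version], txn_id: int, new_value: int) -> None:
    """Translate an update into a delete of the live version plus a create."""
    for version in versions:
        if version.deleted_by is None:
            version.deleted_by = txn_id    # mark as deleted, do not remove yet
    versions.append(Version(new_value, created_by=txn_id))

# Account Y starts at Rs. 1000 (created by an assumed earlier transaction 1).
account_y = [Version(1000, created_by=1)]
update(account_y, txn_id=7, new_value=800)   # transaction 7 debits Rs. 200
```

After the update both versions are still stored: the old one carries `deleted_by=7`, and the new one carries `created_by=7`.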

Let’s understand how this scheme avoids the Non-Repeatable Read problem Alex faced earlier while transferring money from one of his accounts to the other.

[Image: the same transfer under Snapshot Isolation, with the reading transaction (ID 6) and the writing transaction (ID 7)]

Since we now have multiple versions of the data-items stored in the database, Alex reads the correct balances from his Accounts. When reading Account Y, he reads a balance of Rs. 1000 and not Rs. 800, since the newer version of the data-item was created by the transaction with ID 7 and is not visible to the reading transaction, which has ID 6.

The Snapshot Isolation scheme follows these visibility rules to decide which versions of a data-item a transaction can see:

  • Rule 1: Any writes made by the transaction with a later transaction ID (which started after the current transaction) are ignored, regardless of whether those transactions have been committed.
  • Rule 2: The writes performed by aborted transactions are ignored.
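The two rules above can be sketched as a small visibility check. This is a simplified illustration, not a complete implementation: it assumes every transaction with an earlier ID has either committed or aborted before the reader starts, and the names `Version`, `write_counts` and `read` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

@dataclass
class Version:
    value: int
    created_by: int
    deleted_by: Optional[int] = None

def write_counts(writer_id: Optional[int], reader_id: int, aborted: Set[int]) -> bool:
    """Return True if a write by writer_id is taken into account by reader_id."""
    if writer_id is None:          # no such write happened
        return False
    if writer_id > reader_id:      # Rule 1: writes by later transactions are ignored
        return False
    if writer_id in aborted:       # Rule 2: writes by aborted transactions are ignored
        return False
    return True

def read(versions: List[Version], reader_id: int, aborted: Set[int]) -> Optional[int]:
    """Return the value of the version visible to reader_id, if any."""
    for version in versions:
        created = write_counts(version.created_by, reader_id, aborted)
        deleted = write_counts(version.deleted_by, reader_id, aborted)
        if created and not deleted:
            return version.value
    return None

# Account Y after transaction 7 has debited Rs. 200 (creator ID 1 is assumed).
account_y = [Version(1000, created_by=1, deleted_by=7), Version(800, created_by=7)]

seen_by_6 = read(account_y, reader_id=6, aborted=set())           # 1000
seen_by_8 = read(account_y, reader_id=8, aborted=set())           # 800
seen_by_8_if_7_aborted = read(account_y, reader_id=8, aborted={7})  # 1000
```

A deletion is treated as just another write, so a version stays visible to any transaction that ignores the transaction that deleted it.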


Conclusion

We discussed the Snapshot Isolation scheme, one of the Weak Isolation schemes, in detail, along with the problem of Non-Repeatable Reads that it solves. We also discussed the concept of Multi-version databases.

Meanwhile, you can Like and Share this edition among your peers and subscribe to this Newsletter to get notified when I come up with more content in the future. Share this Newsletter with anyone who might benefit from this content.

Until next time, Dive Deep and Keep Learning!
