Multi-Version Concurrency Control

Multi-Version Concurrency Control

If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched article every week delivered straight to your inbox.


Why do we need Concurrency control?

What is the problem?

Let’s imagine that you are trying to transfer money(let’s say 100Rs) from your account to your friend’s account. At the database level, this transfer of money will be done via a database write transaction that debits money from your account and at the same time credits money to your friend’s account.

Imagine that while this transaction is in progress, to be more specific, consider the exact point in time ( let’s say time instance: T ) when the money has been debited from your account but not credited to your friend’s account. If someone tries to read your account balance and your friend’s account balance at the time instance T, it will appear to the reader as if money has disappeared from the bank, since it has left your account but not reached your friend’s account.

This problem arises due to inconsistent reads, which means that although the write transaction was not fully done i.e. crediting the 100Rs money to your friend’s account, other read transactions were allowed to read the inconsistent and uncommitted data from the database.

Thanks for reading Curious Engineer! Subscribe for free to receive new posts and support my work.

Subscribed

How do we solve concurrency issues?

To prevent these scenarios, we need concurrency control in our transactions. Concurrency control enables which data should or should not be visible to other transactions while reading and writing data into the database.

To solve our original problem of inconsistent reads, the simplest way here is to block all the readers until the writer is done which is usually referred to as the read-write lock. In simple terms, a read-write lock states:

“Multiple threads can read the data in parallel but an exclusive lock is needed for writing or modifying data.” In other words, when a writer thread is writing something, all other write threads and read threads must wait until the original writer is done writing.

However, locks are known to provide resource contention. Imagine if multiple writes come at the same time, then multiple write threads will be blocked until one of the write threads is writing the data.

So, the main question becomes: How can we provide maximum database concurrency while getting minimum lock contention?

Enters the solution: Multi-Version Concurrency Control

Multi-Version Concurrency Control (MVCC)

The premise of multi-version concurrency control is:

“When an MVCC database needs to update a piece of data, it will not overwrite the original data item with new data, but instead creates a newer version of the data item, thus storing multiple versions of the same data.“

MVCC tries to maximize database throughput by keeping multiple copies of each data item. Every client connected to the database sees a state (or snapshot) of the database as it was some time ago and then performs an update on top of it. Any changes made by a writer thread are not visible to other users until the changes are completed (as in, the transaction has been committed). This prevents transactions from viewing inconsistent data produced by concurrent transactions performing updates on the same data rows, providing transaction isolation for each database session.

Does MVCC need locks at all?

A good question at this juncture would be since we are storing multiple versions of the same data, then do we need locks at all if we choose multi-version concurrency control over traditional locks?

The answer is that multi-version concurrency control helps: reads not blocking writes and writes not blocking reads but MVCC still needs locks to function.

Consider a case where two write transactions concurrently read data, try to modify the data, and then try to commit their changes. If both transactions are working on the same set of data, then only the latest changes made by a transaction would be visible and the other transaction changes would be lost forever. These are called Lost Updates.

So, in a nutshell, MVCC still needs locks, for example to prevent write conflicts.
Note: Multi-version concurrency control is a popular technique and is used in many databases. For example: PostgreSQL, CouchDB, MySQL with InnoDB storage, etc.

Snapshot Isolation Level

Unlike other traditional isolation levels present in databases, there is an isolation level called Snapshot isolation. A transaction that is operating under the snapshot isolation level will operate on a personal snapshot of the database which is taken at the time of starting the transaction.

When the transaction is concluded, it will successfully commit the changes only if the values updated by the transaction have not been changed by other transactions running concurrently since the time snapshot was taken. And, if there is a write-write conflict among transactions for any values, the transaction will be aborted.

Snapshot isolation level is implemented with multi-version concurrency control and is adopted by multiple databases including InterBase, Firebird, Oracle, MySQL, PostgreSQL, SQL Anywhere, MongoDB, and Microsoft SQL Server.

What is the Row-versioning technique?

Snapshot isolation uses a concept called row-versioning which is a database technique used to track changes made to individual rows in a table. It essentially creates a version history for each row, allowing the database to identify the current state and past states of the data. Here's a deeper look at how it works:

  1. Each row in a table gets an additional column that stores a version number or timestamp. This column is often named rowversion (SQL Server) or uses system columns like xmin and xmax (PostgreSQL).
  2. Whenever a row is inserted, updated, or deleted, the version information associated with that row is incremented or updated accordingly.

Row versioning is the secret sauce of how MVCC works, which allows multiple transactions to access and modify data concurrently with minimal locking. Transactions can read older versions of the data based on their version numbers.

The row-versioning technique is a very basic concept of the database and it helps in implementing Optimistic Concurrency Control as well.

Conclusion

The main advantage of using multi-version concurrency control rather than traditional locking( read-write lock ) is that MVCC locks acquired for reading do not block writing and MVCC locks acquired for writing do not block reading and thus provide high throughput out of your database.


That’s it, folks for this edition of the newsletter. Please consider liking and sharing with your friends as it motivates me to bring you good content for free. If you think I am doing a decent job, share this article in a nice summary with your network. Connect with me on Linkedin or Twitter for more technical posts in the future!

Book exclusive 1:1 with me here.

Thanks for reading Curious Engineer! Subscribe for free to receive new posts and support my work.

Resources

Multi-Version Concurrency Control by Wikipedia

Getting started with MVCC

Snapshot Isolation by Wikipedia


Susanta Ghosh

VP @J P Morgan | Aspiring Architect | Software engineering | Distributed Systems | Java , spring boot, Cloud native Microservices | GenAI | ElasticSearch | NoSQL | Distributed DB [dynamodb,couchbase] | AI agents | RAG

10 个月

While this is excellent, I think whatever you have mentioned as snapshot isolation that's actually Serializable Snapshot Isolation [SSI] which provides prevention of Lost Updates while Snapshot Isolation provides prevention of Non Repeatable Reads.

要查看或添加评论,请登录

Vivek Bansal的更多文章

  • How to implement a Circuit Breaker

    How to implement a Circuit Breaker

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    7 条评论
  • How to implement Consistent Hashing

    How to implement Consistent Hashing

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Optimistic Locking Implementation

    Optimistic Locking Implementation

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

  • 1 year to Curious Engineer ??

    1 year to Curious Engineer ??

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

  • Message Queues vs Message Brokers

    Message Queues vs Message Brokers

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    4 条评论
  • Introduction to gRPC

    Introduction to gRPC

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    7 条评论
  • Non-Functional Requirements

    Non-Functional Requirements

    Brief Introduction Let’s say you are building a website that allows users to book flight tickets. The requirements for…

    4 条评论
  • QuadTrees

    QuadTrees

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    2 条评论
  • Text Based Search: ElasticSearch

    Text Based Search: ElasticSearch

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    3 条评论
  • Sharding vs Partitioning

    Sharding vs Partitioning

    If you like the free content I put out, consider subscribing to my newsletter on substack to get a well-researched…

    5 条评论

社区洞察

其他会员也浏览了