
Version Vector(II)

Pratik Pandey

Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com

Published: August 23, 2022

In my last article, we saw how distributed data stores use version vectors to identify concurrent updates to data records. We looked at one technique for identifying concurrent updates/conflicts, using the ClientId as an Actor, along with its advantages and disadvantages. In this article, we’ll look at another approach for identifying concurrent updates/conflicts.

Server As An Actor

The problem with Client As An Actor is Actor Explosion: the number of clients can grow very large, and every client adds an entry to the version vector. To solve that, we can use servers as actors instead.

But, you might ask, clusters can be very large too, spanning multiple regions. Wouldn’t they face the same problem of Actor Explosion?

Yes, you’re right! Hence, we limit the actors to the nodes that hold replicas of the record, and their number is set by the replication factor. If you remember, each data record is tied to its own version vector, so for each data record the maximum size of the version vector is the replication factor for that data in the cluster.
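To make that bound concrete, here is a minimal Python sketch. The node names, `REPLICATION_FACTOR`, `replicas_for_key` and `record_write` are all hypothetical, and the hash-based placement is only a toy stand-in for whatever placement scheme a real data store uses.

```python
import hashlib

NODES = ["A", "B", "C", "D", "E", "F"]   # hypothetical six-node cluster
REPLICATION_FACTOR = 3                    # hypothetical replication factor


def replicas_for_key(key: str) -> list:
    """Pick REPLICATION_FACTOR consecutive nodes for a key (toy placement)."""
    start = int(hashlib.sha256(key.encode()).hexdigest(), 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def record_write(version_vector: dict, replica: str) -> None:
    """Increment the counter of the replica that coordinated the write."""
    version_vector[replica] = version_vector.get(replica, 0) + 1


# 1,000 different clients write key "K". Each write is coordinated by one of
# the key's replicas, so only replica IDs ever appear in the version vector.
vv = {}
owners = replicas_for_key("K")
for client_id in range(1000):
    record_write(vv, owners[client_id % len(owners)])

print(len(vv))           # never more than REPLICATION_FACTOR
print(sum(vv.values()))  # total number of writes recorded
```

However many clients write to the key, the vector only ever holds entries for the replicas that own it.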

Let’s try to understand what’s happening in the above diagram -

Let’s assume we have a key K with value U, and that we start with an empty version vector. Clients C2 and C3 sync the same state from the replica that implements version vectors (assuming all clients interact with the same replica, Replica A).
C2 updates the value to W and sends a PUT command along with its local copy of the version vector (empty VV).
C3 updates the value to V and sends a PUT command along with its local copy of the version vector (empty VV).
Replica A receives the request from C3 first (C2’s request might be delayed by network latency). Replica A compares the version vector it received with its local state and sees that they match, so it increments its counter to 1 and updates the value to V.
The request from C2 finally arrives at Replica A. Replica A compares the version vector it received with its local state and sees that the received vector does not match its state. The following approaches can be taken -

Replica A can ignore the request from C2, as its local version vector is higher than the incoming version vector from C2.
Replica A can update the value to W and the counter to 2. This way, we end up serializing the requests from C2 & C3 and applying a Last-Write-Wins strategy.
Replica A can store both values V & W as siblings and increment the counter to 2. There is again a risk of Sibling Explosion here, and unlike the first two options, we’re leaving the conflict resolution between the two values to be done later!
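The walkthrough and the three options can be sketched in a few lines of Python. This is a simplified illustration, not how any particular data store implements it: the `Replica` class, the policy names and `run_scenario` are hypothetical, and a plain equality check stands in for a full version vector comparison.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy single-key replica that uses its own ID as the only actor."""
    replica_id: str
    policy: str                      # "ignore", "lww", or "siblings"
    counter: int = 0                 # this replica's entry in the version vector
    values: list = field(default_factory=list)

    def version_vector(self) -> dict:
        return {self.replica_id: self.counter} if self.counter else {}

    def put(self, value: str, client_vv: dict) -> None:
        if client_vv == self.version_vector():
            # The client saw the latest state: plain overwrite.
            self.counter += 1
            self.values = [value]
        elif self.policy == "ignore":
            # Option 1: drop the write because the client's vector is stale.
            return
        elif self.policy == "lww":
            # Option 2: serialize the writes; the later arrival wins.
            self.counter += 1
            self.values = [value]
        elif self.policy == "siblings":
            # Option 3: keep both values and resolve the conflict later.
            self.counter += 1
            self.values = self.values + [value]
        else:
            raise ValueError(f"unknown policy: {self.policy}")


def run_scenario(policy: str) -> Replica:
    replica = Replica("A", policy, counter=0, values=["U"])
    replica.put("V", {})   # C3's request arrives first
    replica.put("W", {})   # C2's delayed request carries a stale (empty) vector
    return replica


for policy in ("ignore", "lww", "siblings"):
    r = run_scenario(policy)
    print(policy, r.version_vector(), r.values)
```

With "ignore", W is silently dropped; with "lww", V is overwritten; with "siblings", both values survive under the single entry (A, 2).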


Advantage -

Does not suffer from Actor Explosion, like the Client As An Actor approach.

Concerns -

Potential data loss: if write requests are ignored whenever the local version vector is higher, the update they carry never gets stored.
The potential issue of Sibling Explosion if siblings are being stored. Also, as the number of siblings grows, reconciliation becomes more expensive.
If siblings are being stored, there is no way to track causality in the merged state. E.g., K: {(A, 2)}: W, V cannot tell us which of the updates V and W happened before the other.
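The last concern can be illustrated with a small Python sketch. It is a deliberately simplified model with a hypothetical helper, `store_concurrent_writes`, that keeps every write as a sibling and only bumps Replica A's counter.

```python
def store_concurrent_writes(arrival_order):
    """Keep every write as a sibling at Replica A (simplified model)."""
    counter = 0
    siblings = set()          # siblings are an unordered set of values
    for value in arrival_order:
        counter += 1          # the single server-side counter is all that changes
        siblings.add(value)
    return {"A": counter}, frozenset(siblings)


v_then_w = store_concurrent_writes(["V", "W"])
w_then_v = store_concurrent_writes(["W", "V"])

# Both histories collapse into the same stored state, so a reader cannot
# recover which of V and W was written first.
print(v_then_w == w_then_v)
```

The vector has one counter for Replica A and nothing that ties a counter value to an individual sibling, which is why the ordering information is lost.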

This brings us to the end of this article. Server As An Actor is definitely a promising approach to avoid the Actor Explosion problem, but the server is still a proxy for the clients that actually perform the operations. Hence, each variant has its own issue: without siblings, we can lose updates, and with siblings, we have no way to track causality in the merged state.

In our next article, we’ll cover the best approach to solving this problem. So stay tuned!

Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback; it will help me improve! :)

Until next time, Keep asking questions & Keep learning!


