Version Vector (II)
Pratik Pandey
Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com
In my last article, we saw how distributed data stores use version vectors to identify concurrent updates to data records. We looked at one technique for detecting concurrent updates (conflicts), using the ClientId as the Actor, along with its advantages and disadvantages. In this article, we’ll look at another approach for identifying concurrent updates.
Server As An Actor
The problem with Client as an Actor is Actor Explosion: the number of clients can grow very large, and every client that writes to a record adds an entry to that record’s version vector. To solve that, we can use servers as actors instead.
But, you may ask, clusters can be very large as well, spanning multiple regions. Wouldn’t they face the same Actor Explosion problem?
Yes, you’re right! Hence, we limit the actors for a record to the nodes that hold its replicas, and their number is set by the replication factor. If you remember, each data record carries its own version vector, so for each record the maximum size of the version vector is the replication factor for that data in the cluster.
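To make the size bound concrete, here is a minimal sketch in Python. It assumes a hypothetical record replicated on three servers named S1, S2 and S3, and helper functions (`increment`, `descends`, `compare`) invented for illustration; it is not the API of any particular data store.

```python
from typing import Dict

VersionVector = Dict[str, int]

# Hypothetical replica set for one record (replication factor = 3).
REPLICAS = ["S1", "S2", "S3"]


def increment(vv: VersionVector, server_id: str) -> VersionVector:
    """Return a copy of vv with the coordinating server's counter bumped."""
    updated = dict(vv)
    updated[server_id] = updated.get(server_id, 0) + 1
    return updated


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every event that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def compare(a: VersionVector, b: VersionVector) -> str:
    """Classify two version vectors as equal, ordered, or concurrent."""
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a_newer"
    if b_ge_a:
        return "b_newer"
    return "concurrent"


# 1000 writes from any number of clients, each coordinated by one replica.
vv: VersionVector = {}
for i in range(1000):
    vv = increment(vv, REPLICAS[i % len(REPLICAS)])

# Two replicas accept a write independently from the same starting point.
base = {"S1": 1}
on_s1 = increment(base, "S1")
on_s2 = increment(base, "S2")
```

However many clients issue those 1000 writes, `vv` never holds more than three entries, one per replica. Calling `compare(on_s1, on_s2)` reports the two independent writes as concurrent, since neither vector descends from the other.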
Let’s try to understand what’s happening in the above diagram -
Advantage -
Concerns -
This brings us to the end of this article. Server As An Actor is a promising approach for avoiding the actor explosion problem, but the server is still only a proxy for the clients that actually perform the operations. Hence, it has issues whichever way conflicts are handled: without siblings, we can have data loss (lost updates), and with siblings, we don’t have a way to track causality in the merged state.
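The lost-update case can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a single hypothetical server S1 coordinates every write, no siblings are kept, and `coordinate_write` is a made-up helper rather than the behaviour of any specific store.

```python
from typing import Dict, Tuple

VersionVector = Dict[str, int]


def descends(a: VersionVector, b: VersionVector) -> bool:
    """True if a has seen every event that b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def coordinate_write(
    stored_vv: VersionVector,
    client_ctx: VersionVector,
    new_value: str,
    server_id: str,
) -> Tuple[VersionVector, str]:
    """Hypothetical server-as-actor write with no siblings.

    The server merges the client's context into the stored vector,
    bumps its own counter, and keeps only the incoming value.
    """
    actors = stored_vv.keys() | client_ctx.keys()
    merged = {a: max(stored_vv.get(a, 0), client_ctx.get(a, 0)) for a in actors}
    merged[server_id] = merged.get(server_id, 0) + 1
    return merged, new_value


# Both clients read the same state, so they share the same context.
stored_vv, stored_value = {"S1": 1}, "v0"
ctx_a = dict(stored_vv)
ctx_b = dict(stored_vv)

# Client A writes through S1, then client B writes through S1 with its
# stale context. The two writes are concurrent from the clients' view.
vv_after_a, value_after_a = coordinate_write(stored_vv, ctx_a, "from-A", "S1")
vv_after_b, value_after_b = coordinate_write(vv_after_a, ctx_b, "from-B", "S1")
```

Because S1 bumps its own counter for every write, `vv_after_b` descends from `vv_after_a`, so the vectors make B's write look like a successor of A's. The value `"from-A"` is replaced without any conflict being reported, even though neither client had seen the other's update.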
In our next article, we’ll cover the best approach to solving this problem. So stay tuned!
Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.
Please comment on the post with your feedback; it will help me improve! :)
Until next time, Keep asking questions & Keep learning!