Version Vector (III)

In my last article, we looked at one of the techniques for identifying concurrent updates/conflicts by using the Server (Replica) as an Actor, along with the advantages and disadvantages of doing so. If we include siblings in our Version Vectors to resolve conflicts, we still don’t have a way to track causality in the merged state. If we don’t include siblings in our Version Vectors, we risk losing updates. In this article, we’ll look at another approach for identifying concurrent updates/conflicts.

Dotted Version Vectors

Dotted Version Vectors solve the problems we saw with using the ServerId as an Actor. Let’s first recap that problem -

[Diagram: Clients C2 and C3 issuing concurrent PUTs for key K to Replica A, which uses plain Version Vectors]

Let’s try to understand what’s happening in the above diagram -

  1. Let’s assume we have a key K with value U, and an empty version vector to begin with. Clients C2 and C3 sync the same state from the Replica (assuming all clients are interacting with the same replica), which implements Version Vectors.
  2. C2 updates the value to W & sends a PUT command with the local state of the Version Vector it has (empty VV).
  3. C3 updates the value to V & sends a PUT command with the local state of the Version Vector it has (empty VV).
  4. Replica A receives the request from C3 first (C2’s request might be delayed because of network latency). Replica A compares the Version Vector it received with its local state & sees that they match. So it increments the counter to 1 & updates the value to V. It also sends the new state back to C3.
  5. The request from C2 finally arrives at Replica A. Replica A compares the Version Vector it received with its local state & sees that they do not match. So it increments its counter to 2 and adds the value W from C2 as a sibling to V. It also sends the new state back to C2.
  6. C3 updates the value to Z & sends a PUT command with the local state of the Version Vector it has, which is (A, 1).
  7. Replica A receives the request from C3 and compares the Version Vector it received with its local state & sees that they do not match. So it increments its counter to 3 and adds the value Z from C3 as a sibling to V, W.

Problem with Step 7 -

If you notice, the update from V to Z came after C3 saw the state of V and decided to update it to Z. However, the server/replica treats the update as concurrent rather than sequential, and stores the value Z as a sibling. Hence we end up losing causality information, and that’s because at K : {(A, 2)} : [W, V] the server does not have the context of which version vector was associated with the write of V, i.e. (A, 1).
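The steps above can be sketched in a few lines of Python. This is a hypothetical model (not any specific database’s API) of a replica that keeps a single version vector per key; the names `PlainVVReplica` and `put` are my own. It shows why step 7 goes wrong: since siblings carry no per-value causal context, the causally newer write of Z is misclassified as concurrent.

```python
# Minimal sketch of a replica using plain Version Vectors, as in the
# diagram above. All names here are illustrative assumptions.

class PlainVVReplica:
    def __init__(self, replica_id):
        self.id = replica_id
        self.vv = {}        # key -> {replica_id: counter}
        self.values = {}    # key -> list of sibling values

    def put(self, key, value, client_vv):
        local = self.vv.get(key, {})
        if client_vv == local:
            # Client has seen everything we have: safe to overwrite.
            self.values[key] = [value]
        else:
            # VVs differ: treated as concurrent, value kept as sibling.
            self.values[key] = self.values.get(key, []) + [value]
        local = dict(local)
        local[self.id] = local.get(self.id, 0) + 1
        self.vv[key] = local
        return dict(local), list(self.values[key])

a = PlainVVReplica("A")
a.put("K", "V", {})                   # step 4: counter -> 1, value V
a.put("K", "W", {})                   # step 5: VVs differ, W is a sibling
vv, sibs = a.put("K", "Z", {"A": 1})  # step 7: causal update from C3...
print(vv, sibs)                       # -> {'A': 3} ['V', 'W', 'Z']
```

Note that the final PUT carries (A, 1), which causally follows the write of V, yet Z still lands as a third sibling: the replica has no way to tell that Z supersedes V.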

Solution

We understand that the problem is that we don’t have context to determine which siblings are causally related and which are concurrent. To solve the problem, we change the structure of what’s stored inside the Version Vector to provide more context.

Current State — {(A, 2)} : [V, W]
New State — {(A, 2)} : [(A, 1) -> V, (A, 2) -> W]
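Modeled in Python (the dict shapes and the `covers` helper are my own illustrative assumptions), the difference between the two states is that the dotted form tags each sibling with the “dot” — the (replica, counter) pair — at which it was written:

```python
# The plain state loses per-sibling write context...
current_state = {"vv": {"A": 2}, "siblings": ["V", "W"]}

# ...while the "dotted" state keeps it.
dotted_state = {
    "vv": {"A": 2},
    "siblings": {("A", 1): "V",   # V was written at dot (A, 1)
                 ("A", 2): "W"},  # W was written at dot (A, 2)
}

# With dots we can ask: does an incoming client VV cover a sibling's dot?
def covers(client_vv, dot):
    replica, counter = dot
    return client_vv.get(replica, 0) >= counter

print(covers({"A": 1}, ("A", 1)))  # True: this client has seen V's write
print(covers({"A": 1}, ("A", 2)))  # False: W is concurrent with it
```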

Now, let’s apply our new state and see how it helps -

[Diagram: the same sequence of PUTs, this time with each sibling tagged with the dot at which it was written]

Let’s try to understand what’s happening in the above diagram, starting from step 5, where Replica A has already received the first message from C3 -

  1. The request from C2 arrives at Replica A. Replica A compares the Version Vector it received with its local state & compares each of the “dots” in its local state to see if any could be causally related to the incoming Version Vector. In this case, the local dot (A, 1) is not covered by the incoming empty VV {} & hence the event is treated as concurrent and the value is stored as a sibling: K : {(A, 2)} : [{(A, 2), W}, {(A, 1), V}].
  2. C3 updates the value to Z & sends a PUT command with the local state of the Version Vector it has, which is (A, 1).
  3. Replica A receives the request from C3, compares the Version Vector it received with its local state & checks each of the “dots” in its local state. It notices that it has a “dot” (A, 1) that is covered by the incoming VV, and hence it knows this is not a concurrent update. Replica A then overwrites the value V with Z and stores Z along with the latest VV state.

Advantages -

  1. We avoid the actor-explosion problem of using the Client as an Actor, and are still able to identify causality across events using “Dotted” Version Vectors.
  2. It’s easy to identify the most recent data based on the counter.

Disadvantages -

  1. Storing additional metadata to provide more context introduces extra overhead when syncing the data across multiple replicas.


Conflict Resolutions

Irrespective of the type of Version Vector we use, we saw that we still had conflicting cases, which we handled by leveraging siblings. Siblings allow us to defer conflict resolution so the system can move forward, and they preserve the state of the conflicting versions until the desired Conflict Resolution Strategy kicks in. We’ll talk about multiple conflict resolution strategies here -

  1. Last Write Wins -

We referred to Last-Write-Wins briefly in an earlier article. We generally leverage some parameter (usually a timestamp) to decide the latest write and delete any siblings that occurred before it. LWW is a great strategy for server-side resolution (i.e. the responsibility of conflict resolution doesn’t fall on the client), but it suffers from inconsistent behaviour leading to data inaccuracy.

If you take multiple replicas into account, and multiple clients update the same data row concurrently, then because of network latency or partitions the last write will silently overwrite the other clients’ values, and which write “wins” will not be consistent or predictable.

Choose LWW for your systems preferably if you have few concurrent writes, or where you’re okay with the occasional inconsistent behaviour.
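An LWW resolver can be sketched in a few lines. This is an illustrative shape of my own (not a specific database’s API), assuming each sibling carries a write timestamp; the sibling with the highest timestamp wins and all others are discarded.

```python
# Illustrative Last-Write-Wins resolution over timestamped siblings.

def last_write_wins(siblings):
    """siblings: list of (timestamp, value) pairs; highest timestamp wins."""
    return max(siblings, key=lambda s: s[0])[1]

# Two concurrent writes; the later timestamp silently wins, which is
# where the data-loss risk mentioned above comes from.
print(last_write_wins([(1700000000.1, "W"), (1700000000.2, "V")]))  # -> V
```

Note that the "losing" write W is dropped entirely, with no record that a conflict ever happened; clock skew between replicas makes the outcome even less predictable.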

  2. Read Repair -

Read repair is a strategy where VV conflict resolution is done at read time. The node coordinating the read request fetches the VV of the data from all replica nodes that have it, and then performs a merge on the Version Vectors.

Notice that read repair only performs conflict resolution for data that's actively being read. For data that's not actively touched, we can run a similar process in the background; this is called proactive repair.
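The merge step at the heart of read repair can be sketched as a pointwise maximum over the version vectors the coordinator gathers (the function name and dict shapes here are my own illustrative assumptions):

```python
# Hypothetical coordinator-side merge for read repair: take the
# pointwise maximum of each replica's version vector for the key.

def merge_vvs(vvs):
    merged = {}
    for vv in vvs:
        for replica, counter in vv.items():
            merged[replica] = max(merged.get(replica, 0), counter)
    return merged

# Three replicas report different (possibly stale) VVs for the same key.
replica_states = [{"A": 2, "B": 1}, {"A": 3}, {"B": 2, "C": 1}]
repaired = merge_vvs(replica_states)
print(repaired)  # -> {'A': 3, 'B': 2, 'C': 1}
```

In a real system the coordinator would then write this merged VV (and the resolved value or surviving siblings) back to any replica holding a stale copy.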

There are other conflict resolution mechanisms as well, but the idea here was to give you the high-level details. The internals of conflict resolution will change based on the system’s choice of conflict resolution strategy.

-----------------------------

This brings us to the end of this article and the series on Version Vectors. Hopefully it helped you understand how concurrent updates in a distributed system are handled with the help of Version Vectors. We also talked about how Version Vectors evolved and the advantages and drawbacks of each evolved version. Please post comments on any doubts you might have and will be happy to discuss them! Also, if you want me to cover Conflict Resolution in detail, please comment and I’ll do the same!

-----------------------------

Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback, it will help me improve! :)

Until next time, Keep asking questions & Keep learning!
