Consistency & Consistency Levels(II) — Distributed Data Stores

Before going into this article, I’d advise you to go through the following articles to get the complete context.

  1. Consistency & Consistency Levels
  2. Consistency Models — Strong Consistency Models
  3. Consistency Models — Session & Eventual Consistency Models

Now that we’re on the same page: we’ve seen that an application’s consistency behaviour can be defined in terms of different consistency models.

A consistency model is a contract between a (distributed) system and the applications that run on it. The model is a set of guarantees made by the distributed system so that clients can expect predictable consistency behaviour across read & write operations.


From a distributed systems perspective, we can also define consistency levels, which in turn impact the consistency guarantees that our application can make.

The Consistency Level (CL) is the number of replica nodes that must acknowledge a read/write request for the entire request to be considered successful.


The Replication Factor (RF) is the number of nodes to which any write is replicated. So, in a 5-node cluster with RF set to 3, every write is always replicated to 3 of the 5 nodes.

Let’s take different consistency levels and see the consistency guarantees that our application can offer:

  1. Read CL = ONE & Write CL = ALL

With this setting, we read the data from any one replica, but a write request is considered successful only if it’s acknowledged by all replicas. Since every write is synced to all replicas, reading from any replica will always return the latest write.

So, this configuration allows us to have a Strongly Consistent (Linearizable) system.

Pros: Read throughput is very high, as a read hits only one replica. The system is strongly consistent.

Cons: Setting write CL to ALL increases write latency, impacting write performance & throughput, since every write must be synced to all replicas. Availability also suffers, as all replicas must be up for any write to succeed. The system cannot tolerate partitions.
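The behaviour above can be sketched with a toy in-memory cluster (illustrative only; the class and method names are made up, not a real client API):

```python
# Toy model of Write CL = ALL & Read CL = ONE: a write succeeds only if
# every replica accepts it, so any single replica can serve the latest value.

class Cluster:
    def __init__(self, replication_factor):
        # One dict per replica node, acting as that node's local store.
        self.replicas = [{} for _ in range(replication_factor)]

    def write_all(self, key, value):
        # Write CL = ALL: the write must land on every replica to succeed.
        for replica in self.replicas:
            replica[key] = value
        return True

    def read_one(self, key, node):
        # Read CL = ONE: read from any single replica.
        return self.replicas[node].get(key)

cluster = Cluster(replication_factor=3)
cluster.write_all("x", 1)
# Because the write reached all replicas, every node returns the latest value.
assert all(cluster.read_one("x", n) == 1 for n in range(3))
```

If any replica were unreachable, `write_all` would have to fail the whole request, which is exactly the availability cost described above.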

2. Read CL = ALL & Write CL = ONE

With this setting, the write is replicated to the number of nodes defined by RF, but it is considered successful as soon as one replica acknowledges it. A read request is considered successful only after fetching the data from all replicas and returning the latest record. Since reads always fetch from all replicas, and we know at least one replica has the latest write, we’ll always see the latest write.

So, this configuration allows us to have a Strongly Consistent (Linearizable) system.

Pros: Write throughput is very high, since a write succeeds as soon as one node acknowledges it. The system is strongly consistent.

Cons: Setting read CL to ALL increases read latency, impacting read performance & throughput. Availability also suffers, as all replicas must be up for any read to succeed. The system cannot tolerate partitions.

3. Read CL = ONE & Write CL = ONE

With this setting, the write is replicated to the number of nodes defined by RF, but it is considered successful as soon as one replica acknowledges it. A read request hits any one node and returns whatever data that node has. Since that node may or may not have received the write yet (replication is asynchronous, and the acknowledging node may even have gone down before replicating), we cannot guarantee that the latest write will be read immediately.

However, once the write has been replicated to all the replicas defined by RF, every read will see the latest write, so such a system is Eventually Consistent.

Pros: Write throughput is very high, since a write succeeds as soon as one node acknowledges it. Read throughput is also very high, as a read only needs to fetch data from one node.

Cons: Strong consistency is not possible with this configuration.
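A minimal sketch of the stale-read window this configuration creates, modelling each replica as a plain dict (illustrative assumption: replication to the other nodes happens asynchronously after the first ack):

```python
# Read CL = ONE & Write CL = ONE with replication lag (RF = 3).
replicas = [{"x": 1}, {"x": 1}, {"x": 1}]  # all nodes in sync initially

# Write x=2: node 0 acknowledges immediately; replication to the
# other nodes is asynchronous and has not happened yet.
replicas[0]["x"] = 2

# A read served by node 1 during the lag window returns the stale value.
stale = replicas[1]["x"]   # old value
fresh = replicas[0]["x"]   # latest value

# Once async replication catches up, all reads converge:
# this is eventual consistency.
for r in replicas[1:]:
    r["x"] = 2
assert all(r["x"] == 2 for r in replicas)
```

Which value a client sees depends entirely on which node its read happens to hit during the lag window.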

4. Read CL = QUORUM & Write CL = QUORUM

With this setting, we write the data to a majority of the nodes and read it from a majority of the nodes as well. Hence, at least 1 node in any read quorum is guaranteed to have the latest data.

So, this configuration looks like it will allow us to have a Strongly Consistent (Linearizable) system. (Not always the case! Read the Partial Writes section for more info.)

E.g., if we have a 9-node cluster, writes go to 5 nodes (so 4 nodes might not have the latest write). Since reads also require 5 replicas, even if the read quorum includes all 4 nodes that missed the latest write, at least 1 node in it will have the latest copy.

Pros: Both write & read throughput are good (not as bad as reading/writing to all nodes, though not as good as a single node). Quorum allows a balanced approach, where you don’t need to sacrifice either read or write performance alone.

Cons: Performance is not as good as reading from and writing to one node. :)

Partial Writes Scenario

We talked about quorum writes and reads & how they lead to a Strongly Consistent system. However, the caveat is that this holds only when your writes succeed. You might be wondering: if a write failed, why would the system no longer be strongly consistent?

Let’s take the happy path first. We write X=1 to a quorum of nodes in the cluster and then issue a read request from a quorum of nodes. The read returns the latest value of X.

Now, let’s think about a scenario where a write of X=2 is attempted on a quorum of nodes. It succeeds on Node 1 but fails on Node 2, so the write is marked as failed. However, the write applied on Node 1 isn’t rolled back (depending on our implementation).

[Illustration: a read from a quorum of Node 2 & Node 3 returns the old value (1); a read from a quorum of Node 1 & Node 3 returns the updated value (2).]

Now, if we issue a read request and it goes to a quorum of Node 2 & Node 3, we get back 1 (the value before the failed write), as shown in the illustration above.

If instead the read request goes to a quorum of Node 1 & Node 3, we get back 2 (the value written by the failed write).

This would also trigger a Read Repair (a topic for another article), and the value 2 would be synced to the other nodes.

As you can see, in the case of partial writes we can get different values for our reads, and in such a scenario our data store isn’t strongly consistent.

P.S.: The assumption is that the client retries any failed/timed-out operations; in such cases, our data store behaves like an eventually consistent data store.
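The partial-write scenario above can be reproduced in a few lines (illustrative assumptions: one dict per node, and "latest" resolved by taking the numeric maximum, whereas a real store would compare write timestamps):

```python
# Partial write with RF = 3, quorum = 2: a write of x=2 succeeds on
# Node 1 but fails on Node 2, so the write is reported as failed,
# yet Node 1's copy is never rolled back.

nodes = {1: {"x": 1}, 2: {"x": 1}, 3: {"x": 1}}

nodes[1]["x"] = 2   # applied on Node 1 before the quorum write failed

def read_quorum(members):
    # Return the most recent value seen across the quorum members.
    # (Here "most recent" is just max(); real stores use timestamps.)
    return max(nodes[n]["x"] for n in members)

# Different quorums now disagree, depending on whether Node 1 is included.
assert read_quorum([2, 3]) == 1   # old value: neither node saw the write
assert read_quorum([1, 3]) == 2   # updated value: Node 1 is in the quorum
```

The two assertions are exactly the two illustrated reads: the "failed" write is visible or invisible depending on which quorum the read lands on, which is why the guarantee breaks.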


This brings us to the end of the Consistency Series. I hope you learned something new from it. The data store referenced is Cassandra, and these are the concepts Cassandra uses; however, they can be applied to any distributed data store.

Thank you for reading! I’ll be posting weekly content on distributed systems & patterns, so please like, share and subscribe to this newsletter for notifications of new posts.

Please comment on the post with your feedback; it will help me improve! :)

Until next time, Keep asking questions & Keep learning!

Pratik Pandey

Senior Software Engineer at Booking.com | AWS Serverless Community Builder | pratikpandey.substack.com