Data Consistency in Apache Cassandra: Part 1
For a quick introduction to what Apache Cassandra is, take a look here. Consistency is too large a topic to cover in one part, so I’ll be covering it in 3 parts. This first part defines consistency in general, write consistency, read consistency, consistency levels (CL), and immediate, eventual and tunable consistency.
Consistency
The concept of consistency is very important when you work with a distributed database like Cassandra. When you’re working with a database that runs on only one server, consistency is a non-issue. But when you’re running on multiple servers that can span multiple racks and multiple data centres, you can run into situations where the data on one replica node differs from the data on another replica node. So, technically, consistency refers to a situation where all the replica nodes have the exact same data at the exact same point in time.
Consistency Level (CL) is the number of replica nodes that must acknowledge a read or write request for the whole operation/query to be successful.
Write CL controls how many replica nodes must acknowledge that they received and wrote the partition.
Read CL controls how many replica nodes must send their most recent copy of the partition to the coordinator.
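To make the definitions above concrete, here is a minimal sketch of how the number of required acknowledgements follows from the CL and the replication factor. It only covers ONE, QUORUM and ALL (the levels used in this article), and the names `required_acks` and `is_successful` are my own illustrative helpers, not part of any Cassandra API.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    """The three consistency levels discussed in this article."""
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level, replication_factor):
    """Return how many replica nodes must acknowledge a request."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if level is ConsistencyLevel.ONE:
        return 1
    if level is ConsistencyLevel.QUORUM:
        # A quorum is a majority of the replicas.
        return replication_factor // 2 + 1
    if level is ConsistencyLevel.ALL:
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def is_successful(level, replication_factor, acks_received):
    """A request succeeds once enough replicas have acknowledged it."""
    return acks_received >= required_acks(level, replication_factor)
```

For example, with 3 replicas, QUORUM needs 2 acknowledgements and ALL needs 3, so a request that only 2 replicas acknowledge succeeds at QUORUM but fails at ALL.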
Write Consistency
Write consistency means having consistent data (immediately or eventually) after your write query to your Cassandra cluster. You can tune the write consistency for performance (by setting the write CL to ONE) or for immediate consistency on critical pieces of data (by setting the write CL to ALL).
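As a rough illustration of the trade-off between write CL = ONE and write CL = ALL, here is a toy simulation of a coordinator writing to replica nodes. This is a sketch under my own assumptions, not Cassandra's implementation: `Replica`, `acks_needed` and `coordinator_write` are hypothetical names, and replicas are just in-memory dictionaries.

```python
class Replica:
    """A toy replica node: an in-memory dict plus an up/down flag."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}  # partition key -> (timestamp, value)

    def write(self, key, value, timestamp):
        """Store the value and acknowledge, or return False if the node is down."""
        if not self.up:
            return False
        self.data[key] = (timestamp, value)
        return True


def acks_needed(cl, replication_factor):
    """Number of acknowledgements required for the given write CL."""
    return {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[cl]


def coordinator_write(replicas, key, value, timestamp, cl):
    """Send the write to every replica; the CL only decides how many
    acknowledgements are needed for the write to count as successful."""
    acks = sum(1 for replica in replicas if replica.write(key, value, timestamp))
    return acks >= acks_needed(cl, len(replicas))
```

With 3 replicas of which 2 are down, `coordinator_write(..., "ONE")` still succeeds, while `"QUORUM"` and `"ALL"` fail. That is the availability and performance you gain with ONE, and the stronger guarantee you pay for with ALL.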
Read Consistency
Read consistency refers to having the same data on all replica nodes for any read request. Here is how it works:
Read CL = ALL gives you immediate consistency, as it reads data from all replica nodes and merges them, meaning it keeps the most current data.
Read CL = ONE gives you the benefit of speed: Cassandra contacts only the closest/fastest replica node, so the latency of the read request is lower and performance is higher. Also, 2 out of 3 replica nodes might be down or might fail to answer the query, and you will still get a result because CL = ONE, so you have the highest availability. The price you pay for all these benefits is much weaker consistency guarantees.
Read CL = QUORUM (Cassandra contacts a majority of the replica nodes) gives you a nice balance: high-performance reads, good availability and good throughput.
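The three read levels above can be sketched as a toy coordinator that contacts as many replicas as the read CL requires and keeps the copy with the newest timestamp. This is a simplified model under my own assumptions: replicas are plain dictionaries, the coordinator simply contacts the first live replicas in list order (standing in for "closest/fastest"), and `coordinator_read` and `UnavailableError` are hypothetical names.

```python
class UnavailableError(Exception):
    """Raised when fewer replicas are up than the read CL requires."""


def acks_needed(cl, replication_factor):
    """Number of replica responses required for the given read CL."""
    return {
        "ONE": 1,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }[cl]


def coordinator_read(replicas, key, cl):
    """Read from the required number of live replicas and return the
    value with the most recent timestamp among their responses."""
    needed = acks_needed(cl, len(replicas))
    live = [replica for replica in replicas if replica["up"]]
    if len(live) < needed:
        raise UnavailableError(f"CL {cl} needs {needed} replicas, {len(live)} are up")
    # Each response is a (timestamp, value) pair.
    responses = [r["data"][key] for r in live[:needed] if key in r["data"]]
    if not responses:
        return None
    _, value = max(responses, key=lambda response: response[0])
    return value


if __name__ == "__main__":
    replicas = [
        {"up": True, "data": {"k": (1, "old")}},  # this replica missed the latest write
        {"up": True, "data": {"k": (2, "new")}},
        {"up": True, "data": {"k": (2, "new")}},
    ]
    for cl in ("ONE", "QUORUM", "ALL"):
        print(cl, coordinator_read(replicas, "k", cl))
```

In this setup the read at ONE returns the stale value from the first replica, while QUORUM and ALL see the newer copy. If two of the three replicas go down, the read at ONE still answers, but QUORUM and ALL raise `UnavailableError`.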
Immediate Consistency vs. Eventual Consistency
So, with consistency in Cassandra, you have two core types of consistency: immediate consistency and eventual consistency.
Immediate consistency: having identical data on all replica nodes at any given point in time.
Eventual consistency: by controlling our read and write consistency levels, we can allow our data to differ between replica nodes, while our queries still return the most current version of the partition data.
What this means is that because we can choose between immediate and eventual consistency, we end up with a system that has tunable consistency. Tunable consistency means that you can set the CL for each read and write request. So, Cassandra gives you a lot of control over how consistent your data is. You can allow some queries to be immediately consistent and other queries to be eventually consistent. That means that, in your application, you can write the queries for data that requires immediate consistency accordingly, and for data that does not require it you can optimize for performance and choose eventual consistency.
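Here is a small sketch of what "setting the CL for each request" could look like in application code. It is not tied to any real driver: `Request` and `make_request` are hypothetical names, the table names are made up, and the ALL/ONE choice simply mirrors the write CL examples from earlier in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    """A query together with the consistency level chosen for it."""
    cql: str
    consistency_level: str


def make_request(cql, needs_immediate_consistency):
    """Pick the CL per request: ALL for critical data, ONE for speed."""
    level = "ALL" if needs_immediate_consistency else "ONE"
    return Request(cql=cql, consistency_level=level)


if __name__ == "__main__":
    # Hypothetical tables: a payment is critical, a page view is not.
    payment = make_request("INSERT INTO payments (id, amount) VALUES (1, 100)", True)
    page_view = make_request("INSERT INTO page_views (id, url) VALUES (1, '/home')", False)
    print(payment.consistency_level, page_view.consistency_level)
```

With a real driver, the level would be attached to each statement or session in whatever way that driver provides; the point here is only that the choice is made per query, not once for the whole cluster.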
In Part 2, we will see how to achieve immediate and eventual consistency using different write and read consistency levels.
Comments and thoughts welcome. Cheers!