Data Consistency in Apache Cassandra - Part 1

Rajendra Uppal

Director of Engineering | ex @ Microsoft, Adobe | IIT Delhi

Published: August 13, 2017

For a quick introduction to Apache Cassandra, take a look here. Consistency is too large a topic to cover in one part, so I'll cover it in three. This first part defines consistency in general, write consistency, read consistency and consistency levels (CL), and introduces immediate, eventual and tunable consistency.

Consistency

The concept of consistency is very important when you work with a distributed database like Cassandra. When you're working with a database that runs on only one server, consistency is a non-issue. But when you're running on multiple servers that can span multiple racks and multiple data centres, you can run into situations where the data on one replica node differs from the data on another replica node. So, technically, consistency refers to a situation where all the replica nodes have exactly the same data at the same point in time.

Consistency Level (CL): the number of replica nodes that must acknowledge a read or write request for the whole operation/query to be successful.

Write CL controls how many replica nodes must acknowledge that they received and wrote the partition.

Read CL controls how many replica nodes must send their most recent copy of the partition to the coordinator.
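To make the definition concrete, here is a minimal Python sketch of how a CL maps to a number of replica nodes. It is an illustration, not Cassandra code: the function name required_acks and the replication_factor parameter (the number of replica nodes holding each partition) are names chosen here, and only a few of the available levels are covered.

```python
def required_acks(consistency_level: str, replication_factor: int) -> int:
    """Return how many replica nodes must acknowledge a request at this CL."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if consistency_level in fixed:
        needed = fixed[consistency_level]
    elif consistency_level == "QUORUM":
        # A quorum is a strict majority of the replica nodes.
        needed = replication_factor // 2 + 1
    elif consistency_level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"unsupported consistency level: {consistency_level}")
    if needed > replication_factor:
        raise ValueError("CL asks for more replicas than exist")
    return needed


for cl in ("ONE", "QUORUM", "ALL"):
    print(cl, required_acks(cl, replication_factor=3))
# ONE 1
# QUORUM 2
# ALL 3
```

With 3 replica nodes, QUORUM needs 2 acknowledgements; with 5, it needs 3.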

Write Consistency

Write consistency means having consistent data (immediate or eventual) after your write query to your Cassandra cluster. You can tune write consistency for performance (by setting the write CL to ONE) or for immediate consistency on critical pieces of data (by setting the write CL to ALL). Here is how it works:

A client sends a write request to the coordinator.
The coordinator forwards the write request (INSERT, UPDATE or DELETE) to all replica nodes, regardless of the write CL you have set.
The coordinator waits for n replica nodes to respond, where n is set by the write CL.
The coordinator sends the response back to the client.
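The write steps above can be sketched as a toy simulation. This is a simplified model under stated assumptions, not how Cassandra is implemented: the Replica class and coordinate_write function are hypothetical, the calls are synchronous, and a real coordinator would respond as soon as n acknowledgements arrive rather than after polling every node.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value, timestamp):
        """Store the write and acknowledge it; a down replica cannot acknowledge."""
        if not self.up:
            return False
        self.data[key] = (timestamp, value)
        return True


def coordinate_write(replicas, key, value, timestamp, required_acks):
    # The write is forwarded to every replica, whatever the write CL is.
    acks = sum(replica.write(key, value, timestamp) for replica in replicas)
    # The CL only decides how many acknowledgements count as success.
    return acks >= required_acks


replicas = [Replica("r1"), Replica("r2"), Replica("r3", up=False)]
# Write CL = QUORUM (2 of 3): succeeds with one replica down.
print(coordinate_write(replicas, "user:1", "alice", timestamp=1, required_acks=2))  # True
# Write CL = ALL (3 of 3): fails because r3 cannot acknowledge.
print(coordinate_write(replicas, "user:1", "alice", timestamp=2, required_acks=3))  # False
```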

Read Consistency

Read consistency refers to having the same data on all replica nodes for any read request. Here is how it works:

A client sends a read request to the coordinator.
The coordinator forwards the read (SELECT) request to n replica nodes, where n is set by the read CL.
The coordinator waits for those n replica nodes to respond.
The coordinator then merges the n responses into a single response (by finding the most recent copy of the written data) and sends it to the client.
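The merge in the last step can be illustrated with a small sketch. It assumes each replica's copy is a (timestamp, value) pair and that the newest timestamp wins; the coordinate_read function, the dictionary layout and the sample values are made up for illustration, and "the first n replicas in the list" stands in for whichever replicas a real coordinator would pick.

```python
def coordinate_read(replicas, partition_key, n):
    """Read from n replicas and return the most recently written value."""
    # Forward the read to n replicas (here simply the first n in the list).
    contacted = replicas[:n]
    # Each contacted replica responds with its own (timestamp, value) copy.
    responses = [replica[partition_key] for replica in contacted]
    # Merge: keep the copy with the newest write timestamp.
    _, value = max(responses, key=lambda copy: copy[0])
    return value


replicas = [
    {"user:1": (1, "old@example.com")},  # this replica missed the latest write
    {"user:1": (2, "new@example.com")},
    {"user:1": (2, "new@example.com")},
]
print(coordinate_read(replicas, "user:1", n=1))  # old@example.com
print(coordinate_read(replicas, "user:1", n=3))  # new@example.com
```

Reading from one replica can return a stale copy, while reading from all three always sees the newest write.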

Read CL = ALL gives you immediate consistency, because the coordinator reads from all replica nodes and merges the responses, keeping the most current data.

Read CL = ONE gives you the benefit of speed: Cassandra contacts only the closest/fastest replica node, so the latency of the read request is lower and performance is higher. Also, 2 out of 3 replica nodes might be down, or their queries might fail, and you will still get a result because CL = ONE, so you have the highest availability. The price you pay for all these benefits is much weaker consistency guarantees.

Read CL = QUORUM (Cassandra contacts a majority of the replica nodes) gives you a nice balance: high-performance reads, good availability and good throughput.
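The availability side of this trade-off is simple arithmetic, sketched below for a cluster with 3 replica nodes per partition. The names REPLICAS, REQUIRED and satisfiable_levels are chosen for this example only.

```python
REPLICAS = 3
# Responses each read CL needs when there are 3 replica nodes.
REQUIRED = {"ONE": 1, "QUORUM": REPLICAS // 2 + 1, "ALL": REPLICAS}


def satisfiable_levels(live_replicas):
    """Return the read CLs that can still succeed with this many live replicas."""
    return [cl for cl, needed in REQUIRED.items() if live_replicas >= needed]


for down in range(REPLICAS + 1):
    print(f"{down} down -> {satisfiable_levels(REPLICAS - down)}")
# 0 down -> ['ONE', 'QUORUM', 'ALL']
# 1 down -> ['ONE', 'QUORUM']
# 2 down -> ['ONE']
# 3 down -> []
```

ALL stops working as soon as a single replica is down, QUORUM tolerates one failure, and ONE keeps answering until the last replica is gone.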

Immediate Consistency vs. Eventual Consistency

So, with consistency in Cassandra, you have two core types: immediate consistency and eventual consistency.

Immediate consistency: having identical data on all replica nodes at any given point in time.

Eventual consistency: by controlling our read and write consistency levels, we can allow data to differ across replica nodes, while our queries still return the most current version of the partition data.

Because we can choose between immediate and eventual consistency, we end up with a system that has tunable consistency. Tunable consistency means that you can set the CL for each read and write request, so Cassandra gives you a lot of control over how consistent your data is. You can make some queries immediately consistent and others eventually consistent. In your application, you can write the queries for data that requires immediate consistency accordingly, and for data that does not require it, you can optimize for performance and choose eventual consistency.

In part 2, we will see how to achieve immediate and eventual consistency using different write and read consistency levels.

Comments and thoughts welcome. Cheers!

Keshav Maheshwari

Software Engineer @ PayPal | Founder, Breakoutpicker | Ex Intern - Deloitte, Qodrr | BTech Computer Science LNM IIT

2 years ago

Thanks for this wonderful article, sir. Just a quick question: Cassandra is column-oriented. Where is it fruitful to use column-oriented databases?

Rajendra Uppal

Director of Engineering | ex @ Microsoft, Adobe | IIT Delhi

7 years ago

You may want to take a quick overview of Cassandra here: https://www.datastax.com/resources/tutorials/cassandra-overview
