NOSQL + Cassandra's Architecture

Abhijeet K

Program Management | Engagement Manager | AWS AI and Cloud Certified | SAFe 5.0 | PGDBA | Certified Scrum Master | Project Management | Product management | Automation | Performance | CI CD | Agile | DevOps

Published: February 27, 2019

Cassandra is a peer-to-peer distributed database that runs on a cluster of homogeneous nodes. It has been architected from the ground up to handle large volumes of data while providing high availability and high read and write throughput. A Cassandra cluster has no special nodes, i.e. the cluster has no masters, no slaves and no elected leaders. This enables Cassandra to be highly available with no single point of failure.

Key Concepts:

Data Partitioning - Cassandra stores data by dividing it evenly across its cluster of nodes. Each node is responsible for part of the data. The act of distributing data across nodes is referred to as data partitioning.

Consistent Hashing - A way of determining the node on which a specific piece of data should reside, while minimising data movement when nodes are added or removed.
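A minimal Python sketch of consistent hashing may help here. It is an illustration, not Cassandra's implementation: the `HashRing` class, the use of MD5 as the token function and the 64 virtual tokens per node are all assumptions made for this example.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical hash ring; each node owns several tokens on the ring."""

    def __init__(self, nodes=(), tokens_per_node=64):
        self.tokens_per_node = tokens_per_node
        self._tokens = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.tokens_per_node):
            t = token(f"{node}#{i}")
            self._owner[t] = node
            bisect.insort(self._tokens, t)

    def remove_node(self, node: str) -> None:
        self._tokens = [t for t in self._tokens if self._owner[t] != node]
        self._owner = {t: n for t, n in self._owner.items() if n != node}

    def node_for(self, key: str) -> str:
        # The owner is the first token clockwise from the key's position,
        # wrapping around to the start of the ring.
        idx = bisect.bisect_right(self._tokens, token(key)) % len(self._tokens)
        return self._owner[self._tokens[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Only the keys that now fall just before one of the new node's tokens change owner, and they all move to the new node. The rest of the data stays where it was, which is the property the definition above describes.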

Data Replication - Replication of data ensures fault tolerance and reliability.

Eventual Consistency - If no new updates are made, all nodes/replicas will eventually return the last updated value.

Gossip Protocol - A protocol used to discover the state of all nodes in a cluster. Nodes discover information about other nodes by exchanging state information about themselves and the other nodes they know about. In this way, state information about every node propagates throughout the cluster.
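The exchange described above can be sketched in a few lines of Python. This is a deliberately simplified model: the `GossipNode` class, the heartbeat counter used as a version number and the random peer selection are assumptions for illustration, and the real protocol carries more detail.

```python
import random


class GossipNode:
    """Hypothetical node holding its view of the cluster."""

    def __init__(self, name):
        self.name = name
        # name -> (heartbeat, status); a node starts out knowing only itself
        self.state = {name: (0, "UP")}

    def beat(self):
        heartbeat, status = self.state[self.name]
        self.state[self.name] = (heartbeat + 1, status)

    def merge(self, remote_state):
        # Keep whichever entry carries the higher heartbeat (the newer news).
        for name, (heartbeat, status) in remote_state.items():
            if name not in self.state or heartbeat > self.state[name][0]:
                self.state[name] = (heartbeat, status)

    def gossip_with(self, peer):
        # Both sides end up with the union of what they knew.
        mine = dict(self.state)
        self.merge(peer.state)
        peer.merge(mine)


def run_until_converged(nodes, rng, max_rounds=1000):
    """Return the number of rounds until every node knows every other node."""
    names = {n.name for n in nodes}
    for round_no in range(1, max_rounds + 1):
        for node in nodes:
            node.beat()
            peer = rng.choice([p for p in nodes if p is not node])
            node.gossip_with(peer)
        if all(set(n.state) == names for n in nodes):
            return round_no
    return None


nodes = [GossipNode(f"node-{i}") for i in range(8)]
rounds = run_until_converged(nodes, random.Random(42))
print(f"all nodes know about each other after {rounds} round(s)")
```

No node ever contacts the whole cluster, yet every node ends up with an entry for every other node, because each exchange also passes along second-hand information.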

Bloom Filters - A bloom filter is a data structure that provides a fast way to test whether an item is a member of a set. It can tell if an item might exist in the set or definitely does not exist in the set. False positives are possible but false negatives are not. Bloom filters are a good way of avoiding expensive I/O operations.
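As a rough illustration, a bloom filter can be written in Python as below. The `BloomFilter` class, the bit-array size, the number of hash functions and the use of SHA-256 are assumptions picked for readability, not tuned values.

```python
import hashlib


class BloomFilter:
    """Hypothetical bloom filter backed by a simple byte array."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several positions by salting one hash function with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # Any unset bit proves the item was never added.
        # All bits set means "maybe": other items could have set them.
        return all(self.bits[pos] for pos in self._positions(item))


row_keys = BloomFilter()
for key in ("user:1", "user:2", "user:3"):
    row_keys.add(key)

if row_keys.might_contain("user:2"):
    print("user:2 may be present, so the data has to be read to find out")
```

A store can keep one such filter per data file and skip reading any file whose filter answers "definitely not", which is where the I/O saving comes from.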

Merkle Tree - A hash tree which provides an efficient way to find differences in data blocks. Leaves contain hashes of individual data blocks and parent nodes contain hashes of their respective children. Two nodes can therefore compare their data by comparing hashes from the root downwards, descending only into the subtrees that differ.
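The comparison can be sketched in Python as follows. The function names are hypothetical, SHA-256 is an arbitrary choice of hash, and the sketch assumes both sides hold the same non-zero number of blocks.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return the tree as a list of levels: leaves first, root last."""
    level = [h(block) for block in blocks]
    levels = [level]
    while len(level) > 1:
        # Hash children pairwise; an unpaired last child is paired with itself.
        level = [
            h(level[i] + level[min(i + 1, len(level) - 1)])
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def diff_leaves(a, b, depth=None, index=0):
    """Indices of differing blocks, visiting only subtrees whose hashes differ."""
    if depth is None:
        depth = len(a) - 1  # start at the root
    if a[depth][index] == b[depth][index]:
        return []  # identical subtree, nothing below needs checking
    if depth == 0:
        return [index]
    found = []
    for child in (2 * index, 2 * index + 1):
        if child < len(a[depth - 1]):
            found += diff_leaves(a, b, depth - 1, child)
    return found


replica_a = [f"block-{i}".encode("utf-8") for i in range(8)]
replica_b = list(replica_a)
replica_b[5] = b"stale data"

tree_a = build_levels(replica_a)
tree_b = build_levels(replica_b)
print(diff_leaves(tree_a, tree_b))  # prints [5]
```

When the two roots match, a single hash comparison confirms the replicas agree. When they differ, only the path down to the mismatching blocks is examined.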

SSTable - A Sorted String Table (SSTable) is an ordered, immutable key-value map. It is basically an efficient way of storing large sorted data segments in a file.

Write Back Cache - A write-back cache is one where the write operation is directed only to the cache and completion is confirmed immediately. This is different from a write-through cache, where the write operation is directed at the cache but is only confirmed once the data has been written to both the cache and the underlying storage structure.
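The difference between the two policies is easiest to see side by side. In this Python sketch a plain dict stands in for the underlying storage, and both class names are hypothetical.

```python
class WriteThroughCache:
    """Hypothetical cache that confirms only after the store is updated."""

    def __init__(self, store):
        self.store = store
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.store[key] = value  # the backing store is written before the ack
        return "ack"


class WriteBackCache:
    """Hypothetical cache that confirms as soon as the cache is updated."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.dirty = set()  # keys written to the cache but not yet to the store

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)
        return "ack"  # the backing store has not seen this write yet

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.cache[key]
        self.dirty.clear()


disk = {}
cache = WriteBackCache(disk)
cache.write("k1", "v1")
print("k1" in disk)  # prints False: the write is still only in the cache
cache.flush()
print(disk["k1"])    # prints v1
```

The write-back variant acknowledges faster, at the price of a window in which the newest data lives only in the cache.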

Memtable - A memtable is a write-back cache residing in memory that holds data which has not yet been flushed to disk.
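To show how a memtable and SSTables fit together, here is a small Python sketch. The `TinyStore` and `SSTable` classes and the three-entry flush threshold are assumptions for illustration, and the "SSTables" are kept in memory rather than written to files.

```python
import bisect


class SSTable:
    """Hypothetical immutable, key-sorted segment (a file on disk in practice)."""

    def __init__(self, items):
        pairs = sorted(items)
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key):
        # Sorted keys allow a binary search instead of a full scan.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None


class TinyStore:
    """Hypothetical store: writes land in a memtable, flushes create SSTables."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        if self.memtable:
            self.sstables.append(SSTable(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Newer SSTables shadow older ones, so search newest first.
        for table in reversed(self.sstables):
            value = table.get(key)
            if value is not None:
                return value
        return None


store = TinyStore()
for key, value in (("a", 1), ("b", 2), ("c", 3)):
    store.put(key, value)  # the third put triggers a flush
store.put("a", 10)         # a newer value for "a" sits in the fresh memtable
print(store.get("a"), store.get("b"))  # prints 10 2
```

Existing SSTables are never modified. An update simply appears in a newer memtable or SSTable, and reads prefer the newest copy.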

Cassandra Keyspace - A keyspace is similar to a schema in the RDBMS world. It is a container for all your application data. When defining a keyspace, you need to specify a replication strategy and a replication factor, i.e. the number of nodes that the data must be replicated to.
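To make the replication factor concrete, the Python sketch below picks the replicas for a key by walking clockwise around a token ring until enough distinct nodes have been collected. This ring-walk placement, the single MD5 token per node and the `replicas_for` helper are all assumptions for illustration. Actual placement depends on the replication strategy chosen for the keyspace.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a ring position (MD5 is chosen only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor):
    """Return the nodes that would hold copies of `key` in this simple model."""
    ring = sorted((token(node), node) for node in nodes)  # one token per node
    tokens = [t for t, _ in ring]
    start = bisect.bisect_right(tokens, token(key))
    # There cannot be more copies than there are nodes.
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


cluster = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(replicas_for("user:42", cluster, replication_factor=3))
```

With a replication factor of 3, every key has three owners, so a single node failure still leaves two copies available.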

Column Family - A column family is analogous to the concept of a table in an RDBMS, but that is where the similarity ends. Instead of thinking of a column family as an RDBMS table, think of it as a map of sorted maps. A row key in the outer map provides access to a set of columns, which is represented by a sorted map: Map<RowKey, SortedMap<ColumnKey, ColumnValue>>. Please note that in CQL (Cassandra Query Language) lingo a column family is referred to as a table.
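The map-of-sorted-maps idea can be mimicked in Python as shown below. Python has no built-in sorted map, so the hypothetical `SortedColumns` class keeps column names ordered with `bisect`. The row key and column names are made up for the example.

```python
import bisect
from collections import defaultdict


class SortedColumns:
    """Hypothetical stand-in for SortedMap<ColumnKey, ColumnValue>."""

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._values = {}  # column name -> value

    def set(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def items(self):
        return [(n, self._values[n]) for n in self._names]

    def slice(self, start, end):
        """Columns whose names fall in the inclusive range [start, end]."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
column_family = defaultdict(SortedColumns)

row = column_family["sensor-1"]
row.set("2019-02-27T10:05", 21.9)
row.set("2019-02-27T10:00", 21.5)
row.set("2019-02-27T10:10", 22.4)

print(row.items())
print(row.slice("2019-02-27T10:00", "2019-02-27T10:05"))
```

Because the columns inside a row stay sorted, a range of columns can be read as one contiguous slice. That is why the sorted inner map matters.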

Row Key - A row key is also known as the partition key and has a number of columns associated with it, i.e. the sorted map described above. The row key is responsible for determining data distribution across a cluster.

Cassandra Cluster/Ring

Cassandra Write Path

Cassandra Read Path

Happy designing :) Abhijeet K.
