Cassandra VS. MongoDB

Eran Shaham

Architect and a Big-Data leader at Ford Research Center Israel LTD

Published: July 3, 2017

Cassandra and MongoDB have become two of the most popular NoSQL databases of the last few years. Some projects use MongoDB exclusively, others use Cassandra, and a few use both. We can all agree that Cassandra and MongoDB have become part of our lives, so let’s discuss the differences between the two.

So if you are now considering Cassandra or MongoDB as the data store for your next project, which one should it be?

What we can agree on is that each has its own strengths and weaknesses, and that switching from one to the other, though possible, is not a matter of changing a few lines of code in a few seconds.

I’m going to go through a few points that demonstrate the differences between the two, in the hope that it helps someone.

1. Data model

MongoDB uses a document-based model in which each document carries its own dynamic properties, so you don’t need to pre-declare the structure of a document. The model is very object-oriented and easy to understand. Joins are supported only in a limited way, through references (links) between documents. You can also index nested (inner) properties.

Cassandra is stricter. Tables must be defined in advance, and should a need for a new field arise, you have to alter the table. That is not as expensive as it is in a traditional relational database, yet it is a point to consider.
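The contrast can be sketched in a few lines of plain Python. This is a toy model only: it uses no MongoDB or Cassandra driver, and the class names (DocumentCollection, FixedSchemaTable) and the sample fields are made up for illustration.

```python
# Toy model only: plain Python stand-ins, not the MongoDB or Cassandra drivers.


class DocumentCollection:
    """Schemaless store: every document may carry its own set of fields."""

    def __init__(self):
        self.documents = []

    def insert(self, document):
        self.documents.append(dict(document))


class FixedSchemaTable:
    """Table with pre-declared columns; undeclared columns are rejected."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = []

    def add_column(self, name):
        # Plays the role of altering the table to add a column.
        self.columns.add(name)

    def insert(self, row):
        unknown = set(row) - self.columns
        if unknown:
            raise ValueError(f"undeclared columns: {sorted(unknown)}")
        # Declared columns that are not supplied are stored as None.
        self.rows.append({col: row.get(col) for col in self.columns})


users_docs = DocumentCollection()
users_docs.insert({"name": "Dana"})
# A new nested field appears without any declaration.
users_docs.insert({"name": "Noam", "address": {"city": "Haifa"}})

users_table = FixedSchemaTable(["name"])
users_table.insert({"name": "Dana"})
try:
    users_table.insert({"name": "Noam", "city": "Haifa"})
    rejected = False
except ValueError:
    rejected = True  # the column has to be added first

users_table.add_column("city")
users_table.insert({"name": "Noam", "city": "Haifa"})
```

The document store accepts the second document as-is, while the table accepts the new field only after the column has been added.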

2. Indexes

Both offer indexing natively, so you can run your queries efficiently.

It becomes trickier if we discuss secondary indexing.

Secondary indexing is a more integral part of MongoDB. It is easy to index any property of a document and look documents up by it, so queries on arbitrary properties can be used freely.

Cassandra also offers secondary indexes, yet most of the time you won’t need them.

Most of the time you end up with one table per need (or per query), so the primary key does the trick. That works because writes in Cassandra are very cheap, which makes it affordable to store the same data in several query-specific tables.

Cassandra’s primary key combines a partition key and a clustering key. The partition key determines how data is distributed across your nodes, and the clustering key determines how rows are sorted within each partition.
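As a rough illustration of the two roles, here is a toy sketch in plain Python. It is not how Cassandra is implemented: real placement uses a token ring rather than a simple modulo, and the node names, the ToyTable class and the sensor data are all hypothetical.

```python
import bisect
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key):
    """Pick a node from a hash of the partition key (simplified placement)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


class ToyTable:
    """Rows grouped by partition key and kept sorted by clustering key."""

    def __init__(self):
        # partition key -> sorted list of (clustering key, value)
        self.partitions = {}

    def insert(self, partition_key, clustering_key, value):
        rows = self.partitions.setdefault(partition_key, [])
        bisect.insort(rows, (clustering_key, value))

    def read_partition(self, partition_key):
        """Return the rows of one partition, already in clustering order."""
        return list(self.partitions.get(partition_key, []))


readings = ToyTable()
readings.insert("sensor-1", "2017-07-03T10:00", 21.5)
readings.insert("sensor-1", "2017-07-03T08:00", 19.0)
readings.insert("sensor-2", "2017-07-03T09:00", 30.1)
readings.insert("sensor-1", "2017-07-03T09:00", 20.2)

# All rows of one partition live together and come back sorted by time.
sensor_1_rows = readings.read_partition("sensor-1")
sensor_1_node = node_for("sensor-1")
```

Reading one partition touches a single group of rows that is already in clustering order, which is why a well-chosen primary key usually makes a secondary index unnecessary.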

3. High Availability

This is where Cassandra starts to shine.

MongoDB supports only a fairly classic master-slave mode (a replica set with one primary and several secondaries).

Writes can only go through the master. Reads go to the master by default and can optionally be served by the slaves.

So should the master go down, a new one has to be elected. That takes time, during which you can’t write to MongoDB. This was improved somewhat in version 3.2, yet you are still going to face a short period of write downtime.

Cassandra, on the other hand, has no masters; you can write to or read from any node. So if one node is down, you simply pick the next one. In practice, the ring stays available as long as enough replicas are up to satisfy the consistency level you ask for.
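The difference in failure behavior can be sketched with two toy classes in plain Python. Nothing here talks to a real database; ToyReplicaSet, ToyRing and the node names are assumptions made for illustration, and the election is reduced to picking the first surviving member.

```python
class ToyReplicaSet:
    """Single-master group: writes are accepted only while a master exists."""

    def __init__(self, members):
        self.members = list(members)
        self.primary = self.members[0]

    def fail(self, member):
        self.members.remove(member)
        if member == self.primary:
            self.primary = None  # no master until an election completes

    def elect(self):
        # Simplified election: the first surviving member takes over.
        self.primary = self.members[0] if self.members else None

    def write(self, value):
        if self.primary is None:
            raise RuntimeError("no master: writes are unavailable")
        return self.primary  # the node that accepted the write


class ToyRing:
    """Masterless group: any live node can accept a write."""

    def __init__(self, nodes):
        self.live = list(nodes)

    def fail(self, node):
        self.live.remove(node)

    def write(self, value):
        if not self.live:
            raise RuntimeError("no live nodes")
        return self.live[0]  # simply pick the next available node


replica_set = ToyReplicaSet(["m1", "m2", "m3"])
replica_set.fail("m1")
try:
    replica_set.write("x")
    write_blocked = False
except RuntimeError:
    write_blocked = True  # the window in which writes are refused
replica_set.elect()
new_primary = replica_set.write("x")

ring = ToyRing(["c1", "c2", "c3"])
ring.fail("c1")
coordinator = ring.write("x")  # no election needed
```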

4. Scalability

If all goes well, your project will expand, the initial number of servers will no longer be enough, and you will need to scale your storage.

This is the second point where Cassandra wins by far. All you need to do is add more nodes, and the ring rebalances itself automatically. Your writes become speedy again and performance gets back to a point near where you started.

With MongoDB, sadly, it’s not that simple. Writes become a bottleneck over time. You can shard MongoDB, and that might help, yet it is far from being as smooth a solution as Cassandra’s.
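The self-balancing ring rests on consistent hashing. The sketch below is a minimal, self-contained illustration of that idea, not Cassandra’s actual code: MD5 stands in for the real partitioner, and ToyHashRing, the node names and the number of virtual positions per node are assumptions.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (MD5 used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.tokens = []  # sorted ring positions
        self.owners = {}  # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads evenly.
        for i in range(self.vnodes):
            position = token(f"{node}#{i}")
            bisect.insort(self.tokens, position)
            self.owners[position] = node

    def owner(self, key):
        """First ring position at or after the key's token, wrapping around."""
        index = bisect.bisect_left(self.tokens, token(key)) % len(self.tokens)
        return self.owners[self.tokens[index]]


keys = [f"user-{i}" for i in range(2000)]
ring = ToyHashRing(["node-a", "node-b", "node-c"])
before = {key: ring.owner(key) for key in keys}

ring.add_node("node-d")
after = {key: ring.owner(key) for key in keys}

# Keys whose owner changed when the fourth node joined.
moved = [key for key in keys if before[key] != after[key]]
```

Only a fraction of the keys change owner when the fourth node joins, and every key that moves lands on the new node; the existing nodes never shuffle data among themselves.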

5. Query language support

As of version 2, Cassandra supports CQL, which is very similar to SQL in syntax. That makes it easy for most of us to learn.

MongoDB has no native SQL-like query language; queries are structured as JSON documents. You can, however, use a connector that supports ANSI SQL.

Personally, I find CQL very convenient to use, but I can see the point of querying with JSON.

This is a good time to mention that Cassandra doesn’t support joins. Most probably you won’t need them, though, since you model one table per query as discussed above.
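To show the two styles side by side, here is a small Python sketch. The "users" table/collection and its fields are hypothetical, the CQL text is only held in a string, and the matches function is a toy evaluator that handles equality and the $gt operator only; it is not MongoDB’s real matcher.

```python
# MongoDB style: the filter is itself a JSON-like document.
mongo_filter = {"city": "Haifa", "age": {"$gt": 30}}

# Cassandra style: CQL reads like SQL. This assumes a table whose primary key
# makes the query legal, for example city as the partition key and age as the
# first clustering column.
cql_query = "SELECT name FROM users WHERE city = 'Haifa' AND age > 30;"


def matches(document, query_filter):
    """Toy evaluator for equality and $gt only."""
    for field, condition in query_filter.items():
        if field not in document:
            return False
        value = document[field]
        if isinstance(condition, dict):
            if "$gt" in condition and not value > condition["$gt"]:
                return False
        elif value != condition:
            return False
    return True


users = [
    {"name": "Dana", "city": "Haifa", "age": 41},
    {"name": "Noam", "city": "Haifa", "age": 25},
    {"name": "Lior", "city": "Eilat", "age": 52},
]

# Apply the JSON-style filter to in-memory documents.
selected = [user["name"] for user in users if matches(user, mongo_filter)]
```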

Conclusion

If you’re after scale and high availability, Cassandra is your guy by far. If you’re after a more object-oriented model in your storage, you will probably choose MongoDB. In short, Cassandra is built more for the heavy lifting, and MongoDB is more of a tactical solution.

Hagay Onn (the Spot)

InnovatiOnn ■ AI Lectures, Art, Consulting & Development ■ SW Architecture, Design, Implementation & Optimizations (Cloud, Data Pipelines, Automations) ■ Former C++ & Java RT developer. Current: Python & JS dev.

7 years ago

Great article for people choosing their data store, Eran! As for me, I like MongoDB a bit more, as it's easier NOT to pre-declare structure when making POCs/MVPs for experimental products/features ;-) Loved reading, Hag

Daniil Pevni

VP R&D | CTO | Chief Architect | Software Executive

7 years ago

You kind of skipped another important aspect... the CAP theorem. These two DBs have very different approaches to it. For example, in Cassandra you can actually decide on the level of consistency for a specific query (newer versions of Cassandra); you can determine that for a specific query ALL nodes have to agree that the data is correct... This kind of sets the new Cassandra apart from many other DBs that have predefined CAP properties...

Aleksandr Savin

LLVM

7 years ago

The second one, because it is much easier to support.

