Cassandra vs. MongoDB

Cassandra and MongoDB have become two of the most popular NoSQL databases of the last few years. Some projects use only MongoDB, others use only Cassandra, and a few use both. We can all agree that both have become part of our lives, so let’s discuss the differences between the two.

So if you are considering Cassandra or MongoDB as the data store for your next project, which one should it be?

What we can agree on is that each has its own strengths and weaknesses, and although switching between them is possible, it is not something you do in seconds by changing a few lines of code.

I’m going to go through a few points that demonstrate the differences between the two, in the hope that it helps someone.

1.     Data model

MongoDB supports a document-based model with dynamic properties tied to each document, so you don’t need to declare the structure of a document in advance. It’s very object oriented and easy to understand. It supports joins, to a limited degree, via links (references between documents). You can also index on inner (nested) properties.

Cassandra is stricter. Your tables must be defined in advance, and should the need for a new field arise, you have to alter the table. That process is not as expensive as it is in a traditional database, yet it is a point to consider.
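To make the contrast concrete, here is a minimal Python sketch that uses only the standard library and no database driver. Plain dicts stand in for MongoDB documents, and the `Table` class, `get_path` helper, and all sample data are hypothetical names invented for this illustration. It is not how either database is implemented; it only mimics the behavior described above.

```python
# Illustration only: plain Python stand-ins, not real database drivers.

# Document model: each document carries its own structure.
users = [
    {"name": "Dana", "address": {"city": "Haifa"}},
    {"name": "Omer", "tags": ["admin"]},  # different fields, nothing declared
]


def get_path(doc, dotted_path):
    """Read a nested property such as 'address.city'; None if it is missing."""
    current = doc
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


class Table:
    """Hypothetical fixed-schema table: columns must be declared up front."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = []

    def alter_add_column(self, name):
        # Stands in for altering the table to add a field.
        self.columns.add(name)

    def insert(self, row):
        unknown = set(row) - self.columns
        if unknown:
            raise ValueError(f"undeclared columns: {sorted(unknown)}")
        self.rows.append(row)


table = Table(["name", "city"])
table.insert({"name": "Dana", "city": "Haifa"})
try:
    table.insert({"name": "Omer", "tags": "admin"})
    rejected = False
except ValueError:
    rejected = True  # the row is refused until the table is altered
table.alter_add_column("tags")
table.insert({"name": "Omer", "tags": "admin"})
```

The document list accepts any shape immediately, while the table accepts the new field only after the explicit alter step.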

2.     Indexes

Both offer indexing natively, so you can run your queries efficiently.

It becomes trickier when we discuss secondary indexes.

Secondary indexing is a more natural part of MongoDB. It is easy to index any property of a document and look documents up by it, so queries based on arbitrary properties can be used freely.

Cassandra also offers secondary indexes, yet most of the time you won’t need them.

Most of the time you end up with a table per need (or per query), so your primary key will do the trick. That approach is practical because writes in Cassandra are very cheap.

Cassandra’s primary key combines a partition key and a clustering key. The partition key is responsible for distributing data across all your nodes, and the clustering key is responsible for sorting data within each partition (which is stored on the node that owns it).
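The split of responsibilities between the two keys can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the node names, the sensor-readings data, and the helper functions are hypothetical, and an MD5 hash with a modulo stands in for Cassandra’s real partitioner and token ring.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names


def node_for(partition_key):
    """Pick a node from a hash of the partition key (simplified: no token ring)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]


# node -> partition key -> rows of that partition
storage = defaultdict(lambda: defaultdict(list))


def insert(partition_key, clustering_key, value):
    partition = storage[node_for(partition_key)][partition_key]
    partition.append((clustering_key, value))
    partition.sort(key=lambda row: row[0])  # keep rows ordered by clustering key


def read_partition(partition_key):
    """A query by partition key touches one node and returns sorted rows."""
    return storage[node_for(partition_key)][partition_key]


# Hypothetical table: readings partitioned by sensor id, clustered by date.
insert("sensor-1", "2024-01-03", 21.5)
insert("sensor-1", "2024-01-01", 19.0)
insert("sensor-2", "2024-01-02", 30.1)
insert("sensor-1", "2024-01-02", 20.2)

rows = read_partition("sensor-1")
```

Rows for `sensor-1` come back ordered by date regardless of insertion order, and the lookup never has to consult more than one node.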

3.     High Availability

This is where Cassandra starts to shine.

MongoDB supports only a fairly classic master-slave mode, where writes can only go through the master and reads can also be served by the slaves.

So should the master go down, a new one needs to be elected. That takes time, during which you can’t write to MongoDB. This was somewhat improved in version 3.2, yet you are still going to face a short downtime.

Cassandra, on the other hand, has no masters: you can write to or read from any of the nodes. So if one is down, you pick the next one. In practice, you get close to 100% uptime for your ring.
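The difference in failure behavior can be illustrated with a toy Python model. The `Node` class and both write functions are hypothetical, and the sketch deliberately ignores replication, consistency levels, and the election itself; it only shows which writes succeed while a node is down.

```python
class Node:
    """Toy node: a name, an up/down flag, and an in-memory key-value store."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


def write_masterless(nodes, key, value):
    """Try nodes in order; any node that is up can accept the write."""
    for node in nodes:
        if node.up:
            node.data[key] = value
            return node.name
    raise RuntimeError("no node is reachable")


def write_single_master(master, key, value):
    """Only the master accepts writes; while it is down, writes fail."""
    if not master.up:
        raise RuntimeError("master is down, waiting for an election")
    master.data[key] = value
    return master.name


ring = [Node("n1"), Node("n2"), Node("n3")]
ring[0].up = False  # one node fails
accepted_by = write_masterless(ring, "user:1", "Dana")

master = Node("primary")
master.up = False  # the only writable node fails
try:
    write_single_master(master, "user:1", "Dana")
    master_write_ok = True
except RuntimeError:
    master_write_ok = False
```

In the masterless ring the client simply moves on to `n2`, while the single-master setup rejects the write until a new master exists.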

4.     Scalability

If all goes well, your project will expand, the initial number of servers won’t suffice anymore, and you will need to scale your storage.

This is the second point where Cassandra wins by far. All you need to do is add more nodes, and the ring will rebalance itself automatically. Your writes will become speedy again, and basically you get back to a point near where you started.

As for MongoDB, sadly it’s not that simple. Writes will become a bottleneck over time. You can shard MongoDB, and that might help, yet it’s far from being as smooth a solution as Cassandra’s.
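The idea behind a ring that rebalances when a node joins is consistent hashing. The following Python sketch is a generic consistent-hash ring with virtual nodes, written for illustration only: the `Ring` class, the node names, and the MD5-based `token` function are assumptions, and Cassandra’s actual token allocation and data streaming are more involved.

```python
import bisect
import hashlib


def token(text):
    """Map a string to a position on the ring (MD5 is a stand-in hash)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Generic consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.points = []  # sorted list of (token, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node claims several positions so load spreads more evenly.
        for i in range(self.vnodes):
            bisect.insort(self.points, (token(f"{node}#{i}"), node))

    def owner(self, key):
        # The first point at or after the key's token owns the key (wrapping).
        index = bisect.bisect_left(self.points, (token(key), ""))
        return self.points[index % len(self.points)][1]


keys = [f"key-{i}" for i in range(1000)]
ring = Ring(["n1", "n2", "n3"])
before = {key: ring.owner(key) for key in keys}

ring.add_node("n4")  # scale out by one node
after = {key: ring.owner(key) for key in keys}

moved = [key for key in keys if before[key] != after[key]]
moved_fraction = len(moved) / len(keys)
```

Only the keys that now belong to `n4` change owner; every other key stays where it was, which is why adding a node does not require reshuffling the whole data set.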

5.     Query language support

As of version 2, Cassandra supports CQL (Cassandra Query Language), which is very similar to SQL in syntax. That makes it easy for most of us to learn.

MongoDB has no SQL-like query language natively; queries are structured as JSON documents. You can, however, use a connector that supports ANSI SQL.
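To compare the two styles side by side, the Python sketch below expresses the same question, "users in Haifa older than 30", as a CQL string and as a JSON-style filter. Nothing here talks to a database: the `users_by_city` table (assumed to be partitioned by `city` and clustered by `age`), the sample documents, and the `matches` helper are hypothetical, and the helper covers only equality and the `$gt` operator, a small subset of MongoDB’s filter semantics.

```python
# The same question in both styles. The CQL text is shown for comparison only
# and assumes a hypothetical table keyed by city (partition) and age (clustering).
cql_query = "SELECT name FROM users_by_city WHERE city = 'Haifa' AND age > 30"
mongo_filter = {"city": "Haifa", "age": {"$gt": 30}}


def matches(doc, query_filter):
    """Evaluate a JSON-style filter: plain equality and $gt only."""
    for field, condition in query_filter.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op == "$gt":
                    if value is None or not value > operand:
                        return False
                else:
                    raise ValueError(f"operator not covered by this sketch: {op}")
        elif value != condition:
            return False
    return True


docs = [
    {"name": "Dana", "city": "Haifa", "age": 34},
    {"name": "Omer", "city": "Haifa", "age": 25},
    {"name": "Noa", "city": "Eilat", "age": 41},
]

names = [doc["name"] for doc in docs if matches(doc, mongo_filter)]
```

The CQL form reads like SQL, while the JSON form is itself a data structure that code can build and inspect, which is the appeal of each approach.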

Personally, I find CQL very convenient to use, but I can see the point of querying with JSON.

This is a good time to mention that Cassandra doesn’t support joins. Yet you most probably won’t need them since, as discussed above, you typically build a table per query.

Conclusion

If you’re after scale and high availability, Cassandra is your guy by far. If you’re after a more object-oriented model for your storage, you will probably choose MongoDB. In short, Cassandra is built more for the heavy lifting, and MongoDB is more of a tactical solution.

Hagay Onn (the Spot)

InnovatiOnn ■ AI Lectures, Art, Consulting & Development ■ SW Architecture, Design, Implementation & Optimizations (Cloud, Data Pipelines, Automations) ■ Former C++ & Java RT developer. Current: Python & JS dev.

7 years ago

Great article for people choosing their data store, Eran! As for me, I like MongoDB a bit more, as it's easier NOT to pre-declare structure when making POCs/MVPs for experimental products/features ;-) Loved reading, Hag

Daniil Pevni

VP R&D | CTO | Chief Architect | Software Executive

7 years ago

You kind of skipped another important aspect... the CAP theorem. These two DBs take very different approaches to it. For example, in Cassandra you can actually decide on the level of consistency for a specific query (in newer versions of Cassandra); for instance, you can require that for a specific query ALL nodes agree that the data is correct... This kind of sets the new Cassandra apart from many other DBs that have predefined CAP properties...

The second one, because it is much easier to support.
