Cassandra vs. MongoDB
Cassandra and MongoDB have become two of the most popular NoSQL databases of the last few years. Some projects use only MongoDB, others use only Cassandra, and a few use both. We can all agree that Cassandra and MongoDB have become part of our lives, so let’s discuss the differences between the two.
So if you are considering Cassandra or MongoDB as the data store for your next project, which one should it be?
What we can agree on is that each has its strengths and its weaknesses, and that switching from one to the other, though possible, is not a matter of changing a few lines of code in a few seconds.
I’m going to go through a few points that demonstrate the differences between the two, in the hope that it helps someone.
1. Data model
MongoDB supports a document-based model in which every document carries its own dynamic properties, so you don’t need to pre-declare the structure of a document. It’s very object oriented and easy to understand. It supports joins, in a limited way, via links between documents. You can also index on inner (nested) properties.
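To make the document model concrete, here is a minimal sketch in plain Python. It does not use MongoDB or its driver: the `users` list, the `get_path` helper and the `city_index` dictionary are all hypothetical stand-ins, meant only to illustrate documents with differing fields and a lookup on an inner property.

```python
# Toy stand-in for a document collection: plain dicts, no driver, no server.
users = [
    {"name": "Ann", "address": {"city": "Haifa", "zip": "31000"}},
    {"name": "Bob", "twitter": "@bob"},  # different fields, no schema change needed
]

def get_path(doc, dotted_path):
    """Follow a dotted path such as 'address.city'; return None if it is absent."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

# Hypothetical "index" on an inner property: value -> positions of matching docs.
city_index = {}
for position, doc in enumerate(users):
    city = get_path(doc, "address.city")
    if city is not None:
        city_index.setdefault(city, []).append(position)

haifa_users = [users[i]["name"] for i in city_index.get("Haifa", [])]
```

The second document has no `address` at all, and nothing had to be declared or altered to store it next to the first one.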
Cassandra is stricter. Your tables must be defined up front, and should the need for a new field arise, you have to alter the table. That process is not as expensive as it is in a traditional database, yet it is a point to consider.
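The stricter approach can be sketched the same way. The `StrictTable` class below is a hypothetical toy, not Cassandra's API; its `add_column` method merely plays the role that an ALTER TABLE step would play in a real schema.

```python
class StrictTable:
    """Toy table with a fixed column set, loosely mimicking a predefined schema."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = []

    def insert(self, **values):
        # Reject any field that was not declared in advance.
        unknown = set(values) - self.columns
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self.rows.append(values)

    def add_column(self, name):
        # Stands in for altering the table to add a new field.
        self.columns.add(name)

table = StrictTable(["id", "name"])
table.insert(id=1, name="Ann")

try:
    table.insert(id=2, name="Bob", twitter="@bob")
    rejected = False
except ValueError:
    rejected = True  # the new field is refused until the table is altered

table.add_column("twitter")
table.insert(id=2, name="Bob", twitter="@bob")
```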
2. Indexes
Both offer indexing natively, so you can run your queries efficiently.
It becomes trickier when we discuss secondary indexing.
Secondary indexing is more a part of MongoDB. It makes it easy to index any property of a document and look documents up by it, so queries based on arbitrary properties can be used freely.
Cassandra also offers secondary indexes, yet most of the time you won’t need them.
Most of the time you will end up with one table per need (or per query), and its primary key will do the trick. That is affordable because writes in Cassandra are very cheap, so storing the same data in several tables is not a problem.
Cassandra offers a primary key that combines a Partition Key and a Clustering Key. The Partition Key is responsible for distributing the data across all your nodes, and the Clustering Key is responsible for sorting the data within each partition (a partition is stored together on the node that owns it).
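The split between the two keys can be illustrated with a small sketch. This is a simplification under stated assumptions: the node names are made up, and real Cassandra places partitions using a partitioner and a token ring rather than the simple modulo used here.

```python
import bisect
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names

def node_for(partition_key):
    """Map a partition key to a node with a stable hash (a simplification)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# partitions[partition_key] is a list of (clustering_key, value) kept sorted.
partitions = {}

def insert(partition_key, clustering_key, value):
    rows = partitions.setdefault(partition_key, [])
    bisect.insort(rows, (clustering_key, value))  # keeps clustering order

insert("sensor-1", "2024-01-03", 21.5)
insert("sensor-1", "2024-01-01", 19.0)
insert("sensor-1", "2024-01-02", 20.2)
insert("sensor-2", "2024-01-01", 5.0)

# Rows come back sorted by clustering key, regardless of insertion order.
sensor_1_days = [day for day, _ in partitions["sensor-1"]]
```

All rows of `sensor-1` live in one partition and are read back in clustering order, which is why a well-chosen primary key often makes a secondary index unnecessary.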
3. High Availability
This is where Cassandra starts to shine.
MongoDB supports only a classic kind of master-slave mode,
where writes can only be done through the master, and reads can be served by the slaves.
So should the master go down, a new one needs to be elected. That takes time, during which you can’t write to MongoDB. This was improved somewhat in v3.2, yet you are still going to face a short downtime.
Cassandra, on the other hand, has no masters; you can write to or read from any one of the nodes. So if one is down, you pick the next one. Practically, you get close to 100% uptime for your ring, as long as enough nodes stay up to serve your requests.
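The "pick the next one" idea can be sketched as follows. The `Node` class and `write_any` function are hypothetical and deliberately ignore replication and consistency levels; the only point shown is that a client is not tied to a single master.

```python
class Node:
    """Toy node that either accepts writes or is down."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def write(self, key, value):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

def write_any(nodes, key, value):
    """Try each node in turn; return the name of the first one that accepts."""
    for node in nodes:
        try:
            node.write(key, value)
            return node.name
        except ConnectionError:
            continue  # this node is down, move on to the next one
    raise ConnectionError("no node available")

ring = [Node("n1", up=False), Node("n2"), Node("n3")]
accepted_by = write_any(ring, "user:1", "Ann")
```

With a single master, the same failure would block writes until a new master is elected.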
4. Scalability
If all goes well, your project will expand, the initial number of servers won’t hold anymore, and you will need to scale your storage.
This is the second point where Cassandra wins by far. All you need to do is add more nodes and the ring will rebalance itself automatically. Your writes will become speedy again and, basically, you will get back to a point near where you started.
As for MongoDB, sadly it’s not that simple. Writing will become a bottleneck over time. You can shard MongoDB, and that might help, yet it’s far from being as smooth a solution as Cassandra.
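A rough sketch of why adding a node is cheap, using a generic consistent-hashing ring. This is an illustration of the general technique, not Cassandra's actual implementation: the node names, the MD5-based `token` function and the choice of 50 virtual tokens per node are all assumptions.

```python
import bisect
import hashlib

def token(text):
    """Stable position on the ring for a piece of text."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

def build_ring(nodes, vnodes=50):
    """Return a sorted list of (token, node); each node gets several tokens."""
    return sorted((token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes))

def owner(ring, key):
    """First node clockwise from the key's token, wrapping around the ring."""
    index = bisect.bisect_left(ring, (token(key), ""))
    return ring[index % len(ring)][1]

keys = [f"key-{i}" for i in range(1000)]
before = build_ring(["n1", "n2", "n3"])
after = build_ring(["n1", "n2", "n3", "n4"])

# Only keys that land on the new node change owner; the rest stay put.
moved = [k for k in keys if owner(before, k) != owner(after, k)]
moved_to = {owner(after, k) for k in moved}
```

Only a fraction of the keys move, and all of them move to the new node `n4`, so the existing nodes hand over part of their load without reshuffling data among themselves.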
5. Query language support
As of version 2, Cassandra supports CQL, which is very similar to SQL in syntax. That makes it easy for most of us to learn.
Natively, MongoDB doesn’t support a SQL-like query language; queries are structured as JSON documents. You can, however, use a connector that supports ANSI SQL.
Personally, I found CQL very convenient to use, but I can see the point of querying with JSON.
This is a good time to mention that Cassandra doesn’t support joins. Yet, most probably you won’t need one, as discussed above.
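To compare the two styles side by side, here is the same question expressed both ways. The table `users_by_city` (with `city` as partition key and `age` as clustering key) and the sample data are hypothetical, and the `matches` function is only a toy that understands equality and the `$gt` operator; it is not how MongoDB evaluates queries.

```python
# The CQL form is an ordinary string, close to SQL.
cql_query = "SELECT name FROM users_by_city WHERE city = 'Haifa' AND age > 30"

# The MongoDB form is a JSON-like document describing the filter.
mongo_filter = {"city": "Haifa", "age": {"$gt": 30}}

def matches(doc, query):
    """Toy matcher supporting only equality and the $gt operator."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op != "$gt":
                    raise NotImplementedError(op)
                if value is None or not value > operand:
                    return False
        elif value != condition:
            return False
    return True

people = [
    {"name": "Ann", "city": "Haifa", "age": 34},
    {"name": "Bob", "city": "Haifa", "age": 28},
    {"name": "Cid", "city": "Eilat", "age": 41},
]
names = [p["name"] for p in people if matches(p, mongo_filter)]
```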
Conclusion
If you’re after scale and high availability, Cassandra is your guy by far. If you’re after a more object-oriented model in your storage, you will probably choose MongoDB. So in short, Cassandra is built more for the heavy lifting, and MongoDB more for tactical solutions.
7 years ago: Great article for people choosing their data store, Eran! As for me, I like MongoDB a bit more, as it's easier NOT to pre-declare structure when making POCs/MVPs for experimental products/features ;-) Loved reading, Hag
7 years ago: You kind of skipped another important aspect... the CAP theorem. These two DBs have very different approaches to it. For example, in Cassandra you can actually decide on the level of consistency for a specific query (newer versions of Cassandra); you can determine that for a specific query ALL nodes have to agree that the data is correct... This kind of sets the new Cassandra apart from many other DBs that have predefined CAP properties...
7 years ago: The second one, because it is much easier to support.