Bad Medicine
All MongoDB databases suffer from a genetic disease: they can scale to ginormous sizes.
What is the cure for scale-to-ginormous-itis? Optimization.
The symptoms of scale-to-ginormous-itis have a way of creeping into your application. At the beginning, when your application holds just some sample data, everything runs fine. All queries return in 0ms and you have nothing to worry about.
Fast-forward to when your application is in production, handling a significant amount of data, and the symptoms begin to show up. What are the symptoms? Slow queries.
You already know that the medicine for slow queries is indexes, but this medicine has a serious side effect: slower inserts (and possibly slower writes in general). Wait, what???
Are indexes BAD MEDICINE then? Well, sometimes. Indexes created without carefully studying utilization patterns can hurt performance more than they help it.
Design Good Medicine
How to write an index and how to make sure that it optimizes your queries is a great subject for a future post. For now, I will assume that you already know how to do that. In the examples below I will make sure to create an index that includes only the one key used for filtering.
The key to optimizing query performance is to study the utilization pattern of all your queries and to use the explain() method to measure the performance of each. That's it! It is not hard, but it takes time, and you have to make that time.
First, gather a list of all your queries and measure the performance of each using .explain("executionStats"). For example, for the query db.collection.find({"a": 1}) you would run db.collection.explain("executionStats").find({"a": 1}) and log the results. Put this in a script and you will be able to quickly measure performance before and after creating your indexes.
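Here is a minimal sketch of such a script for the mongo shell; the collection name and the queries listed are stand-ins for your application's real ones:

    // measureQueries.js -- run with: mongo mydb measureQueries.js
    // The queries below are examples; list your application's real queries here.
    var explains = [
        db.collection.find({}).explain("executionStats"),
        db.collection.find({}).sort({"a": 1}).explain("executionStats"),
        db.collection.find({"a": 1}).explain("executionStats")
    ];
    explains.forEach(function (ex) {
        var s = ex.executionStats;
        // success | nReturned | ms | keysExamined | docsExamined
        print([s.executionSuccess, s.nReturned, s.executionTimeMillis,
               s.totalKeysExamined, s.totalDocsExamined].join(" | "));
    });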
Second, run mongostat and add 1,000,000 documents (create a script to do that too). If your application performs inserts, make sure that the documents added here closely resemble the documents your application adds. Starting with no indexes and hardly any data, mongostat shows between 850 and 1,100 inserts per second. Let's say that we are adding 1,000 documents per second, i.e. that each insert takes about 1ms.
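A sketch of the loader, with a document shape of my own invention: the padding is an assumption to approximate ~1.5KB documents, and "a": i % 100 means each value of "a" matches 10,000 of the million documents, which lines up with the query tests below.

    // seedMillion.js -- run while mongostat is running in another terminal.
    // One insert at a time on purpose: no bulk insert, as discussed later.
    var padding = new Array(1500).join("x"); // ~1.5KB filler, an assumption
    for (var i = 1; i <= 1000000; i++) {
        db.collection.insert({ "a": i % 100, "b": padding });
    }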
Third, measure performance again using the script developed in step one. What used to run in 0ms now takes real time. Do not look only at executionTimeMillis; also look at nReturned, keysExamined, and docsExamined, and check whether your queries perform collection scans and in-memory sorts.
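To spot those last two conditions, you can walk the winning plan and look for COLLSCAN and SORT stages; a sketch:

    // Walk an explain() winning plan and collect the stage names.
    function stageNames(stage, acc) {
        acc.push(stage.stage);
        if (stage.inputStage) stageNames(stage.inputStage, acc);
        if (stage.inputStages) stage.inputStages.forEach(function (s) { stageNames(s, acc); });
        return acc;
    }
    var ex = db.collection.find({"a": 1}).explain("executionStats");
    var stages = stageNames(ex.queryPlanner.winningPlan, []);
    print("collection scan? " + (stages.indexOf("COLLSCAN") !== -1));
    print("in-memory sort?  " + (stages.indexOf("SORT") !== -1));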
In the results below, the five values reported are: success | nReturned | ms | keysExamined | docsExamined.
The query to return all documents without any filter does a collection scan: it returns 1,000,020 documents after inspecting all 1,000,020 documents. It does so in 315ms and inspects no index keys (obviously since we don't have any usable indexes).
The query that returns all documents sorted (db.collection.find({}).sort({"a": 1}).explain("executionStats")) fails outright because of an in-memory sort. In-memory sorts are capped at 32MB, a limit reached here after loading 20,701 documents. This is not good; we definitely need to create an index to fix it.
The third query finds the 10,000 documents that match a condition (db.collection.find({"a": 1}).explain("executionStats")). Without an index it has to examine all 1,000,020 documents, and it takes 933ms.
The fourth query is an update filtered on the same match condition (db.collection.explain("executionStats").update({"a": 1}, {"$set": {"b": 2}})). It takes about the same amount of time and effort.
The fifth query is a remove using the same match condition (db.collection.explain("executionStats").remove({"a": 1}, {"justOne": true})). It too scans all documents, clocking in at 1,033ms.
Administer and Evaluate Your Medicine
Fourth, given that all my queries filter on the same key, I will add an index on that key (db.collection.createIndex({"a": 1})). To support a proposed search feature, I will also add a text index covering every string field (db.collection.createIndex({"$**": "text"})). The first index took about 3.5 seconds to build; the second took six minutes in the foreground. That is a long time to lock up the application, so plan heavy index builds accordingly and consider building them in the background.
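On the shell versions of that era, this looks like the snippet below; { background: true } is the pre-4.2 way to avoid the foreground lock (from MongoDB 4.2 on, index builds no longer lock the collection and the option is ignored):

    // The single-key index builds quickly either way.
    db.collection.createIndex({"a": 1});
    // The wildcard text index is the heavy one: build it in the background
    // so it does not lock the collection for six minutes.
    db.collection.createIndex({"$**": "text"}, { background: true });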
Fifth, let's evaluate the medicine by measuring performance again. All queries now run in 0ms, except the third, which took 9ms, and the second, which had to inspect a million index keys instead of a million documents and still took about one second. That swap (index keys examined instead of documents examined) is almost as slow as a collection scan, but the in-memory sort was avoided and the query actually completed.
The text search also performed well. At first I searched for a word present in all one million documents, and it took four seconds; not great, but it did have to inspect a million index keys. To make the test more meaningful, I put a new word ("kaputt") in just 10,000 documents, and those 10,000 documents came back in 28ms (db.collection.find({$text: {$search: "kaputt"}})).
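Seeding the test word could be as simple as the sketch below; it assumes the "a" field from the hypothetical loader script, so that exactly 10,000 documents get tagged:

    // Put "kaputt" into the 10,000 documents where a equals 1.
    db.collection.updateMany(
        { "a": 1 },
        { "$set": { "b": "kaputt" } }
    );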
It is also interesting that the update and remove queries improved their performance thanks to the index: they find the documents to write instantly and then only have to write them. By contrast, a query that updates every document in the database (i.e. one with no match filter) would actually run more slowly than before, because it cannot use the index to find documents yet still has to maintain the index on every write.
We still have to look at the performance impact on inserts. First, the baseline: we had measured about 1,000 inserts per second, the equivalent of one insert per millisecond.
First we need to delete the million documents we added. I used .remove() and it took a while; it would have been faster to just drop the collection, as shown below. Then I ran the add-a-million-documents script again and monitored the mongostat output. This time it was only adding about 650 documents per second.
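The faster cleanup, for next time (note that dropping removes the indexes too, so they must be recreated afterwards):

    // Slow: deletes documents one by one, maintaining every index as it goes.
    db.collection.remove({});
    // Fast: drops the collection (and its indexes) in a single operation.
    db.collection.drop();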
Not bad, but significantly slower than before: where an insert used to take 1ms, it now takes about 1.5ms. Still, these are excellent numbers overall. As things stand, most reads and writes return in 0ms, inserts take 1.5ms, and a massive text search returns in 28ms. That's with a million documents.
Conclusion
You may ask: what about a billion documents? My first reaction is that, at 650 inserts per second, it would take an estimated 17 days on my underpowered laptop just to insert that many 1.5KB documents. I will keep that investigation for another blog post. I also did not use bulk inserts this time; it would be interesting to measure the speed difference.
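For the curious, a bulk variant of the loader would batch the writes with insertMany; a sketch, reusing the same hypothetical document shape as before:

    // Bulk variant: 1,000 documents per insertMany instead of one insert each.
    var padding = new Array(1500).join("x");
    var batch = [];
    for (var i = 1; i <= 1000000; i++) {
        batch.push({ "a": i % 100, "b": padding });
        if (batch.length === 1000) {
            db.collection.insertMany(batch);
            batch = [];
        }
    }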
The process described in this post is a bit tedious, but the scripts will come in handy again, so I think that optimizing performance by simulating a future state of the database with explain() and mongostat is well worth doing. Some people argue that indexes are something to worry about only once you have begun scaling. That may be a matter of opinion, but I disagree: once you have defined your queries and application, you can start observing your application's usage patterns and take action to prevent problems. An ounce of prevention is worth it, and if it turns out to be the wrong prevention you can fix it later; what you don't want is to start completely from scratch. IMHO, the sooner you start thinking about performance, the better. To measure performance you need data, and in this post I have shown you how to get that data using the explain() method and mongostat.
This is my methodology. What is yours? I am curious to hear about other ways to optimize queries using indexes.
I hope that this post has helped someone who is trying to figure out how to optimize performance with indexes. I look forward to your questions!