How to use an index in MongoDB?
Tushar Goel
Lead Member Of Technical Staff @ Salesforce | Staff Software Engineer, Technical Lead
In this post, I will explain how to use an index in MongoDB. In an earlier article, I explained how indexes work. Mostly it is similar for MongoDB as well.
As we know now, the advantages of using Indexes are:
- Improves query performance as index data is stored in sorted order
- Without the index, whole collection will be scanned (COLLSCAN)
- If the index is present only index scan performs (IXSCAN). Only one type of scan is possible in MongoDB
Let's define a collection 'Birthday' to understand this concept better. A particular document typically looks like.
use test; db.Birthday.insertOne({ name: "tushar0" year: 1958-09-26T00:00:00.000+00:00 });
Hereby year, I mean date. Now if we check the existing indexes
> db.Birthday.getIndexes(); [ { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "test.Birthday" } ] >
Where v is index version, key is default index order (1 is ascending and -1 is descending order), name is index name, ns is a namespace(<databasename>.<collectionname>);
Let's generate some big random data (100000 documents) for our testing. I have used mongoose and Node.js to generate data. Also, I have used MongoDB docker image.
const mongoose = require("mongoose"); mongoose.connect("mongodb://192.168.99.100:27017/test", { useUnifiedTopology: true, useNewUrlParser: true, }); const db = mongoose.connection; const randomInt = (max, min) => { return Math.floor(Math.random() * (max - min + 1)) + min; }; db.once("open", () => { const birthdaySchema = new mongoose.Schema( { name: String, year: Date, }, { collection: "Birthday" } ); const birthdays = mongoose.model("birthdays", birthdaySchema); let inputs = []; for (let index = 0; index < 100000; index++) { inputs.unshift({ name: "tushar" + index, //random birth year between 1950 to 2020 year: new Date(randomInt(50 * 365, -20 * 365) * 24 * 60 * 60 * 1000), }); } birthdays.insertMany(inputs, (err, data) => { if (err) console.log(err); if (data) console.log(data); } ); }); db.close();
You can see we have generated 100000 documents.
Let's search for some value from the collection. We have not used any index here yet.
I tried to find the name 'tushar6753' from the 100000 records.
Let's examined the performance of it using explain('executionStats'). You can also use the overloaded method explain() without params. But earlier gives stats in more detail so I used that one only.
If you see the stat, it scans (COLLSCAN) whole collections. It examined 100000 records and took 220 msec.
Now let's do the same with the index. But first, understand how indexes are stored?
As the index is stored in sorted order, so it will store in order (ascending or descending) based on the indexed key defined. In the previous article, we have learned that with the key (internal representation to store Tree key) indexed key and data pointer is stored. So whenever, any indexed key is selected it uses the data pointer to fetch the data.
Few important facts about the index:
- It uses Btree index data
- Default Index: {_id: 1} in each collection stores in ascending order and is unique
- The name of this index is _id_. Can’t delete it created by default
Enough about the representation. Now, I am going to create an index on the name on our collection 'Birthday'.
In this, you see I used the name as an index key and '1' as an order. It tells us the order(Ascending or Descending) in which data will be stored.
Let's check the existing indexes, we found 2 indexes are present. One is default and the other is one we have created. "name_1" is the name of the index. Generally, it is combination of a variable name and "_". Numeric value depends upon how many fields are there in an index.
Now, let's try to analyze the time taken to search the same value. This time instead of a collection scan it does index scan (IXSCAN), Wow, it took only 3msec and examined only 1 document. From 220msec to 3msec. Fantastic improvement in the performance. Don't you think?
We have seen that with the index, we get amazing performance. Let's try again to search the value using another field 'year' and observe the performance then.
Oops, it again examined whole collection and performance drops to 194msec.
What happened to the index? We already have used index so why this behavior? Haven't we?
Because we have used an index on the 'name' field and we are searching using 'year'. It will do the whole collection scan.
Also, To solve this we can use a composite index that means more than 1 field or another index. In the case of the composite index, remember that indexes works Left to right. In next article, I will explain more about this and query performances.
The last thing, I want to mention here is that you can also delete the index.
Now, you see we have dropped index on the field name (need to specify order as well). After dropping the index, when we checked again the existing indexes, we see only the default index is present now.
That's it for this article. I hope it explains how can index be used in MongoDB. Please drop a message if you have any questions.
Securing Information, Applications and Services, Zero Trust
4 年Nice article ??
K8+Golang Developer
4 年I read similar usage in elastic search..