How to use an index in MongoDB?

In this post, I will explain how to use an index in MongoDB. In an earlier article, I explained how indexes work in general; most of that applies to MongoDB as well.

As we now know, the advantages of using indexes are:

  • Query performance improves, because index data is stored in sorted order
  • Without an index, the whole collection is scanned (COLLSCAN)
  • If a suitable index is present, an index scan (IXSCAN) is performed instead; MongoDB generally uses a single index per query

Let's define a collection 'Birthday' to understand this concept better. A typical document looks like this:

use test;

db.Birthday.insertOne({
    name: "tushar0",
    year: ISODate("1958-09-26T00:00:00.000Z")
});

By 'year' here, I actually mean a full date. Now, if we check the existing indexes:

> db.Birthday.getIndexes();
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.Birthday"
        }
]
>

Here v is the index version, key is the indexed field with its sort order (1 for ascending, -1 for descending), name is the index name, and ns is the namespace (<databasename>.<collectionname>).

Let's generate a large set of random data (100,000 documents) for our testing. I have used Mongoose and Node.js to generate the data, and I ran MongoDB from its Docker image.

const mongoose = require("mongoose");

// Connect to the MongoDB instance running in Docker
mongoose.connect("mongodb://192.168.99.100:27017/test", {
  useUnifiedTopology: true,
  useNewUrlParser: true,
});
const db = mongoose.connection;

// Random integer in [min, max]
const randomInt = (max, min) => {
  return Math.floor(Math.random() * (max - min + 1)) + min;
};

db.once("open", () => {
  const birthdaySchema = new mongoose.Schema(
    {
      name: String,
      year: Date,
    },
    { collection: "Birthday" }
  );

  const Birthday = mongoose.model("birthdays", birthdaySchema);

  const inputs = [];
  for (let index = 0; index < 100000; index++) {
    inputs.push({
      name: "tushar" + index,
      // random birth date between 1950 and 2020
      // (day offset relative to the Unix epoch, converted to milliseconds)
      year: new Date(randomInt(50 * 365, -20 * 365) * 24 * 60 * 60 * 1000),
    });
  }

  Birthday.insertMany(inputs, (err, data) => {
    if (err) console.log(err);
    if (data) console.log("Inserted " + data.length + " documents");
    // Close the connection only after the insert has finished
    db.close();
  });
});

You can see we have generated 100000 documents.
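As a quick sanity check of the date logic above (this snippet is not part of the original article): `randomInt(50 * 365, -20 * 365)` picks a day offset relative to the Unix epoch (1970), so the resulting dates span roughly 1950 to 2020.

```javascript
// Verify that the generated birth dates fall in the expected range
const randomInt = (max, min) => Math.floor(Math.random() * (max - min + 1)) + min;

for (let i = 0; i < 1000; i++) {
  const d = new Date(randomInt(50 * 365, -20 * 365) * 24 * 60 * 60 * 1000);
  const y = d.getUTCFullYear();
  if (y < 1950 || y > 2019) throw new Error("year out of range: " + y);
}
console.log("all generated dates fall between 1950 and 2019");
```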


Let's search for some value in the collection. We have not created any additional index yet.


I tried to find the name 'tushar6753' among the 100,000 documents.
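The screenshot showing this query is missing; in the mongo shell it would have looked something like this (a sketch, using the collection and value from the text):

```javascript
// mongosh
use test;

// look up a single name among the 100,000 generated documents
db.Birthday.find({ name: "tushar6753" });
```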

Let's examine its performance using explain('executionStats'). You can also use the overloaded explain() without parameters, but the former gives more detailed stats, so that is the one I used.

If you look at the stats, it performs a full collection scan (COLLSCAN): it examined all 100,000 documents and took 220 ms.
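The screenshot with the stats is missing; a sketch of the command and the relevant parts of its output (the numbers are the ones reported above):

```javascript
// mongosh
db.Birthday.find({ name: "tushar6753" }).explain("executionStats");

// Relevant fields in executionStats:
//   "stage" : "COLLSCAN",
//   "totalDocsExamined" : 100000,
//   "executionTimeMillis" : 220
```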

Now let's do the same with an index. But first, let's understand how indexes are stored.


An index is stored in sorted order (ascending or descending), based on how the indexed key is defined. In the previous article, we learned that each index entry stores the indexed key together with a pointer to the data. So whenever an indexed key matches a query, the data pointer is used to fetch the document.

A few important facts about indexes:

  • MongoDB indexes use a B-tree data structure
  • Every collection has a default index {_id: 1}, stored in ascending order and unique
  • The name of this index is _id_; it is created by default and cannot be deleted

Enough about the representation. Now I am going to create an index on the name field of our collection 'Birthday'.


Here you can see I used name as the index key and 1 as the order. The order tells MongoDB whether the index entries are stored in ascending or descending order.
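The screenshot of this step is missing; the equivalent shell commands would look like this (output sketched to match the getIndexes() result shown earlier plus the "name_1" index described below):

```javascript
// mongosh
db.Birthday.createIndex({ name: 1 });  // 1 = ascending order

// Verify that the new index exists alongside the default one
db.Birthday.getIndexes();
// [
//   { "v" : 2, "key" : { "_id" : 1 },  "name" : "_id_",   "ns" : "test.Birthday" },
//   { "v" : 2, "key" : { "name" : 1 }, "name" : "name_1", "ns" : "test.Birthday" }
// ]
```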

Let's check the existing indexes again: now two indexes are present, the default one and the one we just created, named "name_1". By default, an index name is the field name followed by "_" and the sort order; a compound index concatenates one such pair per field.


Now, let's analyze the time taken to search for the same value. This time, instead of a collection scan, it performs an index scan (IXSCAN). It took only 3 ms and examined only 1 document: from 220 ms down to 3 ms. Fantastic improvement in performance, don't you think?
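Again, the screenshot is missing; rerunning the same explain() after creating the index would show something like this (numbers as reported above):

```javascript
// mongosh
db.Birthday.find({ name: "tushar6753" }).explain("executionStats");

// With the index in place, the plan switches to an index scan:
//   "stage" : "IXSCAN",
//   "totalDocsExamined" : 1,
//   "executionTimeMillis" : 3
```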

We have seen that with the index, we get amazing performance. Let's now search by another field, 'year', and observe the performance.


Oops, it again examined the whole collection, and performance dropped back to 194 ms.

What happened to the index? We have already created an index, so why this behavior?

Because the index is on the 'name' field and we are searching by 'year', MongoDB has to scan the whole collection.

To solve this, we can create another index on 'year', or a composite (compound) index that covers more than one field. In the case of a compound index, remember that it works left to right: queries can use it only if they filter on a prefix of the indexed fields. In the next article, I will explain more about this and about query performance.
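The article defers the details to the next post; as a sketch, both options mentioned above would look like this:

```javascript
// mongosh
// Option 1: a separate single-field index on year
db.Birthday.createIndex({ year: 1 });

// Option 2: a compound index. Because compound indexes work left to right,
// this one can serve queries on year alone, or on year plus name,
// but not on name alone.
db.Birthday.createIndex({ year: 1, name: 1 });
```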


The last thing I want to mention here is that you can also delete an index.

Here, we dropped the index on the name field (you need to specify the order as well). After dropping the index, checking the existing indexes again shows that only the default index remains.
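The screenshot of the drop is missing; the equivalent shell commands would be:

```javascript
// mongosh
// Drop the index by its key pattern (the order must match)...
db.Birthday.dropIndex({ name: 1 });
// ...or equivalently by its name:
// db.Birthday.dropIndex("name_1");

// Only the default _id index remains
db.Birthday.getIndexes();
```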

That's it for this article. I hope it explains how an index can be used in MongoDB. Please drop a message if you have any questions.
