How to use an index in MongoDB?

In this post, I will explain how to use an index in MongoDB. In an earlier article, I explained how indexes work in general; most of that applies to MongoDB as well.

As we now know, the advantages of using indexes are:

  • Query performance improves, because index data is stored in sorted order
  • Without an index, the whole collection is scanned (COLLSCAN)
  • If a suitable index is present, an index scan (IXSCAN) is performed instead; MongoDB generally uses a single index per query

Let's define a collection 'Birthday' to understand this concept better. A typical document looks like this:

use test;

db.Birthday.insertOne({
    name: "tushar0",
    year: ISODate("1958-09-26T00:00:00.000Z")
});

By 'year' here, I actually mean a full date. Now, if we check the existing indexes:

> db.Birthday.getIndexes();
[
        {
                "v" : 2,
                "key" : {
                        "_id" : 1
                },
                "name" : "_id_",
                "ns" : "test.Birthday"
        }
]
>

Here v is the index version, key is the indexed field with its sort order (1 for ascending, -1 for descending), name is the index name, and ns is the namespace (<databasename>.<collectionname>).

Let's generate a large set of random data (100,000 documents) for our testing. I have used Mongoose and Node.js to generate the data, and I ran MongoDB from its Docker image.

const mongoose = require("mongoose");

// Connect to the MongoDB instance running in Docker
mongoose.connect("mongodb://192.168.99.100:27017/test", {
  useUnifiedTopology: true,
  useNewUrlParser: true,
});
const db = mongoose.connection;

// Random integer in [min, max]
const randomInt = (max, min) => {
  return Math.floor(Math.random() * (max - min + 1)) + min;
};

db.once("open", () => {
  const birthdaySchema = new mongoose.Schema(
    {
      name: String,
      year: Date,
    },
    { collection: "Birthday" }
  );

  const Birthday = mongoose.model("birthdays", birthdaySchema);

  const inputs = [];
  for (let index = 0; index < 100000; index++) {
    inputs.push({
      name: "tushar" + index,
      // random birth date between 1950 and 2020
      // (day offset relative to the Unix epoch, converted to milliseconds)
      year: new Date(randomInt(50 * 365, -20 * 365) * 24 * 60 * 60 * 1000),
    });
  }

  Birthday.insertMany(inputs, (err, data) => {
    if (err) console.log(err);
    if (data) console.log("Inserted " + data.length + " documents");
    // Close the connection only after the insert has finished
    db.close();
  });
});

You can see we have generated 100000 documents.
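As a quick sanity check of the date logic above (this snippet is not part of the original article): `randomInt(50 * 365, -20 * 365)` picks a day offset relative to the Unix epoch (1970), so the resulting dates span roughly 1950 to 2020.

```javascript
// Verify that the generated birth dates fall in the expected range
const randomInt = (max, min) => Math.floor(Math.random() * (max - min + 1)) + min;

for (let i = 0; i < 1000; i++) {
  const d = new Date(randomInt(50 * 365, -20 * 365) * 24 * 60 * 60 * 1000);
  const y = d.getUTCFullYear();
  if (y < 1950 || y > 2019) throw new Error("year out of range: " + y);
}
console.log("all generated dates fall between 1950 and 2019");
```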


Let's search for some value in the collection. We have not created any additional index yet.


I tried to find the name 'tushar6753' among the 100,000 documents.
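The screenshot showing this query is missing; in the mongo shell it would have looked something like this (a sketch, using the collection and value from the text):

```javascript
// mongosh
use test;

// look up a single name among the 100,000 generated documents
db.Birthday.find({ name: "tushar6753" });
```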

Let's examine its performance using explain('executionStats'). You can also use the overloaded explain() without parameters, but the former gives more detailed stats, so that is the one I used.

If you look at the stats, it performs a full collection scan (COLLSCAN): it examined all 100,000 documents and took 220 ms.
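The screenshot with the stats is missing; a sketch of the command and the relevant parts of its output (the numbers are the ones reported above):

```javascript
// mongosh
db.Birthday.find({ name: "tushar6753" }).explain("executionStats");

// Relevant fields in executionStats:
//   "stage" : "COLLSCAN",
//   "totalDocsExamined" : 100000,
//   "executionTimeMillis" : 220
```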

Now let's do the same with an index. But first, let's understand how indexes are stored.


An index is stored in sorted order (ascending or descending), based on how the indexed key is defined. In the previous article, we learned that each index entry stores the indexed key together with a pointer to the data. So whenever an indexed key matches a query, the data pointer is used to fetch the document.

A few important facts about indexes:

  • MongoDB indexes use a B-tree data structure
  • Every collection has a default index {_id: 1}, stored in ascending order and unique
  • The name of this index is _id_; it is created by default and cannot be deleted

Enough about the representation. Now I am going to create an index on the name field of our collection 'Birthday'.


Here you can see I used name as the index key and 1 as the order. The order tells MongoDB whether the index entries are stored in ascending or descending order.
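The screenshot of this step is missing; the equivalent shell commands would look like this (output sketched to match the getIndexes() result shown earlier plus the "name_1" index described below):

```javascript
// mongosh
db.Birthday.createIndex({ name: 1 });  // 1 = ascending order

// Verify that the new index exists alongside the default one
db.Birthday.getIndexes();
// [
//   { "v" : 2, "key" : { "_id" : 1 },  "name" : "_id_",   "ns" : "test.Birthday" },
//   { "v" : 2, "key" : { "name" : 1 }, "name" : "name_1", "ns" : "test.Birthday" }
// ]
```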

Let's check the existing indexes again: now two indexes are present, the default one and the one we just created, named "name_1". By default, an index name is the field name followed by "_" and the sort order; a compound index concatenates one such pair per field.


Now, let's analyze the time taken to search for the same value. This time, instead of a collection scan, it performs an index scan (IXSCAN). It took only 3 ms and examined only 1 document: from 220 ms down to 3 ms. Fantastic improvement in performance, don't you think?
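Again, the screenshot is missing; rerunning the same explain() after creating the index would show something like this (numbers as reported above):

```javascript
// mongosh
db.Birthday.find({ name: "tushar6753" }).explain("executionStats");

// With the index in place, the plan switches to an index scan:
//   "stage" : "IXSCAN",
//   "totalDocsExamined" : 1,
//   "executionTimeMillis" : 3
```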

We have seen that with the index, we get amazing performance. Let's now search by another field, 'year', and observe the performance.


Oops, it again examined the whole collection, and performance dropped back to 194 ms.

What happened to the index? We have already created an index, so why this behavior?

Because the index is on the 'name' field and we are searching by 'year', MongoDB has to scan the whole collection.

To solve this, we can create another index on 'year', or a composite (compound) index that covers more than one field. In the case of a compound index, remember that it works left to right: queries can use it only if they filter on a prefix of the indexed fields. In the next article, I will explain more about this and about query performance.
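The article defers the details to the next post; as a sketch, both options mentioned above would look like this:

```javascript
// mongosh
// Option 1: a separate single-field index on year
db.Birthday.createIndex({ year: 1 });

// Option 2: a compound index. Because compound indexes work left to right,
// this one can serve queries on year alone, or on year plus name,
// but not on name alone.
db.Birthday.createIndex({ year: 1, name: 1 });
```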


The last thing I want to mention here is that you can also delete an index.

Here, we dropped the index on the name field (you need to specify the order as well). After dropping the index, checking the existing indexes again shows that only the default index remains.
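The screenshot of the drop is missing; the equivalent shell commands would be:

```javascript
// mongosh
// Drop the index by its key pattern (the order must match)...
db.Birthday.dropIndex({ name: 1 });
// ...or equivalently by its name:
// db.Birthday.dropIndex("name_1");

// Only the default _id index remains
db.Birthday.getIndexes();
```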

That's it for this article. I hope it explains how an index can be used in MongoDB. Please drop a message if you have any questions.
