Indexes In Database: Introduction to M-way Trees
Awais Kamran
Engineering Lead @ Conrad Labs | AWS Certified Solution Architect | Full Stack Developer | Machine Learning Enthusiast | Tech Speaker
Introduction
Database management systems play a crucial role in efficiently storing, retrieving, and managing vast amounts of data. One of the key components in optimizing database performance is the use of indexes. Indexes provide a structured way to access data quickly, reducing the time it takes to execute database queries. In this article, we will explore how indexes are represented in a database using M-way trees, a common data structure for indexing.
The Significance Of Indexes
Imagine a library with thousands of books. Finding a specific book can be a daunting task if there's no catalog or index. Database indexes serve a similar purpose. They act as a roadmap to quickly locate specific data within a database table. Without indexes, the database system would need to scan the entire table for every query, resulting in slow and inefficient data retrieval.
Visual Representation Of Indexes
Let's assume you have a basic employee table stored in a database table as the diagram below represents. The following diagram also shows how a particular record is saved on the physical disk, which is a combination of sectors, tracks, and their intersection known as blocks where actual data is stored. The blocks can be of specific storage size depending on your underlying hardware.
Supposedly, three records can be stored within each block of a disk, given that the database has 1,500 records, how many blocks of the disk would be visited to search for a particular entry?
The answer is:
Number of records / Number of entries in each block
1500 / 3 ~ 500 blocks
When we argue about the performance of retrieving records from a database, it is closely tied to the number of blocks being visited. We can't reduce the number of blocks on the disk since numeric and characters have a set storage that they consume in the form of row data and spanned storage on more blocks are our primary issue towards performance.
This is the moment when your manager tells you to create indexes on a particular column.
Let's see what happens under the hood. Indexes are just another table storing the key of a particular table and a pointer of the record to its actual location on the physical disk. They themselves are stored within the disk where all of your records reside, YES! They consume storage as well, they aren't a magical tool that just reduces the time of retrieval. The following diagram depicts this relationship.
Since we have less data within a row of an index table, it takes less storage which means one block on a disk can now accommodate more records within an index table.
Less disk movement.
Fewer blocks to be visited.
Supposedly, six records can now be saved on one block, given that we have 1,500 records from the above example, how many blocks can be used to accommodate the data?
领英推荐
The answer is:
Number of records / Number of entries in each block
1500 / 6 ~ 250 blocks
Just by adding one table of indexes that knows the physical location of each record on the disk, we were able to reduce our disk visits by half, this is the performance gain that we get by introducing indexes on a particular column.
We can go on a roll with indexes, there's a lot we can do like multi-level indexing (indexes over indexes) and multi-column indexes (indexes on more than one column in the same table), but the question is how are they implemented?
Enter M-way Trees!
If we just flip the structure of the example above, it does shape up to be a weird tree? doesn't it? ??
Enter M-way Trees!
Remember our old friend BST (binary search tree)? It can be denoted as a 2-way tree and a subset of an M-way Tree. The M-way tree can have an array of values on each node instead of a single numeric node. Let's consider the following example of a 3-Way tree to gain a bit of a grasp on the concept.
There are two key takeaways for M-way trees, firstly they would have (M-1) keys on each node that's why you see only two numbers packed in one row in the example of the 3-way tree above. Secondly, they follow the exact same rules of insertion similar to a BST which can be seen in the example above.
Limitation of M-way Trees
If we keep on adding an increasing value, what would the M-way tree above shape up to be? Yes! a linear list. The problem with M-way trees is that they don't follow a set of rules for the insertion which would keep the growth of the tree in check. This is where B & B+ trees come in which are actually used for the implementation of indexes in most of the database systems.
Let's keep that for the next article ??