Database Indexing Essentials in System Design
Momen Negm
Chief Technology Officer @ T-Vencubator | Data Scientist, Generative AI | Tech entrepreneur - Engineering leader
Database Indexing Essentials for System Designers: A deep dive into B-Tree, Hash, Bitmap, and Full-Text indexing techniques
Introduction: What is Database Indexing?
Database indexing is a method to speed up data retrieval within a database. An index works much like a book's table of contents: without it, the database would need to scan every row to locate the desired data, which becomes increasingly inefficient as the data volume grows.
Creating an index on a database column establishes a structure that maintains an organized list of pointers to the rows where each unique value appears. This setup allows for much quicker access to rows by specific values, like a particular user ID or product ID, especially as the table size increases.
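To make the idea of "an organized list of pointers" concrete, here is a minimal sketch in Python. It is not how any particular database stores its indexes; the rows, column names, and helper functions are all hypothetical, and list positions stand in for row pointers.

```python
from collections import defaultdict

# Hypothetical table: each list position stands in for a row pointer.
rows = [
    {"user_id": 17, "name": "Ana"},
    {"user_id": 42, "name": "Bo"},
    {"user_id": 17, "name": "Cy"},
    {"user_id": 99, "name": "Di"},
]

def full_scan(rows, user_id):
    """Without an index: inspect every row to find matches."""
    return [pos for pos, row in enumerate(rows) if row["user_id"] == user_id]

def build_index(rows, column):
    """With an index: map each distinct value to the positions of its rows."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index

user_id_index = build_index(rows, "user_id")

print(full_scan(rows, 17))            # [0, 2] after touching all four rows
print(user_id_index.get(17, []))      # [0, 2] from a single lookup
```

Both calls return the same row positions, but the scan's cost grows with the table while the index lookup does not. The price is the extra structure that has to be kept up to date on every write.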
The Importance of Indexing
Types of Indexing Techniques
There are several indexing methods, each tailored for specific use cases and with its own advantages and limitations. Here, we’ll explore some of the most commonly used types.
1. B-Tree Indexes
A B-Tree is a balanced tree whose nodes hold keys in sorted order, which makes it well suited to both equality lookups and range queries. It is among the most widely used index structures in relational databases. Data is organized hierarchically, and each node can hold multiple keys and children, so the tree stays shallow even for large tables. B-Tree indexes are self-balancing: inserts and deletes split or merge nodes as needed, which keeps lookups, inserts, and deletes at logarithmic cost.
Advantages:
Disadvantages:
Use Cases: Ideal for large datasets in read-intensive applications, such as product listings on e-commerce platforms. For example, consider an e-commerce platform with a products table containing columns like product_id, price, and date_added. Users may want to filter products by specific price ranges or list those added within a particular timeframe. A B-Tree index on the price or date_added column can facilitate these queries efficiently:
-- Create a B-Tree index on the 'price' column
CREATE INDEX idx_products_price ON products (price);
-- Example query using B-Tree index for a range-based search
SELECT * FROM products
WHERE price BETWEEN 100 AND 500;
The diagram illustrates a B-Tree index structure for a user_id column, showing how index nodes store key values and how leaf nodes link to the actual table pages and rows, depicting the hierarchical organization of the index.
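The reason sorted order helps range queries can be sketched in a few lines of Python. This is not a real B-Tree: a sorted list stands in for the ordered leaf level, and binary search stands in for the descent from the root. The sample prices and the range_lookup helper are hypothetical.

```python
import bisect

# Hypothetical (price, product_id) entries kept in sorted order, standing in
# for the ordered leaf level of a B-Tree index on products.price.
index_entries = sorted([
    (250, 3), (50, 1), (700, 5), (100, 2), (500, 4), (120, 6),
])
prices = [price for price, _ in index_entries]

def range_lookup(low, high):
    """Return product_ids with low <= price <= high, like BETWEEN low AND high."""
    start = bisect.bisect_left(prices, low)    # first entry with price >= low
    end = bisect.bisect_right(prices, high)    # one past the last entry with price <= high
    return [product_id for _, product_id in index_entries[start:end]]

print(range_lookup(100, 500))   # [2, 6, 3, 4]
```

The lookup finds the start of the range with a binary search and then reads neighbouring entries in order, which is roughly what a B-Tree does when it walks its linked leaf nodes.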
2. Hash Indexes
Hash indexes use a hash function to map a search key to a bucket location in a hash table, making them particularly effective for equality comparisons (e.g., locating a user by their user ID). Because hashing does not preserve key order, they cannot serve range queries or ordered scans.
Advantages:
Disadvantages:
Use Cases: Applications that need high-speed lookups by unique ID or key, such as session token retrieval. For instance, on a social media platform, user authentication checks whether the provided username and password hash match an existing record. Since this query requires only an exact match and no range-based searching, a hash index is highly suitable.
-- Create a hash index on 'username' for quick exact match lookups
-- (Note: this is PostgreSQL syntax; other database systems differ)
CREATE INDEX idx_users_username ON users USING HASH (username);
-- Example query using hash index for exact match lookup
SELECT * FROM users
WHERE username = 'designnerds';
The following diagram shows how search keys are converted into hash values, the hash bucket structure and storage, how different keys can map to the same bucket (a collision), and the direct mapping between hash values and bucket locations.
Next, we look at the step-by-step process of a hash index lookup, including how collisions are handled within buckets.
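The lookup and collision handling can also be sketched in code. The class below is a toy, not any database's implementation: the HashIndex name, the bucket count, and the use of Python's built-in hash() are all assumptions made for illustration, and collisions are resolved by chaining entries in a list.

```python
class HashIndex:
    """Toy hash index: a fixed number of buckets, collisions chained in a list."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket_for(self, key):
        # Python's built-in hash() stands in for the database's hash function.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket_for(key).append((key, row_id))

    def lookup(self, key):
        # Compare the full key inside the bucket so colliding keys stay distinct.
        return [row_id for k, row_id in self._bucket_for(key) if k == key]

# Only two buckets for four keys, so at least two keys must share a bucket.
index = HashIndex(num_buckets=2)
for row_id, username in enumerate(["designnerds", "alice", "bob", "carol"]):
    index.insert(username, row_id)

print(index.lookup("designnerds"))  # [0]
print(index.lookup("nobody"))       # []
```

A lookup hashes the key once and scans a single bucket, so it never needs to compare against keys stored elsewhere. The same structure offers no way to find "all usernames between a and c" short of visiting every bucket.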
3. Bitmap Indexes
Bitmap indexes store one bitmap (a string of bits) per distinct column value, where each bit corresponds to a row and indicates whether that row holds the value. They are highly efficient for columns with low cardinality, meaning a small number of distinct values, such as a "status" field.
Advantages:
Disadvantages:
Use Cases: Data warehouses and analytical databases where queries are read-intensive and based on low-cardinality fields. In a data warehouse storing millions of transactions for analysis, columns like status (with values like ‘completed,’ ‘pending,’ ‘failed’) or is_premium (yes/no) benefit from bitmap indexing. Analysts often need to filter and aggregate data based on these low-cardinality columns, and bitmap indexes allow for efficient query processing on them.
-- Create a bitmap index on the 'status' column
-- (Note: Support for bitmap indexes depends on the database system; e.g., Oracle supports it natively)
CREATE BITMAP INDEX idx_transactions_status ON transactions (status);
-- Example query using bitmap index for efficient filtering
SELECT COUNT(*), status
FROM transactions
WHERE status = 'completed'
AND is_premium = 'yes'
GROUP BY status;
The following diagram captures the basic bitmap index structure along with how table rows map to bitmap values, the representation of different status values in separate bitmaps, and efficient storage for low-cardinality data.
Next, we look at how complex queries are handled by combining multiple bitmap indexes with bitwise AND operations.
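Combining conditions with a bitwise AND can be sketched as follows. Python integers serve as the bitmaps here, which is a simplification (real systems typically compress them); the sample rows and the build_bitmap_index helper are hypothetical.

```python
# Hypothetical transactions table; the list position is the row number.
rows = [
    {"status": "completed", "is_premium": "yes"},
    {"status": "pending",   "is_premium": "yes"},
    {"status": "completed", "is_premium": "no"},
    {"status": "failed",    "is_premium": "no"},
    {"status": "completed", "is_premium": "yes"},
]

def build_bitmap_index(rows, column):
    """One bitmap (an int) per distinct value; bit i is set if row i holds that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << i)
    return bitmaps

status_idx = build_bitmap_index(rows, "status")
premium_idx = build_bitmap_index(rows, "is_premium")

# WHERE status = 'completed' AND is_premium = 'yes'  ->  bitwise AND of two bitmaps
matches = status_idx["completed"] & premium_idx["yes"]
matching_rows = [i for i in range(len(rows)) if (matches >> i) & 1]

print(matching_rows)             # [0, 4]
print(bin(matches).count("1"))   # 2, the COUNT(*) for this filter
```

The filter is answered from the two bitmaps alone, without reading any table rows, and the count falls out of counting set bits. This is why bitmap indexes suit analytical filters on low-cardinality columns.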
Ideal Use Cases: Data warehouses, OLAP systems, Report generation
Poor Use Cases: OLTP systems, High-cardinality columns, Frequent updates
4. Full-Text Indexes
Full-text indexes are specialized for searching text-based fields using keywords. They are widely used in applications where searching text data is essential, like document management systems.
Advantages:
Disadvantages:
Use Cases: Search-heavy applications, such as social media and document search systems. For example, imagine a blog platform where users want to search articles by keyword across titles and body content. A full-text index on these columns allows efficient and flexible search across large text fields.
-- Create a full-text index on the 'title' and 'body' columns for efficient text searching
-- (Note: this is MySQL syntax; other database systems expose full-text search differently)
CREATE FULLTEXT INDEX idx_articles_title_body ON articles (title, body);
-- Example query using full-text index to search for keywords
SELECT * FROM articles
WHERE MATCH(title, body) AGAINST ('database indexing' IN NATURAL LANGUAGE MODE);
The following diagram shows the basic structure of a full-text index: the document tokenization process and the inverted index that maps words to document IDs.
Next, we look at full-text search operations: how a search query is processed and how boolean operations are applied to search terms.
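Tokenization, the inverted index, and boolean operations on search terms can be sketched together. This is a minimal illustration, assuming a naive tokenizer with no stemming, stop words, or relevance ranking; the sample articles and the function names are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical articles keyed by document ID.
articles = {
    1: "Database indexing essentials",
    2: "Indexing text with an inverted index",
    3: "Scaling a database for reads",
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

inverted = build_inverted_index(articles)

def search_all(query):
    """Boolean AND: documents containing every query term."""
    postings = [inverted.get(term, set()) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

def search_any(query):
    """Boolean OR: documents containing at least one query term."""
    postings = [inverted.get(term, set()) for term in tokenize(query)]
    return sorted(set.union(*postings)) if postings else []

print(search_all("database indexing"))   # [1]
print(search_any("database indexing"))   # [1, 2, 3]
```

Each query term is resolved to its list of document IDs, and the boolean operator decides whether those lists are intersected or unioned. Note that "indexing" and "index" remain separate tokens here; production full-text engines usually add stemming so that both would match.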
Monitoring and Best Practices for Indexing in Production
Index Monitoring Tools: