Advanced Hashing Techniques Unveiled: Cuckoo Hashing [Series 1]

Amit Pal

Engineering Leader@Egnyte | ERN-stack Architect | Empowering Engineers | Sharing Insights Weekly (WebWiz Newsletter)

Published: May 6, 2024

Hashing is a fundamental technique that underpins countless software applications thanks to its speed, efficiency, and versatility. Despite its perceived simplicity, aspiring programmers should explore the intricacies of hashing techniques. Whether it is safeguarding sensitive data, fine-tuning search algorithms, or upholding data integrity, hashing plays a pivotal role in software development.

While many hashing techniques are available, I set my sights on unraveling three advanced techniques through captivating articles. There, I'll untangle the core concepts, drape them with sample code snippets, and explain some real-world applications. Brace yourself for the unveiling of Cuckoo Hashing — the opening act of our hashing saga. And if curiosity beckons, stay tuned for the forthcoming chapters.

If you find it insightful and appreciate my writing, consider following me for updates on future content. I'm committed to sharing my knowledge and contributing to the coding community. Join me in spreading the word and helping others to learn.

Introduction

Cuckoo Hashing is a hash table collision resolution technique that ensures worst-case constant-time lookup by using two hash functions. It employs multiple-choice hash tables: each key has exactly one candidate slot in each of two tables and is always stored in one of them.

Now you might be wondering what multiple-choice hash tables are. Think of it like finding the right box for your toy. If one box is full, you try another until you find an empty one. In the same way, each item has a small set of boxes it is allowed to go in, and it is stored in whichever of them has room. That's multiple-choice hashing!
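To make the "boxes" concrete, here is a minimal sketch of how a key's two candidate slots could be computed. The helper names (fnv1a, candidateSlots) and the seeded FNV-1a-style hash are my own assumptions for illustration; any pair of reasonably independent hash functions would do.

```javascript
// Hypothetical helper: a seeded, FNV-1a-style 32-bit hash over the key's string form.
function fnv1a(key, seed) {
    let h = (0x811c9dc5 ^ seed) >>> 0;
    for (const ch of String(key)) {
        h ^= ch.codePointAt(0);
        h = Math.imul(h, 0x01000193) >>> 0; // 32-bit multiply, kept unsigned
    }
    return h;
}

// Each key gets exactly one candidate slot per table.
function candidateSlots(key, size) {
    return {
        slot1: fnv1a(key, 1) % size, // position in the first table
        slot2: fnv1a(key, 2) % size, // position in the second table
    };
}

const slots = candidateSlots("apple", 8);
console.log(slots); // two indices, each in the range 0..7
```

The same key always maps to the same two slots, which is what later lets a lookup check just two places.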

When a collision occurs, the existing item is relocated to its alternate position in the other table. This process continues until every key has found a place or a predefined relocation threshold is reached, at which point the tables are expanded and rehashed.

Cuckoo Hashing offers worst-case O(1) lookups and expected O(1) insertions, making it suitable for applications that need fast key-value lookups without the per-entry pointers of chained hash tables.

How Does It Work

I'll try to articulate how Cuckoo Hashing works for adding (inserting) and searching for an item.

Adding (Inserting) an Item:

Calculate Hash Values: When adding a new item, calculate two hash values using two different hash functions. These hash values determine the potential positions where the item can be stored in the hash tables.
Check Availability: Check whether either of the calculated positions is empty. If at least one is empty, insert the item there and mark the slot as occupied.
Collision Handling: If both positions are occupied, a collision occurs. In Cuckoo Hashing, this triggers a relocation process.
Relocation: Evict the existing item from one of the occupied positions and insert the new item into the vacated slot. Then move the evicted item to its alternate position in the other hash table. If this causes another collision, repeat the relocation process until all items find a place or a predefined threshold is reached.
Threshold Check: If the relocation process exceeds a predefined threshold (e.g., the maximum number of relocations allowed), resize the hash tables and rehash the items to accommodate the new size.
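The steps above can be sketched in a few lines. This is a simplified illustration, not a full implementation: it assumes distinct non-negative integer keys, uses two toy hash functions (h1, h2) of my own choosing, and picks an arbitrary threshold (MAX_RELOCATIONS). Instead of resizing, it simply returns the key that was left without a slot.

```javascript
const SIZE = 4;
const MAX_RELOCATIONS = 8; // hypothetical threshold

// Toy hash functions for non-negative integer keys (assumptions for this sketch)
const h1 = (key) => key % SIZE;
const h2 = (key) => Math.floor(key / SIZE) % SIZE;

const table1 = new Array(SIZE).fill(null);
const table2 = new Array(SIZE).fill(null);

// Returns null when every key has a slot, or the key left homeless when the
// threshold is hit (a real table would resize and rehash at that point).
function insert(key) {
    // Check availability: use either candidate slot if it is empty
    if (table1[h1(key)] === null) { table1[h1(key)] = key; return null; }
    if (table2[h2(key)] === null) { table2[h2(key)] = key; return null; }

    // Collision: evict and relocate, alternating between the two tables
    let current = key;
    let useTable1 = true;
    for (let i = 0; i < MAX_RELOCATIONS; i++) {
        const table = useTable1 ? table1 : table2;
        const index = useTable1 ? h1(current) : h2(current);
        const evicted = table[index];
        table[index] = current;
        if (evicted === null) return null; // the chain ended in an empty slot
        current = evicted;                 // the evicted key moves to the other table
        useTable1 = !useTable1;
    }
    return current; // threshold check failed
}

insert(1);  // lands in table1[1]
insert(5);  // table1[1] is taken, so it lands in table2[1]
insert(21); // both slots taken: evicts 1, which moves to table2[0]

console.log(table1); // [null, 21, null, null]
console.log(table2); // [1, 5, null, null]
```

Key 21 shares both candidate slots with key 5, so its insertion is the one that triggers a relocation; key 1 is pushed out of the first table and settles in its alternate slot in the second.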

Searching for an Item:

Calculate Hash Values: When searching for an item, calculate its hash values using the same hash functions used for insertion.
Check Positions: Check the calculated positions in both hash tables for the item. If the item is found at either position, return it as the search result.
Handling Absence: If the item is not found at either position, it means the item is not present in the hash tables.
End of Search: End the search operation with the result (found or not found).
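The search steps translate almost directly into code. In this sketch the two tables are filled in by hand, and the toy hash functions h1 and h2 are assumptions chosen so that the entries sit in valid slots; the point is only that a lookup never inspects more than two positions.

```javascript
const SIZE = 4;

// Toy hash functions for non-negative integer keys (assumptions for this sketch)
const h1 = (key) => key % SIZE;
const h2 = (key) => Math.floor(key / SIZE) % SIZE;

// Tables pre-filled by hand for illustration
const table1 = [null, { key: 21, value: "twenty-one" }, null, null];
const table2 = [{ key: 1, value: "one" }, { key: 5, value: "five" }, null, null];

function search(key) {
    // Check the candidate position in the first table
    const first = table1[h1(key)];
    if (first !== null && first.key === key) return first.value;

    // Check the candidate position in the second table
    const second = table2[h2(key)];
    if (second !== null && second.key === key) return second.value;

    // Not in either position, so the key is not stored at all
    return undefined;
}

console.log(search(21)); // "twenty-one" (found in table1)
console.log(search(5));  // "five" (found in table2)
console.log(search(9));  // undefined (not present)
```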

I found this funny flow chart that lucidly explains this technique. If you want to know more, read through this document: https://www.brics.dk/RS/01/32/BRICS-RS-01-32.pdf

Pros and Cons

There are several advantages and disadvantages of this approach. Some primary advantages are as follows:

Constant-Time Lookup: A lookup inspects at most two positions, one per table, so retrieval takes constant time even in the worst case.
Minimal Space Overhead: Entries are stored directly in the table slots, avoiding auxiliary structures such as the linked lists used by chained hash tables.
Deterministic Behavior: A key can only ever reside in one of its two candidate positions, which keeps lookup cost predictable.

There are some challenges with this technique as well:

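Limited Key Density: Cuckoo Hashing requires a low key density (load factor) to avoid excessive relocations. With two tables and one item per slot, insertions start to fail frequently once the tables are more than about half full, which limits its effectiveness for densely packed datasets.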
Sensitivity to Hash Functions: The performance of Cuckoo Hashing is sensitive to the quality and independence of hash functions used, requiring careful selection and design.
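One way to keep key density in check is to track the load factor and grow the tables before it gets too high. The sketch below is only an illustration: the function names and the 0.5 cut-off are my assumptions, and a real implementation would normally keep a running count instead of scanning the tables.

```javascript
// Load factor = stored entries / total slots across both tables.
function loadFactor(table1, table2) {
    const used = [...table1, ...table2].filter((slot) => slot !== null).length;
    return used / (table1.length + table2.length);
}

// Hypothetical policy: grow once the tables are at least half full.
const MAX_LOAD_FACTOR = 0.5;
function shouldResize(table1, table2) {
    return loadFactor(table1, table2) >= MAX_LOAD_FACTOR;
}

const t1 = ["a", null, "b", null];
const t2 = [null, "c", null, null];
console.log(loadFactor(t1, t2));   // 0.375
console.log(shouldResize(t1, t2)); // false
```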

Real-World Use Cases

Cuckoo Hashing is used in real-world systems across several domains because of its efficient lookup and insertion operations. Some common use cases include:

Network Routing Tables: Cuckoo Hashing is crucial for storing routing information in network routers and switches, enabling fast lookup and efficient packet forwarding.
In-Memory Databases: Cuckoo Hashing is widely used in in-memory databases for indexing and querying data, providing rapid access to stored information.
Cache Management: Cuckoo Hashing is employed in cache management systems to store frequently accessed data for quick retrieval, enhancing application performance.
Content Delivery Networks (CDNs): CDNs utilize Cuckoo Hashing in cache management layers to store cached content and swiftly serve requests to users, improving content delivery speed.
Distributed Systems: Cuckoo Hashing is applied in distributed systems for data partitioning and sharding, ensuring the even distribution of data across multiple nodes and facilitating efficient data access and load balancing.

Code Snippet

Enough theory; let's dive into a practical code snippet. I've written the following code with inline comments to make its purpose clear.

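class CuckooHashing {
    /**
     * Constructor for the CuckooHashing class.
     */
    constructor(size) {
        this.size = size; // The number of slots in each hash table (must be at least 1)
        // Upper bound on evictions per insert before resizing (an assumed heuristic)
        this.maxRelocations = Math.max(8, 2 * size);

        // Initialize two hash tables with null values
        this.table1 = new Array(size).fill(null);
        this.table2 = new Array(size).fill(null);
    }

    /**
     * Inserts a key-value pair into the hash tables.
     */
    insert(key, value) {
        const hash1 = this.hashFunction1(key);
        const hash2 = this.hashFunction2(key);

        if (this.table1[hash1] !== null && this.table1[hash1].key === key) {
            // Key already stored in table1: update its value in place
            this.table1[hash1].value = value;
            return;
        }
        if (this.table2[hash2] !== null && this.table2[hash2].key === key) {
            // Key already stored in table2: update its value in place
            this.table2[hash2].value = value;
            return;
        }
        if (this.table1[hash1] === null) {
            // Slot in table1 is empty, so insert the item there
            this.table1[hash1] = { key, value };
            return;
        }
        if (this.table2[hash2] === null) {
            // Slot in table1 is occupied, but the slot in table2 is empty
            this.table2[hash2] = { key, value };
            return;
        }

        // Both slots are occupied: evict and relocate, alternating between tables
        let entry = { key, value };
        let useTable1 = true;
        for (let i = 0; i < this.maxRelocations; i++) {
            const table = useTable1 ? this.table1 : this.table2;
            const index = useTable1
                ? this.hashFunction1(entry.key)
                : this.hashFunction2(entry.key);
            const evicted = table[index];
            table[index] = entry;
            if (evicted === null) return; // The chain ended in an empty slot
            entry = evicted;              // The evicted item goes to the other table
            useTable1 = !useTable1;
        }

        // Threshold reached (most likely a cycle): grow the tables and retry
        this.resize();
        this.insert(entry.key, entry.value);
    }

    /**
     * Doubles the size of both tables and reinserts every stored item.
     */
    resize() {
        const entries = [...this.table1, ...this.table2].filter((e) => e !== null);
        this.size *= 2;
        this.maxRelocations = Math.max(8, 2 * this.size);
        this.table1 = new Array(this.size).fill(null);
        this.table2 = new Array(this.size).fill(null);
        for (const { key, value } of entries) {
            this.insert(key, value);
        }
    }

    /**
     * Searches for a key in the hash tables and returns the corresponding value if found.
     */
    search(key) {
        const hash1 = this.hashFunction1(key);
        const hash2 = this.hashFunction2(key);

        if (this.table1[hash1] !== null && this.table1[hash1].key === key) {
            // Check if key is found in table1
            return this.table1[hash1].value;
        } else if (this.table2[hash2] !== null && this.table2[hash2].key === key) {
            // Check if key is found in table2
            return this.table2[hash2].value;
        } else {
            // Key not found
            return null;
        }
    }

    /**
     * Base hash: a 32-bit FNV-1a-style hash over the key's string form.
     * Any good general-purpose hash could be substituted here.
     */
    hash(key) {
        let h = 0x811c9dc5;
        for (const ch of String(key)) {
            h ^= ch.codePointAt(0);
            h = Math.imul(h, 0x01000193);
        }
        return h >>> 0; // Force an unsigned 32-bit result
    }

    /**
     * First hash function for calculating the index in table1.
     */
    hashFunction1(key) {
        return this.hash(key) % this.size;
    }

    /**
     * Second hash function for calculating the index in table2.
     * It reuses the base hash but reads a different part of it.
     */
    hashFunction2(key) {
        return Math.floor(this.hash(key) / this.size) % this.size;
    }
}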

This snippet is well-documented. However, if you need further clarification or assistance in understanding any part of it, please feel free to leave a comment. I'll be happy to provide additional explanations as needed.
