Cuckoo Index

Cuckoo Index

1. Overview

Cuckoo hashing is used as a solution for hash collisions, and its worst-case lookup time is constant O(1). The reason why is it called “cuckoo” is because the cuckoo bird lays the eggs in another nest. When the chicks hatched, they push the younger eggs out of the nest. Like the given example, cuckoo hashing has a similar structure for preventing collisions along a table. Cuckoo hashing has a total of two hash functions and two hash tables to index the keys. The value is first indexed in the first hash table. However, if the next value collision with the position of the just inserted value, the old value will be indexed in the second hash table and the new value at the first hash table, and so on.

2. Cuckoo Hash Indexing Structure

Cuckoo Hashing is composed of:

  1. Two Hash Functions:

  • The first hash function is going to index the key in the first hash table.
  • The second hash function is used as collateral to index the key in case another key collides with the new coming key.
  • These hash functions work together. By this, I mean that indexing is going to be recursive because values are going to be pushing each other until all values belong to a cell or “bucket”

2. Two Hash Tables:

  • These two tables will be used to index the keys dynamically when collisions occur in the first or second table to solve collisions as shown below.

3. Cuckoo Insertion Deep Sight

Let’s say we have the next input data:

{"Apples", "Bananas", "Oranges", "Grapes", "Avocados", 
"Tomatoes", "Melons", "Tangerines", "Raisins", "Watermelons"}

# Total of 10 values
        

We will have to hash every word of the set and get the next results table:

If we go step by step, we first insert “Apples” into Table 1.

“Bananas” are also inserted in Table 1 because the slot is empty.

“Oranges” and “Apples” have the same hash index number. Therefore, “Oranges” as the new value is inserted in Table 1 index 9, and “Apples” are relocated to Table 2 index 1.


“Grapes” and “Oranges” have the same hash index number. Therefore, “Grapes” as the new value is inserted in Table 1 Index 9, and “Oranges” are relocated to Table 2 index 4.

“Avocados” are inserted in Table 1 index 1.

“Tomatoes” are inserted in Table 1 index 1 but “Avocados” are relocated to Table 2 index 9

In this step, there are a lot of relocations going on. “Melons” are inserted first in Table 1 index 6. However, “Bananas” were located in that position. Therefore, we send “Bananas” to Table 2 index 4 but “Oranges” are located in that position of the table. This is chained into another relocation because “Oranges” now have to be sent to Table 1 index 9 where “Grapes” are located. Now, “Grapes” have to be sent to Table 2 index 6. Fortunately, there is no Key located in that slot so the process finishes here.

“Tangerines” is inserted in Table 3 index 3 where the slot is empty.

“Raisins” is inserted in Table 1 index 3 and “Tangerines” are relocated in Table 2 index 0.

Finally, “Watermelons” are inserted in Table 1 index 6. However, as we can see in the table below, “Watermelons” is also pushed away from its original position because it collided with the “Bananas” after a long relocation procedure.

3.1 Final Results

We get as a result the next Cuckoo Table according to its hash function values:

5. Conclusion

Cuckoo hashing indeed has an outstanding method to prevent collisions. However, the more loaded the table gets, the more high insertion latency will suffer. Cuckoo indexing can cause endless loops when inserting data because it keeps reallocating the values along the two tables as shown in the above example. This is the main reason why cuckoo hashing has a degraded performance for inserting data in some situations.

Understanding Cuckoo hashing is important because as the hash index, Cuckoo hashing is applied also to probabilistic data structures such as Cuckoo Filters. This probabilistic data structure provides a way to support deletions thanks to the mentioned structure above but also provides equal or less space complexity than the famous bloom filters supported by hash indexing.

References

  1. https://www.geeksforgeeks.org/cuckoo-hashing/
  2. https://en.wikipedia.org/wiki/Cuckoo_hashing
  3. https://programming.guide/cuckoo-hashing.html
  4. https://www.kindsonthegenius.com/cuckoo-hashing/


#hashing #datascience #bigdata #researchanddevelopment #databases #datastructures #datastructuresandalgorithms

要查看或添加评论,请登录

Humberto Villalta的更多文章

  • Network Configuration

    Network Configuration

    Try the following steps to reset the server’s internet connection: Before Starting…. To see which file is connected to…

  • How to Push Local Repo Code into Github Repo

    How to Push Local Repo Code into Github Repo

    This guide is intended to show how to push code from our local computer to a GitHub repository. Step 1 Open the Git…

  • HDFS Clustering Through Docker in CentOS

    HDFS Clustering Through Docker in CentOS

    Overview This guide will show you how to deploy a distributed file system for Hadoop (HDFS). We will first make a…

  • Redis Integration with Python

    Redis Integration with Python

    Overview This guide is made for those who want to install Redis in a Centos 7 docker container and to integrate Python.…

  • What is Redis?

    What is Redis?

    Overview Redis is a C programming written REmote DIctionary Server developed in 2006. Redis read and write operations…

  • Linux [~/.bashrc] File

    Linux [~/.bashrc] File

    This post is intended to show how to interact with the different settings this file offers. Some of the settings that…

  • How to Set a Virtual Machine in Windows OS

    How to Set a Virtual Machine in Windows OS

    Overview Installing a virtual machine on a Windows operating system seems to be the easiest and most basic process that…

    2 条评论
  • Spacetime Data Hub Technology Connecting the Physical and Digital Worlds

    Spacetime Data Hub Technology Connecting the Physical and Digital Worlds

    The advancement of IT technology has enabled people to project physical spaces into virtual spaces known as “digital…

  • Python Decorator Introduction with Examples

    Python Decorator Introduction with Examples

    1. Overview The decorator pattern is a software design pattern that allows us to dynamically add functionality to…

  • HyperLogLog Basics

    HyperLogLog Basics

    Overview Probabilistic data structures are very well-known because of their outstanding time and space complexity among…

社区洞察

其他会员也浏览了