登录查看更多内容

Cuckoo Index

Humberto Villalta

Python Developer | Data Engineer

发布日期: 2023年12月17日

1. Overview

Cuckoo hashing is used as a solution for hash collisions, and its worst-case lookup time is constant O(1). The reason why is it called “cuckoo” is because the cuckoo bird lays the eggs in another nest. When the chicks hatched, they push the younger eggs out of the nest. Like the given example, cuckoo hashing has a similar structure for preventing collisions along a table. Cuckoo hashing has a total of two hash functions and two hash tables to index the keys. The value is first indexed in the first hash table. However, if the next value collision with the position of the just inserted value, the old value will be indexed in the second hash table and the new value at the first hash table, and so on.

2. Cuckoo Hash Indexing Structure

Cuckoo Hashing is composed of:

Two Hash Functions:

The first hash function is going to index the key in the first hash table.
The second hash function is used as collateral to index the key in case another key collides with the new coming key.
These hash functions work together. By this, I mean that indexing is going to be recursive because values are going to be pushing each other until all values belong to a cell or “bucket”

2. Two Hash Tables:

These two tables will be used to index the keys dynamically when collisions occur in the first or second table to solve collisions as shown below.

3. Cuckoo Insertion Deep Sight

Let’s say we have the next input data:

{"Apples", "Bananas", "Oranges", "Grapes", "Avocados", 
"Tomatoes", "Melons", "Tangerines", "Raisins", "Watermelons"}

# Total of 10 values

We will have to hash every word of the set and get the next results table:

If we go step by step, we first insert “Apples” into Table 1.

“Bananas” are also inserted in Table 1 because the slot is empty.

“Oranges” and “Apples” have the same hash index number. Therefore, “Oranges” as the new value is inserted in Table 1 index 9, and “Apples” are relocated to Table 2 index 1.

“Grapes” and “Oranges” have the same hash index number. Therefore, “Grapes” as the new value is inserted in Table 1 Index 9, and “Oranges” are relocated to Table 2 index 4.

领英推荐

Data Cleaning - Sort Values

Mage 3 年前

Harnessing Data Scrapers : Quest for Missing Persons

Batuhan ?zy?n 4 个月前

Can Likert Scale Data ever be Continuous?

Karen Grace-Martin 3 个月前

“Avocados” are inserted in Table 1 index 1.

“Tomatoes” are inserted in Table 1 index 1 but “Avocados” are relocated to Table 2 index 9

In this step, there are a lot of relocations going on. “Melons” are inserted first in Table 1 index 6. However, “Bananas” were located in that position. Therefore, we send “Bananas” to Table 2 index 4 but “Oranges” are located in that position of the table. This is chained into another relocation because “Oranges” now have to be sent to Table 1 index 9 where “Grapes” are located. Now, “Grapes” have to be sent to Table 2 index 6. Fortunately, there is no Key located in that slot so the process finishes here.

“Tangerines” is inserted in Table 3 index 3 where the slot is empty.

“Raisins” is inserted in Table 1 index 3 and “Tangerines” are relocated in Table 2 index 0.

Finally, “Watermelons” are inserted in Table 1 index 6. However, as we can see in the table below, “Watermelons” is also pushed away from its original position because it collided with the “Bananas” after a long relocation procedure.

3.1 Final Results

We get as a result the next Cuckoo Table according to its hash function values:

5. Conclusion

Cuckoo hashing indeed has an outstanding method to prevent collisions. However, the more loaded the table gets, the more high insertion latency will suffer. Cuckoo indexing can cause endless loops when inserting data because it keeps reallocating the values along the two tables as shown in the above example. This is the main reason why cuckoo hashing has a degraded performance for inserting data in some situations.

Understanding Cuckoo hashing is important because as the hash index, Cuckoo hashing is applied also to probabilistic data structures such as Cuckoo Filters. This probabilistic data structure provides a way to support deletions thanks to the mentioned structure above but also provides equal or less space complexity than the famous bloom filters supported by hash indexing.

References

#hashing #datascience #bigdata #researchanddevelopment #databases #datastructures #datastructuresandalgorithms

要查看或添加评论，请登录

Humberto Villalta的更多文章

Network Configuration

2024年2月13日

Network Configuration

Try the following steps to reset the server’s internet connection: Before Starting…. To see which file is connected to…
How to Push Local Repo Code into Github Repo

2024年2月8日

How to Push Local Repo Code into Github Repo

This guide is intended to show how to push code from our local computer to a GitHub repository. Step 1 Open the Git…
HDFS Clustering Through Docker in CentOS

2024年2月6日

HDFS Clustering Through Docker in CentOS

Overview This guide will show you how to deploy a distributed file system for Hadoop (HDFS). We will first make a…
Redis Integration with Python

2024年1月29日

Redis Integration with Python

Overview This guide is made for those who want to install Redis in a Centos 7 docker container and to integrate Python.…
What is Redis?

2024年1月24日

What is Redis?

Overview Redis is a C programming written REmote DIctionary Server developed in 2006. Redis read and write operations…
Linux [~/.bashrc] File

2024年1月18日

Linux [~/.bashrc] File

This post is intended to show how to interact with the different settings this file offers. Some of the settings that…
How to Set a Virtual Machine in Windows OS

2024年1月15日

How to Set a Virtual Machine in Windows OS

Overview Installing a virtual machine on a Windows operating system seems to be the easiest and most basic process that…

2 条评论
Spacetime Data Hub Technology Connecting the Physical and Digital Worlds

2024年1月10日

Spacetime Data Hub Technology Connecting the Physical and Digital Worlds

The advancement of IT technology has enabled people to project physical spaces into virtual spaces known as “digital…
Python Decorator Introduction with Examples

2024年1月3日

Python Decorator Introduction with Examples

1. Overview The decorator pattern is a software design pattern that allows us to dynamically add functionality to…
HyperLogLog Basics

2023年12月27日

HyperLogLog Basics

Overview Probabilistic data structures are very well-known because of their outstanding time and space complexity among…

See all articles

Cuckoo Index

Humberto Villalta

Python Developer | Data Engineer

1. Overview

2. Cuckoo Hash Indexing Structure

3. Cuckoo Insertion Deep Sight

领英推荐

3.1 Final Results

5. Conclusion

References

Humberto Villalta的更多文章

社区洞察

其他会员也浏览了

A Gentle Introduction to Probabilistic Data Structures

RAG Systems: TOP 3 pros and cons (compared to fine tuning)

Introduction to Group Feature Selection

I ran 580 model-dataset experiments to show that, even if you try very hard, it is almost impossible to know that a model is degrading just by looking

Bias-Variance tradeoff

Beyond Fuzzy Matching: The Nuances of Entity Resolution in Today's Data Landscape

Backtransformation

Important Methods in Data Structure Algorithm

RAG Prototype #6 - Adding data sources

Decoding Bloom Filter.??

1. Overview

2. Cuckoo Hash Indexing Structure

3. Cuckoo Insertion Deep Sight

领英推荐

3.1 Final Results

5. Conclusion

References

Humberto Villalta的更多文章

Network Configuration

How to Push Local Repo Code into Github Repo

HDFS Clustering Through Docker in CentOS

Redis Integration with Python

What is Redis?

Linux [~/.bashrc] File

How to Set a Virtual Machine in Windows OS

Spacetime Data Hub Technology Connecting the Physical and Digital Worlds

Python Decorator Introduction with Examples

HyperLogLog Basics

社区洞察

其他会员也浏览了

A Gentle Introduction to Probabilistic Data Structures

RAG Systems: TOP 3 pros and cons (compared to fine tuning)

Introduction to Group Feature Selection

I ran 580 model-dataset experiments to show that, even if you try very hard, it is almost impossible to know that a model is degrading just by looking

Bias-Variance tradeoff

Beyond Fuzzy Matching: The Nuances of Entity Resolution in Today's Data Landscape

Backtransformation

Important Methods in Data Structure Algorithm

RAG Prototype #6 - Adding data sources

Decoding Bloom Filter.??