Cuckoo Index
1. Overview
Cuckoo hashing is used as a solution for hash collisions, and its worst-case lookup time is constant O(1). The reason why is it called “cuckoo” is because the cuckoo bird lays the eggs in another nest. When the chicks hatched, they push the younger eggs out of the nest. Like the given example, cuckoo hashing has a similar structure for preventing collisions along a table. Cuckoo hashing has a total of two hash functions and two hash tables to index the keys. The value is first indexed in the first hash table. However, if the next value collision with the position of the just inserted value, the old value will be indexed in the second hash table and the new value at the first hash table, and so on.
2. Cuckoo Hash Indexing Structure
Cuckoo Hashing is composed of:
2. Two Hash Tables:
3. Cuckoo Insertion Deep Sight
Let’s say we have the next input data:
{"Apples", "Bananas", "Oranges", "Grapes", "Avocados",
"Tomatoes", "Melons", "Tangerines", "Raisins", "Watermelons"}
# Total of 10 values
We will have to hash every word of the set and get the next results table:
If we go step by step, we first insert “Apples” into Table 1.
“Bananas” are also inserted in Table 1 because the slot is empty.
“Oranges” and “Apples” have the same hash index number. Therefore, “Oranges” as the new value is inserted in Table 1 index 9, and “Apples” are relocated to Table 2 index 1.
“Grapes” and “Oranges” have the same hash index number. Therefore, “Grapes” as the new value is inserted in Table 1 Index 9, and “Oranges” are relocated to Table 2 index 4.
领英推荐
“Avocados” are inserted in Table 1 index 1.
“Tomatoes” are inserted in Table 1 index 1 but “Avocados” are relocated to Table 2 index 9
In this step, there are a lot of relocations going on. “Melons” are inserted first in Table 1 index 6. However, “Bananas” were located in that position. Therefore, we send “Bananas” to Table 2 index 4 but “Oranges” are located in that position of the table. This is chained into another relocation because “Oranges” now have to be sent to Table 1 index 9 where “Grapes” are located. Now, “Grapes” have to be sent to Table 2 index 6. Fortunately, there is no Key located in that slot so the process finishes here.
“Tangerines” is inserted in Table 3 index 3 where the slot is empty.
“Raisins” is inserted in Table 1 index 3 and “Tangerines” are relocated in Table 2 index 0.
Finally, “Watermelons” are inserted in Table 1 index 6. However, as we can see in the table below, “Watermelons” is also pushed away from its original position because it collided with the “Bananas” after a long relocation procedure.
3.1 Final Results
We get as a result the next Cuckoo Table according to its hash function values:
5. Conclusion
Cuckoo hashing indeed has an outstanding method to prevent collisions. However, the more loaded the table gets, the more high insertion latency will suffer. Cuckoo indexing can cause endless loops when inserting data because it keeps reallocating the values along the two tables as shown in the above example. This is the main reason why cuckoo hashing has a degraded performance for inserting data in some situations.
Understanding Cuckoo hashing is important because as the hash index, Cuckoo hashing is applied also to probabilistic data structures such as Cuckoo Filters. This probabilistic data structure provides a way to support deletions thanks to the mentioned structure above but also provides equal or less space complexity than the famous bloom filters supported by hash indexing.
References