
Hash Table Internals - Part 7 - Performant Hash Tables

Arpit Bhayani

Published: August 30, 2022

Hash Tables are designed to give constant-time performance, and to do this, they need a large number of slots available. So, which factors decide their performance?

Load Factor

Load Factor is a measure that makes it simple for us to tell how loaded the Hash Table is: it is just the number of keys divided by the number of slots in the hash table.

As the load factor increases, the performance of the Hash Table decreases. This happens because, with more slots occupied, it takes longer for us to look up a slot and to find an empty slot to place a key.
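To make the definition concrete, here is a minimal sketch in Python. The function names are hypothetical, and the second function is not from this essay: it uses the standard textbook estimate for open addressing under the uniform hashing assumption, where an unsuccessful lookup needs at most about 1 / (1 - load factor) probes.

```python
def load_factor(num_keys: int, num_slots: int) -> float:
    """Number of keys divided by the number of slots."""
    if num_slots <= 0:
        raise ValueError("a hash table needs at least one slot")
    return num_keys / num_slots


def estimated_probes_unsuccessful(alpha: float) -> float:
    """Textbook estimate (an assumption, not a measurement) of the probes
    needed by an unsuccessful lookup in an open-addressed table."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must be in [0, 1) for open addressing")
    return 1 / (1 - alpha)


if __name__ == "__main__":
    alpha = load_factor(num_keys=6, num_slots=8)
    print(alpha)                                 # 0.75
    print(estimated_probes_unsuccessful(0.5))    # 2.0
    print(estimated_probes_unsuccessful(alpha))  # 4.0
```

Going from half full to three-quarters full doubles the estimated number of probes, which is why many implementations resize well before the table fills up.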

The Best Strategy

Every probing or collision resolution strategy has its merits and demerits; each performs best under certain conditions and worse under others. Let's take a detailed look.

Chained Hashing

Chained Hashing is costly, as it requires us to do a linear traversal of the linked list to find the key we are looking for. As the collisions increase, the lookup time shoots up, degrading the performance.

Chained Hashing is not cache-friendly, as it requires us to do random lookups in the memory while hopping from one linked list node to another.
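The traversal cost can be sketched with a small chained table. This is an illustrative implementation, not the one from the series; the class and method names are hypothetical, and `get` returns the number of nodes visited alongside the value purely to make the cost visible.

```python
class Node:
    """One linked list node; each node is a separate heap object."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, num_slots=8):
        self.slots = [None] * num_slots

    def _index(self, key):
        return hash(key) % len(self.slots)

    def put(self, key, value):
        index = self._index(key)
        node = self.slots[index]
        while node is not None:          # update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise insert a new node at the head of the chain.
        self.slots[index] = Node(key, value, self.slots[index])

    def get(self, key):
        """Return (value, nodes_visited); raise KeyError if absent."""
        node = self.slots[self._index(key)]
        visited = 0
        while node is not None:          # linear traversal of the chain
            visited += 1
            if node.key == key:
                return node.value, visited
            node = node.next
        raise KeyError(key)


if __name__ == "__main__":
    table = ChainedHashTable(num_slots=1)   # force every key to collide
    for k in range(5):
        table.put(k, str(k))
    print(table.get(4))   # ('4', 1): most recent insert sits at the head
    print(table.get(0))   # ('0', 5): oldest key is at the tail
```

With a single slot every key collides, so the lookup cost grows with the chain length, and each hop follows a pointer to a node that may live anywhere on the heap.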

Double Hashing

Evaluating two hash functions requires extra CPU cycles that could get taxing. Double hashing is also not cache-friendly, as it requires us to jump across the Hash Table to hunt for an empty slot.
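The jumping behaviour is easy to see by printing a probe sequence. The sketch below assumes integer keys, a prime number of slots, and a common textbook choice for the second hash function; none of these specifics come from this essay, and the function name is hypothetical.

```python
def double_hash_probe_sequence(key: int, num_slots: int) -> list:
    """Slots visited, in order, when probing for `key`.

    Assumptions for this sketch: `num_slots` is prime, so any step size in
    1..num_slots-1 eventually visits every slot.
    """
    start = key % num_slots                # first hash: where probing starts
    step = 1 + (key % (num_slots - 1))     # second hash: never zero
    return [(start + i * step) % num_slots for i in range(num_slots)]


if __name__ == "__main__":
    print(double_hash_probe_sequence(10, 7))   # [3, 1, 6, 4, 2, 0, 5]
```

For key 10 in a 7-slot table the probes start at slot 3 and move in steps of 5, so consecutive probes land far apart instead of on neighbouring slots.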

The optimal strategy is contextual. If the performance of the Hash Table is critical, then we need to experiment, tune, and evaluate to find the strategy that fits us best.

Lookup Time vs Load Factor

Lookup Time is the most critical metric in evaluating the performance of the Hash Table. When we benchmark Lookup Time vs Load Factor, we would see:

Performance of Open Addressing degrades as the load factor increases
Performance of Chained Hashing degrades gracefully with load factor
Linear Probing would be slower than Double Hashing
Probe sequences required for Double Hashing would be shorter
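A rough way to check the probing observations above is to count probes while filling a table to different load factors. This is a minimal sketch, not the benchmark behind this essay: it assumes random integer keys, a prime table size of 1009, and the same illustrative second hash function as a textbook would use. It counts probes rather than wall-clock time, so it does not capture cache effects.

```python
import random


def average_probes(keys, num_slots, step_fn):
    """Insert `keys` into an empty open-addressed table and return the
    average number of probes per insert (equal to the cost of later finding
    that key, as long as nothing is deleted)."""
    table = [None] * num_slots
    total = 0
    for key in keys:
        index = key % num_slots
        step = step_fn(key, num_slots)
        probes = 1
        while table[index] is not None:    # occupied: move to the next probe
            index = (index + step) % num_slots
            probes += 1
        table[index] = key
        total += probes
    return total / len(keys)


def linear_step(key, num_slots):
    return 1                               # always try the adjacent slot


def double_hash_step(key, num_slots):
    return 1 + key % (num_slots - 1)       # key-dependent, never zero


def benchmark(num_slots=1009, load_factors=(0.25, 0.5, 0.75, 0.9), seed=42):
    """Return {load_factor: (linear_avg, double_hashing_avg)}."""
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), num_slots)   # unique random keys
    results = {}
    for alpha in load_factors:
        n = int(alpha * num_slots)
        results[alpha] = (
            average_probes(keys[:n], num_slots, linear_step),
            average_probes(keys[:n], num_slots, double_hash_step),
        )
    return results


if __name__ == "__main__":
    for alpha, (linear, double) in benchmark().items():
        print(f"load factor {alpha}: linear={linear:.2f} double={double:.2f}")
```

The exact numbers depend on the keys and the seed, but the gap between the two strategies should widen as the load factor approaches 1.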

Making Chained Hashing cache efficient

Chained Hashing is known for being cache-inefficient, as it requires us to traverse through linked list nodes that may be present across the heap. Can we somehow make it cache efficient?

To make Chained Hashing cache-friendly, we have to ensure that the nodes of the linked list are allocated contiguously instead of randomly. Hence, instead of allocating one node at a time, we allocate the space for 5 nodes (like an array) at a time and then form the linked list out of them.

This would make the linked list leverage the CPU cache well and ensure our iterations are efficient as the next nodes will be available in the CPU cache, not requiring us to fetch them from the main memory.
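The allocation pattern can be sketched as follows. This is only an illustration of the layout under some assumptions: keys are integers, links are stored as array indices instead of pointers, and the class and constant names are hypothetical. In a language like C the same idea would be a single allocation holding 5 nodes; Python's interpreter overhead means this sketch shows the structure, not a measurable cache benefit.

```python
from array import array

NODES_PER_CHUNK = 5   # the essay's example: room for 5 nodes at a time
NIL = -1              # marks the end of the chain


class ChunkedChain:
    """One bucket's chain whose nodes live in contiguous arrays."""

    def __init__(self):
        self.keys = array("q")    # node keys, stored back to back
        self.next = array("q")    # index of the next node, or NIL
        self.used = 0             # number of node slots handed out
        self.head = NIL
        self.chunk_allocations = 0

    def _allocate_node(self):
        # Grow by a whole chunk only when every existing slot is in use.
        if self.used == len(self.keys):
            self.keys.extend([0] * NODES_PER_CHUNK)
            self.next.extend([NIL] * NODES_PER_CHUNK)
            self.chunk_allocations += 1
        index = self.used
        self.used += 1
        return index

    def push(self, key):
        """Link a new node at the head of the chain."""
        index = self._allocate_node()
        self.keys[index] = key
        self.next[index] = self.head
        self.head = index

    def contains(self, key):
        index = self.head
        while index != NIL:       # same traversal, but over adjacent memory
            if self.keys[index] == key:
                return True
            index = self.next[index]
        return False


if __name__ == "__main__":
    chain = ChunkedChain()
    for k in range(12):
        chain.push(k)
    print(chain.chunk_allocations)   # 3 chunks cover 12 nodes
    print(chain.contains(7))         # True
```

Twelve inserts trigger three chunk allocations instead of twelve separate node allocations, and neighbouring nodes of the chain sit next to each other in memory.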

Here's the video of me explaining this in depth; do check it out.

Thank you so much for reading. If you found this helpful, do spread the word about it on social media; it would mean the world to me.

If you liked this short essay, you might also like my courses on

I teach an interactive course on System Design where you'll learn how to intuitively design scalable systems. The course will help you

become a better engineer
ace your technical discussions
get acquainted with a spectrum of topics ranging from Storage Engines and High-throughput systems to the super-clever algorithms behind them.

I have compressed my ~10 years of work experience into this course, and aim to accelerate your engineering growth 100x. To date, the course is trusted by 800+ engineers from 11 different countries and here you can find what they say about the course.

Together, we will dissect and build some amazing systems and understand the intricate details. You can find the week-by-week curriculum and topics, testimonials, and other information at https://arpitbhayani.me/masterclass .

