Introduction to Probabilistic Data Structures
Vivekanand Kirubanandan
Founder @ SDE Skills - Building a community to upskill software engineers | Investor | Engineering Leader
I have often written about the SDE Skills meetup. I may be biased, but I think it is one of the best tech interview prep groups around. The secret to its success is the amazing set of volunteers and the engaging participants.
Yesterday, we had our first in-person System Design Meetup that we have had in a while. This is the first of many more to come.
Here is the handout Nagarajulu Aerakoni shared: https://jmpto.us/23-10-28_Handout
Here is a quick run down of what happened (based on the notes I took):
Data structures - Pt 1: Bloom Filters and more...
Hash functions are used to map an infinite input space to a finite output space. Hash functions are usually one-way functions and suffer from collisions, where multiple inputs may map to the same output.
Bloom filters are used to answer the question of membership. Does this element exist in the bloom filter? The response is a Definite No, or a Maybe Yes. Due to the nature of how it is structured, it cannot provide a deterministic affirmative answer. It is fast, efficient and this comes at the cost of false positives, where it may provide a “it may exist” even if it doesn’t exist in the filter.
Count-Min Sketch is used to count the number of appearances of an item. There are quite a bit of similarities in the design and application of bloom filters and count-min-sketch. This can provide an answer to how many times have we seen this element. The response may be overstated by a bit.
Quick sidebar on applying these during tech interviews
In a tech interview, how you talk about solutions is super important. While these cool algorithms, Bloom filters and Count-Min Sketch, can help save space when dealing with lots of data they do suffer from incorrect responses. That said:
Don't start with these fancy algorithms: Bloom filters or Count-Min Sketch are rarely a part of any system design interview question’s initial design. Keep your solutions simple and stay away from these “fancy” algorithms. These algorithms are like special tools.
Bring them up at the right time. Usually, when we are discussing the storage, time-complexity of your design, if it is appropriate, you may want to bring up these algorithms to explain how you can significantly improve the design by leveraging these. This is very similar to how you think about adding a load balancer, or a caching layer. Applying them prematurely will make you sound very text-bookish.
领英推荐
And lastly, don’t bring these up if you can’t explain these algorithms well.
Data structures part 2: Counters
On to Linear counters, and LogLog style counters.
Often times, you will need to estimate the cardinality (number of unique elements).
What Next?
We had 20+ participants, the room had capacity for a few more. If you have some time to spare join us in upcoming weeks. Engage with us in many channels that we have.
#systemdesign #algorithms #meetup #sdeskills