Introduction to Probabilistic Data Structures

Introduction to Probabilistic Data Structures

I have often written about the SDE Skills meetup. I may be biased, but I think it is one of the best tech interview prep groups around. The secret to its success is the amazing set of volunteers and the engaging participants.

Yesterday, we had our first in-person System Design Meetup that we have had in a while. This is the first of many more to come.

Here is the handout Nagarajulu Aerakoni shared: https://jmpto.us/23-10-28_Handout

Here is a quick run down of what happened (based on the notes I took):

Data structures - Pt 1: Bloom Filters and more...

Hash functions are used to map an infinite input space to a finite output space. Hash functions are usually one-way functions and suffer from collisions, where multiple inputs may map to the same output.

Bloom filters are used to answer the question of membership. Does this element exist in the bloom filter? The response is a Definite No, or a Maybe Yes. Due to the nature of how it is structured, it cannot provide a deterministic affirmative answer. It is fast, efficient and this comes at the cost of false positives, where it may provide a “it may exist” even if it doesn’t exist in the filter.

Count-Min Sketch is used to count the number of appearances of an item. There are quite a bit of similarities in the design and application of bloom filters and count-min-sketch. This can provide an answer to how many times have we seen this element. The response may be overstated by a bit.

Quick sidebar on applying these during tech interviews

In a tech interview, how you talk about solutions is super important. While these cool algorithms, Bloom filters and Count-Min Sketch, can help save space when dealing with lots of data they do suffer from incorrect responses. That said:

Don't start with these fancy algorithms: Bloom filters or Count-Min Sketch are rarely a part of any system design interview question’s initial design. Keep your solutions simple and stay away from these “fancy” algorithms. These algorithms are like special tools.

Bring them up at the right time. Usually, when we are discussing the storage, time-complexity of your design, if it is appropriate, you may want to bring up these algorithms to explain how you can significantly improve the design by leveraging these. This is very similar to how you think about adding a load balancer, or a caching layer. Applying them prematurely will make you sound very text-bookish.

And lastly, don’t bring these up if you can’t explain these algorithms well.

Data structures part 2: Counters

On to Linear counters, and LogLog style counters.

Often times, you will need to estimate the cardinality (number of unique elements).

  • Linear Counters uses a hash function to estimate the cardinality, suffers from overcounting due to collision.
  • A variety of LogLog style counters that estimates the cardinality based on the number of trailing zeros in the hash values of the elements in the set.

What Next?

We had 20+ participants, the room had capacity for a few more. If you have some time to spare join us in upcoming weeks. Engage with us in many channels that we have.


#systemdesign #algorithms #meetup #sdeskills


要查看或添加评论,请登录

Vivekanand Kirubanandan的更多文章

社区洞察

其他会员也浏览了