登录查看更多内容

Bloom Filters

Kartikey Srivastava

Software Engineer | Backend | Distributed Systems

发布日期: 2024年10月12日

Imagine walking into a mobile shop that only sells Samsung and OnePlus phones. You ask the shopkeeper for an iphone. Without wasting any time, he says, “No, we don’t sell iphones.” Now, you change your mind to buy a One Plus model(a rare one). Instead of denying straightaway, the shopkeeper responds, “It might be in stock, but I can’t be 100% sure until I look.”

This is similar to how a Bloom Filter works, it can definitely tell you if something doesn’t exist(like an iphone in an android shop), but if it says something might exist, there is a certain uncertainty of the data’s presence.

To keep it more simple, Bloom Filter can give you a false positive(i.e. return true for something that is false). But it can never give you a false negative(i.e. return false for something that is true).

We all have encountered those warning messages while signing up on some platform that says, “Username already taken/exists”. That’s Bloom Filters in picture(not always as there are some more methods like querying the database or cache but that becomes inefficient when the user traffic explodes).

Let’s understand this Data Structure bit by bit:

A Bloom Filter is a probabilistic data structure. Now what does that mean?

Probabilistic in context of data structures means that this data structure provides answers with a certain possibility of being correct. This filter can definitely say when an element is not present inside it, but it might be wrong in some cases when it says, “Yes, I have this data”.

A bloom filter uses a bit array to store the data. Now when I say it stores the data, it doesn’t store the actual raw data but applies some hash functions to those data sets and accordingly marks the corresponding positions in the bit array. For example, if you want to check if “Kartik” exists inside a bloom filter it will apply multiple hash functions to this input and then mark their corresponding indexes as 1.

Let’s break this down:

Input - ‘Kartik’ to be added inside a bloom filter.
Hash functions applied: Since a bloom filter uses multiple hash functions, we will consider it has 3 for better understanding.
Positions to be marked: Since this filter uses a bit array(a relatively large size bit arrray) where all indexes are set to 0(switch off) by default, based on the hash function’s result the positions will be marked as 1(switch on).
That’s it. The data is stored now. Go ahead to check if it is present or not.

Below is a visual representation of hash functions being applied to an input:

Now, the outputs of the above hash functions(3, 5, 9) are the positions that need to be set. Below is its visual representation:

领英推荐

Smartphones: What’s Next?

Shelly Palmer 6 年前

How iPhone's Subtle Innovation Keeps Us Hooked

EMB Global 5 个月前

Success Factor of Smartphones Industries

Muhammad Ahsan, BBHRM-LUMS, MS-HRM 1 年前

As you can see the positions 3,5 and 9 are set to 1.

Checking if the data is present in the filter or not:

You give the input. Let’s say the input is again “Kartik”.
The same hash functions are applied to “Kartik.”
On applying the hash functions we would get the same result for obvious reasons which would fetch us the same positions inside the bit array i.e. 3,5 and 9.
The filter checks if these positions in the bit array are set to 1.
Oh, these indexes are already marked as 1(they are on). This means the data might be present.(Username already exists, please try using a different username).

However, if any of the hash function results point to a position that is still 0, the filter can confidently say, "No, the data is not here." This is because the same input, when hashed, will always give the same result, so if those positions aren't set, the data was never added.

A Bloom filter can give false positives (saying something might exist when it doesn't), but it can never give false negatives (saying something doesn’t exist when it actually does). So, when you check for “Kartik” and all bits are 1, it means “Kartik” might exist. If even one bit is 0, the filter is sure it doesn't exist.

To minimize the number of false positives you carefully need to decide the length of the bit array to be used(within your system’s capabilities) and also the number of hash functions.

Increasing the number of hash functions exponentially would result in slowing down the process but decreasing them won’t increase the speed but would result in an increased number of false positives.

Read more about how to calculate the above specifications here

That’s all that I know about Bloom Filters as of now. I have used it in my personal projects but not on big scale ones so the choice of using this data structure should be entirely yours. I’m attaching a link where you can experiment on this data structure and find out how many times you get a false positive?

Here you go:

https://llimllib.github.io/bloomfilter-tutorial/

If you read it till here, I’d like you to please share your story in case you used this data structure in your experience. If not used, do consider giving it a try and if you liked this post, please like, comment and share to increase the reach. You can also follow me for more such content. I post on weekends. Thanks :)

带有此图标的链接由领英创建，不带此图标的链接由作者添加。

要查看或添加评论，请登录

Kartikey Srivastava的更多文章

Event Sourcing - A Design Pattern

2024年11月17日

Event Sourcing - A Design Pattern

Event sourcing is a design pattern where state changes are logged as a sequence of immutable events, rather than…

5 条评论
Setting Up Apache Kafka for My Real-Time Analytics Platform

2024年11月10日

Setting Up Apache Kafka for My Real-Time Analytics Platform

After facing multiple issues I've finally setup Apache Kafka . So far : Set up Kafka on my local machine.

1 条评论
RPC - Remote Procedure Call

2024年10月26日

RPC - Remote Procedure Call

Imagine you are working in a microservice architecture where there are 2 services: A and B. Service B can operate only…
Chaos Monkey: Unleashing Chaos for Resilient Systems

2024年10月19日

Chaos Monkey: Unleashing Chaos for Resilient Systems

Picture this: You’re lounging on your couch, popcorn in hand, mid-way through your favourite Netflix series when boom…

2 条评论
Encryption —> Symmetric

2024年10月13日

Encryption —> Symmetric

In simple words, converting a readable information(plain text) into something unreadable(also known as cipher text) to…
Caching - An overview

2024年10月6日

Caching - An overview

In simple terms, caching means storing the frequently accessed data in a storage where one can quickly retrieve it…
How to choose the right database for your system?

2024年10月5日

How to choose the right database for your system?

Being a software developer isn’t just about making a system and getting it running. Imagine you’ve developed an…

4 条评论
Optimizing an API Response

2024年9月29日

Optimizing an API Response

As backend engineers, we frequently deal with API latency issues. It's easy to create an endpoint and invoke a method…

See all articles

Bloom Filters

Kartikey Srivastava

Software Engineer | Backend | Distributed Systems

领英推荐

Kartikey Srivastava的更多文章

社区洞察

其他会员也浏览了

Buy 1 Get 1 Free iPhone 14: The Ultimate Guide to an Amazing Deal

Stop believing in these tricks from smartphone companies while buying a new device

The Apple Phenomenon-Why Everyone is Buying iPhones Now?

Smartphone Sales Rebounds Q4 2023: China Sees Positive Signs

Does the world need another mid-level smartphone manufacturer?

Which Android Phone is the Best Competitor of the iPhone?

Apple could end the game. Its just a matter of time!

5 things the iPhone X can do That no Other Phone Can

The real reasons behind Apple’s launch of iPhone SE

Galaxy, iPhones, and Pixels: how tech companies are advertising phones

领英推荐

Kartikey Srivastava的更多文章

Event Sourcing - A Design Pattern

Setting Up Apache Kafka for My Real-Time Analytics Platform

RPC - Remote Procedure Call

Chaos Monkey: Unleashing Chaos for Resilient Systems

Encryption —> Symmetric

Caching - An overview

How to choose the right database for your system?

Optimizing an API Response

社区洞察

其他会员也浏览了

Buy 1 Get 1 Free iPhone 14: The Ultimate Guide to an Amazing Deal

Stop believing in these tricks from smartphone companies while buying a new device

The Apple Phenomenon-Why Everyone is Buying iPhones Now?

Smartphone Sales Rebounds Q4 2023: China Sees Positive Signs

Does the world need another mid-level smartphone manufacturer?

Which Android Phone is the Best Competitor of the iPhone?

Apple could end the game. Its just a matter of time!

5 things the iPhone X can do That no Other Phone Can

The real reasons behind Apple’s launch of iPhone SE

Galaxy, iPhones, and Pixels: how tech companies are advertising phones