Coding Challenge #53 - Bloom Filter Spell Checker
John Crickett
Helping you become a better software engineer by building real-world applications.
This challenge is to build your own micro spell checker. The goal is to create a spell checker that can determine if a word is probably spelt correctly without having to store the full list of words. Thus the spell checker can use less storage (disk or memory). A task that is much less relevant these days, but 20 years ago was incredibly useful on low storage devices.
Whilst we don’t have this problem with spell checking today, we do have similar problems checking if an item is in a data set that would be too big to store efficiently.
So how are these problems solved? With Bloom filters.
A bloom filter is a probabilistic data structure. It is built around a bit array and one or more hash functions. It provides a fast and efficient way of handling set membership queries when the set either does not fit within the memory constraints, or querying the full set would incur a performance penalty.
Bloom filters have been used in:
???Spell checkers - like you’re going to build in this Coding Challenge.
???Network routers - speeding up packet routing protocols amongst other uses.
???Databases - a quick way of determining if if a row is likely to be in the database without performing costly disk lookups. Used in Google’s Bigtable, Apache HBase, Apache Cassandra and Postgres
???Web crawlers - don’t re-crawl a URL that’s already been seen.
???Cyber security - blacklisting, whitelisting and password / username checkers
???Web caches - preventing ‘one hit wonders’ being stored in disk caches for CDNs.
???Cryptocurrency - Speeding up wallet synchronisation and finding logs in the blockchain.
???Recommendation systems - Medium has used Bloom filters to not recommend articles you’ve already seen.
If You Enjoy Coding Challenges Here Are Three Ways You Can Help Support It
The Challenge - Building A Simple Spell Checker Using a Bloom Filter
In this challenge you’re going to implement a Bloom filter, and insert thousands of words from a dictionary. You’ll then test other words against the dictionary to see if they’re probably spelt correctly.
Step Zero
In this step you’re going to set your environment up ready to begin developing and testing your solution. I’ll leave you to setup your IDE / editor of choice and programming language of choice for building command line tools.
If you’re not on a Unix like operating system you’ll need to get a dictionary file for testing.
领英推荐
If you are on a Unix-like OS you should be able to generate one by running the command:
% cat /usr/share/dict/words >> dict.txt
Step 1
In this step your goal is to use test driven development (TDD) to develop the Bloom filter data structure. OK you don’t have to use TDD, but I’d suggest writing some tests either before or after writing the Bloom filter, both to test that it works correctly and to help you understand how it works.
To implement a Bloom filter:
To determine the how many hash functions you should use and the interaction between the number of bits, the number of items and the number of hash functions there are a set of formulas which you’ll find documented on the Bloom filter Wikipedia page.
You can use hash functions from your programming languages standard library, implement your own or implement an existing one yourself. For the latter I suggest you check out the Fowler–Noll–Vo hash function and implement a couple of the versions of it. It’s a quick and easy hash function to implement if you’ve never written one before.
Continued...
The steps for the rest of this Coding Challenge are detailed on the website as Build Your Own Spell Checker Using A Bloom Filter.
Share Your Solutions!
If you think your solution is an example of the developers can learn from please share it, put it on GitHub, GitLab or elsewhere. Then let me know via Twitter or LinkedIn or just post about it there and tag me.
Request for Feedback
I’m writing these challenges to help you develop your skills as a software engineer based on how I’ve approached my own personal learning and development. What works for me, might not be the best way for you - so if you have suggestions for how I can make these challenges more useful to you and others, please get in touch and let me know. All feedback greatly appreciated.
Thanks and happy coding!
John
Here's my attempt at bloom filter challenge: https://github.com/anubhavrakshit/coding_challenge_53/tree/main
Full-Stack Developer
1 年What a coincidence. I’ve been building bloomfilter spell checkers a lot in Byron Sommardahl’s dev amplifier program. Then I just signed up for these coding challenges and the first one is… bloom filters. Why does everyone want me to build a bloomin bloom filter! ??
Full Stack Web Developer ? Helping startup companies founder save time & money ? Helping coaches at software agencies.
1 年Building real-world applications is a fantastic way to learn and grow as a dev.