Database Indexes vs Full Text Search Index - When, Why and How?
Shrey Batra
CEO @ Cosmocloud | Ex-LinkedIn | Angel Investor | MongoDB Champion | Book Author | Patent Holder (Distributed Algorithms)
It's been some time since I have last written an article and I wanted to share a summary of my latest LinkedIn Live session for people who missed it.
We often talk about Full Text Search indexes as if it was very similar to Database indexes and we can somehow achieve the same thing with Database indexes. Well, let's see what are the two things, how it is different and when should you use them.
What we'll learn in this article -
If you love these small 5 min reads, please drop a ?? on the article / post.
Database Indexes - A recap
We often create Database indexes to speed up our read operations. We often head that we can create indexes on columns / fields that are part of our queries, so that our queries don't scan all the records in our table / collection, and can find the records much more quickly.
These above database indexes often use the exact value of the column / field that we index, does not work on case insensitiveness and are often used with ESR rules --> Equality, Range and Sort operations.
Then how are Full Text Search indexes different?
Full Text Search indexes
Full Text Search or FTS is short, refers to creating a reverse index or inverted index on top of your data. Generally, this means that we are going to pre-process our data before indexing to have a much more text language specific indexing.
Remember how Database Text Indexes above used TRIE data structure? Well, FTS indexes do not use this, but creates an inverted index. This happens because FTS indexes do not store the actual value in the index, but computes one or multiple pre-processed values which it stores in the index.
Let's see with an example.
Let's say we have two strings -
While building an FTS index, the FTS index first breaks down the content of the field (the text strings in our case) into multiple indexable tokens -
Now as we see above, the content is tokenised and this bag of words now points to which record does this token comes from.
领英推荐
While doing this, an FTS index can also do more things like stemming (removing tense), synonyms, stop word removals (such as the, a, an, etc.), semantics as well as case insensitive and other textual operations.
Querying in FTS indexes
Now as we saw above, if we perform a query on these FTS index records, we can query for dog and it will give back both the records, while if we search for sun it will just give us the second record.
FTS Index vs Regex Queries in Database
You would now have a question in mind -- Why not do a Regex query in the database which does exactly the same thing as Full Text Search index? Well, does it do the same thing? Let's see below -
Performance Issues
Performing a Regex query (such as "where name like %shre%" ) can create a lot of performance issues. As regex are unstructured and fuzzy searched, a database engine does not know how to effectively use an index for these searches.
Therefore, a Regex query will perform a Collection Scan or Table Scan which means it is going to read each record in the table / collection for each query -- Imagine running 100,000 queries on a table with 10,000,000 records ??
Stemming
While Regex would do a text match operation on the database, an FTS index can stem each word (or tokens) before matching. For example, the words run, running, ran all will be stemmed to run before indexing, removing the tense of the word.
Semantic Search
An FTS engine can perform semantic search which means taking the context of the word into consideration before searching. For example - If you want to search for football, it might lead you to return the records containing soccer as well, as both the terms are semantically close to each other.
Conclusion
As we saw above, a Full Text Search engine is very different than Database Indexes, both in the working as well as it's requirement in building applications.
In next articles, we will also see where all we can use FTS indexes (the use cases), how to effectively use and build an FTS index (we'll learn about Elasticsearch and MongoDB Atlas Search) as well as we'll see how you can do the same on Cosmocloud in just 5 minutes (with practical)! ????
Cosmocloud is the Backend-as-a-Service platform I am building, improving the developer's productivity by 8x, while still giving you full customisation and control on building complicated APIs and other Backend components. Check it out here.
Stay tuned, subscribe to this newsletter here, and if you liked the above content, please don't forget to Like ?? and Repost ?? with your Linkedin community ??
Senior Product Security Engineer at LinkedIn | Supply Chain Security | Security Automation | Technical Content Writer
9 个月amazing great!
Front end Diveloper (React)
9 个月Thanks for sharing
Software Engineer
9 个月Insightful read!
Transforming Ideas into Scalable Tech Solutions | Python & SaaS Expert | Freelance Partner for Startups
9 个月Amazing, morning coffee with FTS, ????
Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022
9 个月An amazing post curated on Database Indexes vs Full text search! Shrey Batra