Database Indexes vs Full Text Search Index - When, Why and How?
Database Indexes vs Full Text Search

Database Indexes vs Full Text Search Index - When, Why and How?

It's been some time since I have last written an article and I wanted to share a summary of my latest LinkedIn Live session for people who missed it.

We often talk about Full Text Search indexes as if it was very similar to Database indexes and we can somehow achieve the same thing with Database indexes. Well, let's see what are the two things, how it is different and when should you use them.

What we'll learn in this article -

  • Database Indexes - A recap
  • Full Text Search (FTS) Indexes - How it works
  • FTS Indexes vs Regex Queries

If you love these small 5 min reads, please drop a ?? on the article / post.

Database Indexes - A recap

We often create Database indexes to speed up our read operations. We often head that we can create indexes on columns / fields that are part of our queries, so that our queries don't scan all the records in our table / collection, and can find the records much more quickly.

  • A single field or compound database index uses BTrees (or similar) behind the scenes and adds the values of these fields/columns in the BTree data structure. For example indexing on student_name or roll_num.
  • A text database index is that which uses a TRIE data structure behind the scenes, indexes each strings like a dictionary and which is beneficial when we want to have a Prefix lookup. For example textual autocomplete when searching for students in student_name field. Note - This only works when the query is a prefix query, i.e it starts from the text start and is case sensitive.

These above database indexes often use the exact value of the column / field that we index, does not work on case insensitiveness and are often used with ESR rules --> Equality, Range and Sort operations.

Then how are Full Text Search indexes different?
Searching vs querying

Full Text Search indexes

Full Text Search or FTS is short, refers to creating a reverse index or inverted index on top of your data. Generally, this means that we are going to pre-process our data before indexing to have a much more text language specific indexing.

Remember how Database Text Indexes above used TRIE data structure? Well, FTS indexes do not use this, but creates an inverted index. This happens because FTS indexes do not store the actual value in the index, but computes one or multiple pre-processed values which it stores in the index.

Let's see with an example.

Let's say we have two strings -

  • Record 1 -- The quick brown fox jumped over the lazy dog.
  • Record 2 -- The lazy dog slept in the sun.

While building an FTS index, the FTS index first breaks down the content of the field (the text strings in our case) into multiple indexable tokens -

  • the -- Record 1, Record 2
  • quick -- Record 1
  • brown -- Record 1
  • fox -- Record 1
  • jumped -- Record 1
  • over -- Record 1
  • lazy -- Record 1, Record 2
  • dog -- Record 1, Record 2
  • slept -- Record 2
  • in -- Record 2
  • sun -- Record 2

Now as we see above, the content is tokenised and this bag of words now points to which record does this token comes from.

While doing this, an FTS index can also do more things like stemming (removing tense), synonyms, stop word removals (such as the, a, an, etc.), semantics as well as case insensitive and other textual operations.

Querying in FTS indexes

Now as we saw above, if we perform a query on these FTS index records, we can query for dog and it will give back both the records, while if we search for sun it will just give us the second record.

FTS Index vs Regex Queries in Database

You would now have a question in mind -- Why not do a Regex query in the database which does exactly the same thing as Full Text Search index? Well, does it do the same thing? Let's see below -

Performance Issues

Performing a Regex query (such as "where name like %shre%" ) can create a lot of performance issues. As regex are unstructured and fuzzy searched, a database engine does not know how to effectively use an index for these searches.

Therefore, a Regex query will perform a Collection Scan or Table Scan which means it is going to read each record in the table / collection for each query -- Imagine running 100,000 queries on a table with 10,000,000 records ??

Checking each record

Stemming

While Regex would do a text match operation on the database, an FTS index can stem each word (or tokens) before matching. For example, the words run, running, ran all will be stemmed to run before indexing, removing the tense of the word.

Semantic Search

An FTS engine can perform semantic search which means taking the context of the word into consideration before searching. For example - If you want to search for football, it might lead you to return the records containing soccer as well, as both the terms are semantically close to each other.

Conclusion

As we saw above, a Full Text Search engine is very different than Database Indexes, both in the working as well as it's requirement in building applications.

In next articles, we will also see where all we can use FTS indexes (the use cases), how to effectively use and build an FTS index (we'll learn about Elasticsearch and MongoDB Atlas Search) as well as we'll see how you can do the same on Cosmocloud in just 5 minutes (with practical)! ????

Cosmocloud is the Backend-as-a-Service platform I am building, improving the developer's productivity by 8x, while still giving you full customisation and control on building complicated APIs and other Backend components. Check it out here.

Stay tuned, subscribe to this newsletter here, and if you liked the above content, please don't forget to Like ?? and Repost ?? with your Linkedin community ??


Keshav M.

Senior Product Security Engineer at LinkedIn | Supply Chain Security | Security Automation | Technical Content Writer

9 个月

amazing great!

回复
youssef kandil

Front end Diveloper (React)

9 个月

Thanks for sharing

回复
Ezekiel Kolawole

Software Engineer

9 个月

Insightful read!

milan khunt

Transforming Ideas into Scalable Tech Solutions | Python & SaaS Expert | Freelance Partner for Startups

9 个月

Amazing, morning coffee with FTS, ????

POOJA JAIN

Storyteller | Linkedin Top Voice 2024 | Senior Data Engineer@ Globant | Linkedin Learning Instructor | 2xGCP & AWS Certified | LICAP'2022

9 个月

An amazing post curated on Database Indexes vs Full text search! Shrey Batra

要查看或添加评论,请登录

Shrey Batra的更多文章

社区洞察

其他会员也浏览了