Full-Text Search with Django & PostgreSQL in Six minutes

Full-Text Search with Django & PostgreSQL in Six minutes

An introduction to full-text search with Django & PostgreSQL

Introduction

This article was originally published on my blog at: https://paulonteri.com/thoughts/full-text-search-django

Prerequisites

Be familiar with Django

The Django project we’ll be using

The project uses a PostgreSQL database:

# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myblog",
        "USER": "myblog",
        "PASSWORD": "myblog",
        "HOST": "localhost",
        "PORT":  5432,
    }
}        

And has one app called?posts?with one model named?Post:

settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'posts', # <-- posts app
]
        

...

from django.db import models

class Post(models.Model):
    title = models.CharField(max_length=50, unique=True)
    description = models.CharField(max_length=255)
    content = models.TextField()

    def __str__(self):
        return f"Title: `{self.title}` | Description: `{self.description}`"        

The database has two posts:

[
   {
      "title": "Use Your Domain with Gmail",
      "description": "Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace)"
   },

   {
      "title": "Set Up Your Own Heroku-like Cloud",
      "description": "When building new projects, it’s usually annoying to   repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use"
   }
]        

Find that project on my GitHub:?https://github.com/paulonteri/django-rich-text-search-blog-post-code

Basic/traditional search

When it comes to search, with Django, you'll typically start by performing search queries with?contains or?contains for exact matches.

The above search pattern is what most developers are likely familiar with, which has been part of the SQL standard from the beginning, and is available to all SQL-based databases, as follows:

SELECT column_name FROM table_name WHERE column_name LIKE 'pattern';        

That will return the rows where?column_name?matches the?pattern, the closest way to translate that SQL statement to the Django ORM is as follows:

>>> from posts.models import Post
>>> Post.objects.filter(title__contains="Gmail")

<QuerySet [<Post: Title: `Use Your Domain with Gmail` | Description: `Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace).`>]>        

For advanced use cases, the?Q object?can be used as well to add AND (&) or OR (|) logical operators.

Below, we used the?filter?method to filter against the?title?or?description?fields. Furthermore, we also used the?icontains?extension to check if the query is present in the?title?or?description?fields (case insensitive). A positive result will be returned if a match is found.

from django.db.models import Q
from posts.models import Post

Post.objects.filter(
	Q(title__icontains=query) | Q(description__icontains=query)
)        

Searching with basic search

Start the Python interactive interpreter with the command?python manage.py shell

No alt text provided for this image

Create a?basic_search()?function:

No alt text provided for this image

Let’s search for the term ‘heroku’:

No alt text provided for this image

It returns one result.

...

Let’s search for the term ‘paas’:

No alt text provided for this image

It returns one result.

...

Let’s search for the term ‘heroku paas’:

No alt text provided for this image

We quickly run into problems as we can’t find results for that search term even though we have a post with both ‘heroku’ and ‘paas’ as shown below:

No alt text provided for this image

...

Let’s search for the term ‘heroku cloud’:

No alt text provided for this image

We also can’t find results for that search term even though we have a post with both ‘heroku’ and ‘cloud’ in the same model field?title?as shown below:

No alt text provided for this image

The above problems are some of the reasons why we use rich text search.

Full-text search

What is full-text search?

A full-text search is a comprehensive search method that?compares every word of the search request against every word within the document or database. Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.

It searches some text inside extensive text data and returns results that contain some or all of the words from the query. In contrast, the traditional search would return?exact matches.

Introduction to full-text search with Django & PostgreSQL

The module?django.contrib.postgres?contains the support for Full-Text Search in Django. Add it to your?[INSTALLED_APPS]:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.postgres', # <-- Add this line
    'posts',
]        

Searching a Single field with the?search?lookup

The simplest way to start using Full-Text Search in Django is by using?search lookup. To search the?title?column on the?Post?model, append?__search?to the column name when filtering the model. Let’s search for the term ‘heroku cloud’ that failed to work using the basic search:

>>> Post.objects.filter(title__icontains="heroku cloud")
<QuerySet []>

>>> Post.objects.filter(title__search="heroku cloud")
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>        
No alt text provided for this image

Under the hood, Django converts the?title?field to a?tsvector?and converts the search term to a?tsquery. (Note that you don’t have to know what these are. If you want to learn about them, check out the?PostgreSQL docs) You can check the underlying query to verify this:

>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\')) @@ plainto_tsquery(\'heroku cloud\') LIMIT 21'        

Searching across multiple fields with a?SearchVector

Searching against a single field is great but rather limiting. We can use a?SearchVector?to query against multiple fields.

Let’s search for the term ‘cloud projects’ where ‘cloud’ is in the?title?field and ‘projects’ is in the?description?field:

>>> from django.contrib.postgres.search import SearchVector
>>> from django.db.models import Q
>>> from posts.models import Post

>>> query = "cloud projects"

>>> Post.objects.filter(Q(title__icontains=query) | Q(description__icontains=query))
<QuerySet []>

>>> Post.objects.annotate(search=SearchVector("title", "description")).filter(search=query)
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
        
No alt text provided for this image

We can see how rich-text search can come in handy.

The underlying query Django’s ORM generated uses?to_tsvectorto query both the?title??and?description fields in the database (again, you don’t have to know what these are):

>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content", to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) AS "search" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) @@ plainto_tsquery(\'cloud projects\') LIMIT 21'        

And more…

We can decide to order results by relevance using?SearchRank, highlighted search results with?SearchHeadline, changing the search configuration, weighting queries, etc all without wiring SQL. Learn more in thdocs: ?https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/search/

Conclusion

In this quick guide, we saw the basics of adding full-text search to a Django application.

See the complete code from the?GitHub repo.

Sources and more reading:

Dr Raymond O OIGARA

Dean of the School of Health Sciences at Kisii University

2 年

Great strides Son. Even though i don't understand soft ware issues I can deduce from the comments above that it is an interesting piece

回复

要查看或添加评论,请登录

社区洞察

其他会员也浏览了