Full-Text Search with Django & PostgreSQL in Six minutes
An introduction to full-text search with Django & PostgreSQL
Introduction
This article was originally published on my blog at: https://paulonteri.com/thoughts/full-text-search-django
Prerequisites
Be familiar with Django
The Django project we’ll be using
The project uses a PostgreSQL database:
# Database
# https://docs.djangoproject.com/en/3.2/ref/settings/#databases
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql_psycopg2",
"NAME": "myblog",
"USER": "myblog",
"PASSWORD": "myblog",
"HOST": "localhost",
"PORT": 5432,
}
}
And has one app called?posts?with one model named?Post:
settings.py
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'posts', # <-- posts app
]
...
from django.db import models
class Post(models.Model):
title = models.CharField(max_length=50, unique=True)
description = models.CharField(max_length=255)
content = models.TextField()
def __str__(self):
return f"Title: `{self.title}` | Description: `{self.description}`"
The database has two posts:
[
{
"title": "Use Your Domain with Gmail",
"description": "Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace)"
},
{
"title": "Set Up Your Own Heroku-like Cloud",
"description": "When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use"
}
]
Find that project on my GitHub:?https://github.com/paulonteri/django-rich-text-search-blog-post-code
Basic/traditional search
When it comes to search, with Django, you'll typically start by performing search queries with?contains or?contains for exact matches.
The above search pattern is what most developers are likely familiar with, which has been part of the SQL standard from the beginning, and is available to all SQL-based databases, as follows:
SELECT column_name FROM table_name WHERE column_name LIKE 'pattern';
That will return the rows where?column_name?matches the?pattern, the closest way to translate that SQL statement to the Django ORM is as follows:
>>> from posts.models import Post
>>> Post.objects.filter(title__contains="Gmail")
<QuerySet [<Post: Title: `Use Your Domain with Gmail` | Description: `Send and receive emails from a custom domain in Gmail for free (without having to pay for G Suite / Google Workspace).`>]>
For advanced use cases, the?Q object?can be used as well to add AND (&) or OR (|) logical operators.
Below, we used the?filter?method to filter against the?title?or?description?fields. Furthermore, we also used the?icontains?extension to check if the query is present in the?title?or?description?fields (case insensitive). A positive result will be returned if a match is found.
from django.db.models import Q
from posts.models import Post
Post.objects.filter(
Q(title__icontains=query) | Q(description__icontains=query)
)
Searching with basic search
Start the Python interactive interpreter with the command?python manage.py shell
Create a?basic_search()?function:
Let’s search for the term ‘heroku’:
It returns one result.
...
Let’s search for the term ‘paas’:
It returns one result.
...
Let’s search for the term ‘heroku paas’:
领英推荐
We quickly run into problems as we can’t find results for that search term even though we have a post with both ‘heroku’ and ‘paas’ as shown below:
...
Let’s search for the term ‘heroku cloud’:
We also can’t find results for that search term even though we have a post with both ‘heroku’ and ‘cloud’ in the same model field?title?as shown below:
The above problems are some of the reasons why we use rich text search.
Full-text search
What is full-text search?
A full-text search is a comprehensive search method that?compares every word of the search request against every word within the document or database. Web search engines and document editing software make extensive use of the full-text search technique in functions for searching a text database stored on the Web or on the local drive of a computer; it lets the user find a word or phrase anywhere within the database or document.
It searches some text inside extensive text data and returns results that contain some or all of the words from the query. In contrast, the traditional search would return?exact matches.
Introduction to full-text search with Django & PostgreSQL
The module?django.contrib.postgres?contains the support for Full-Text Search in Django. Add it to your?[INSTALLED_APPS]:
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'django.contrib.postgres', # <-- Add this line
'posts',
]
Searching a Single field with the?search?lookup
The simplest way to start using Full-Text Search in Django is by using?search lookup. To search the?title?column on the?Post?model, append?__search?to the column name when filtering the model. Let’s search for the term ‘heroku cloud’ that failed to work using the basic search:
>>> Post.objects.filter(title__icontains="heroku cloud")
<QuerySet []>
>>> Post.objects.filter(title__search="heroku cloud")
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
Under the hood, Django converts the?title?field to a?tsvector?and converts the search term to a?tsquery. (Note that you don’t have to know what these are. If you want to learn about them, check out the?PostgreSQL docs) You can check the underlying query to verify this:
>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\')) @@ plainto_tsquery(\'heroku cloud\') LIMIT 21'
Searching across multiple fields with a?SearchVector
Searching against a single field is great but rather limiting. We can use a?SearchVector?to query against multiple fields.
Let’s search for the term ‘cloud projects’ where ‘cloud’ is in the?title?field and ‘projects’ is in the?description?field:
>>> from django.contrib.postgres.search import SearchVector
>>> from django.db.models import Q
>>> from posts.models import Post
>>> query = "cloud projects"
>>> Post.objects.filter(Q(title__icontains=query) | Q(description__icontains=query))
<QuerySet []>
>>> Post.objects.annotate(search=SearchVector("title", "description")).filter(search=query)
<QuerySet [<Post: Title: `Set Up Your Own Heroku-like Cloud` | Description: `When building new projects, it’s usually annoying to repeatedly set up stuff like web servers, databases, etc. It’s often expensive or tedious to set up and manage. That’s where open-source PAASs came in. They are free and often have an easy to use`>]>
We can see how rich-text search can come in handy.
The underlying query Django’s ORM generated uses?to_tsvectorto query both the?title??and?description fields in the database (again, you don’t have to know what these are):
>>> from django.db import connection
>>> connection.queries[-1]['sql']
'SELECT "posts_post"."id", "posts_post"."title", "posts_post"."description", "posts_post"."content", to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) AS "search" FROM "posts_post" WHERE to_tsvector(COALESCE("posts_post"."title", \'\') || \' \' || COALESCE("posts_post"."description", \'\')) @@ plainto_tsquery(\'cloud projects\') LIMIT 21'
And more…
We can decide to order results by relevance using?SearchRank, highlighted search results with?SearchHeadline, changing the search configuration, weighting queries, etc all without wiring SQL. Learn more in thdocs: ?https://docs.djangoproject.com/en/4.1/ref/contrib/postgres/search/
Conclusion
In this quick guide, we saw the basics of adding full-text search to a Django application.
See the complete code from the?GitHub repo.
Sources and more reading:
Dean of the School of Health Sciences at Kisii University
2 年Great strides Son. Even though i don't understand soft ware issues I can deduce from the comments above that it is an interesting piece