Why not ORMs?
It's no news that opinions in tech often diverge over performance concerns with certain languages or tools. Just as some developers claim that certain languages are slow, similar debates occur about databases. I've heard it numerous times: "Using ORMs is slow." In the Go ecosystem, for instance, some developers opt for SQLC over GORM for this reason.
Everyone has their reasons, but I've come to realize that the perceived slowness of ORMs is often due to how we use them. Sometimes we neglect to dig deep into making certain types of queries efficient; other times we aren't even aware that a particular query could be optimized. And most times, "the deadline is the next day lol".
In Django, for instance, developers often shy away from helpers like prefetch_related and select_related. While these have their own advantages and disadvantages, I've realized that the ORM itself is not inherently slow; it's about how we use it. The same goes for frameworks, and perhaps languages, in general.
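To make the select_related point concrete, here's a minimal sketch using plain sqlite3 from the standard library rather than Django (the table and column names are hypothetical, made up for illustration). Lazily following a foreign key costs one query per row, while a join fetches everything in a single query, which is essentially what select_related does under the hood.

```python
import sqlite3

# Hypothetical schema: projects with a foreign key to their owner.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE project (id INTEGER PRIMARY KEY, owner_id INTEGER);
    INSERT INTO owner VALUES (1, 'a@x.com'), (2, 'b@x.com');
    INSERT INTO project VALUES (10, 1), (11, 2), (12, 1);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement issued

# Naive pattern (what a lazy FK access does): one query for the projects,
# then one more query per project to fetch its owner.
projects = conn.execute("SELECT id, owner_id FROM project").fetchall()
for _, owner_id in projects:
    conn.execute("SELECT email FROM owner WHERE id = ?", (owner_id,)).fetchone()
print(len(statements))  # 4 statements: 1 + N, with N = 3 projects

# select_related pattern: a single JOIN brings the owners along.
statements.clear()
rows = conn.execute("""
    SELECT project.id, owner.email
    FROM project JOIN owner ON owner.id = project.owner_id
""").fetchall()
print(len(statements))  # 1 statement for the same data
```

With three rows the difference is trivial, but the naive pattern scales linearly with the number of rows while the join stays at one round trip.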
Here's a quick example to illustrate this point. Recently, I worked with Adefemi Oseni, who identified an expensive operation and helped me understand how to optimize ORM queries properly.
Let's say we want to check the status of users running an agent (machine learning model) and identify those who have surpassed their usage time so we can clear or stop them. Here's an initial approach:
from django.utils import timezone
from django.db import models
from datetime import timedelta
import enum

class StatusChoices(enum.Enum):
    RUNNING = 'running'
    STOPPED = 'stopped'
    CREATED = 'created'

def check_running_servers():
    # Get running projects
    running_servers = RunningProject.objects.filter(status=StatusChoices.RUNNING)

    for running_server in running_servers:
        # Get the owner's timeout value (an integer number of seconds)
        time_duration = timedelta(
            seconds=running_server.project.owner.test_server_timeout
        )
        timeout_time = running_server.created_at + time_duration

        # Check if it has surpassed the current time
        if timeout_time < timezone.now():
            stop_test_server(running_server.id)
            send_email(
                "Agent Translation Task Timeout",
                f"Your test server for project {running_server.project.name} has timed out",
                running_server.project.owner.email,
            )
Now, let's consider a scenario where we have 1 million records. Are we going to fetch the owner and compute the timeout for each project, one query at a time, every time we call the check_running_servers function? What if multiple checks are going on at the same time? Whew! We don't want that. Yet this code looks normal; we wouldn't even suspect it has any deficiency. This is one of the pitfalls we often fall into.
However, most ORMs provide measures to handle scenarios like this efficiently. Let's refactor this code using Django's annotate to perform the necessary calculations in the database query itself:
from datetime import timedelta

from django.db.models import F, ExpressionWrapper, DurationField, DateTimeField

def check_running_servers():
    running_servers = RunningProject.objects.filter(
        status=StatusChoices.RUNNING
    ).annotate(
        # Convert the owner's integer timeout (seconds) into a duration
        timeout_duration=ExpressionWrapper(
            F('project__owner__test_server_timeout') * timedelta(seconds=1),
            output_field=DurationField()
        ),
        # Compute the cutoff time inside the database
        timeout_time=ExpressionWrapper(
            F('created_at') + F('timeout_duration'),
            output_field=DateTimeField()
        )
    ).filter(timeout_time__lt=timezone.now())

    for running_server in running_servers:
        stop_test_server(running_server.id)
        send_email(
            "Test Server Timeout",
            f"Your test server for project {running_server.project.name} has timed out",
            running_server.project.owner.email,
        )
Wow! Using an ORM-provided helper and a few other Django classes, we perform all the calculations within the database, reducing the overhead on our application server and making the process more efficient. (One remaining caveat: accessing running_server.project.name and running_server.project.owner.email inside the loop can still trigger extra queries per row; chaining .select_related('project__owner') onto the queryset fetches those relations in the same query.)
Let's dig deeper. What makes this code more efficient?
The refactored code is more efficient because it reduces the number of database round trips and avoids fetching unnecessary data. In the original implementation, we fetch all RunningProject instances, then iterate over them to retrieve each project.owner.test_server_timeout value and calculate the timeout_time in Python. This results in multiple database queries (the N+1 problem) and potentially fetches far more data than necessary. In the refactored version, the calculations are performed within the database query itself, leveraging the power of the ORM and the database engine.
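To see what "calculations performed within the database" looks like outside Django, here's a small sketch using the standard library's sqlite3 (the schema is hypothetical, a simplified stand-in for the article's models). The timeout arithmetic and the comparison both happen in SQL, so only the rows that have actually timed out ever leave the database:

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical schema mirroring the example: each project row stores its
# creation time and its owner's timeout in seconds.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        created_at TEXT,
        timeout_seconds INTEGER,
        status TEXT
    )
""")

now = datetime(2024, 1, 1, 12, 0, 0)
rows = [
    # created 2 hours ago with a 1-hour timeout -> already timed out
    (1, (now - timedelta(hours=2)).isoformat(" "), 3600, "running"),
    # created 10 minutes ago with a 1-hour timeout -> still fine
    (2, (now - timedelta(minutes=10)).isoformat(" "), 3600, "running"),
]
conn.executemany("INSERT INTO project VALUES (?, ?, ?, ?)", rows)

# The timeout arithmetic happens inside the query, just like annotate():
# created_at + timeout_seconds is computed per row by the database, and
# only rows past their cutoff are returned.
timed_out = conn.execute(
    """
    SELECT id FROM project
    WHERE status = 'running'
      AND datetime(created_at, '+' || timeout_seconds || ' seconds') < ?
    """,
    (now.isoformat(" "),),
).fetchall()

print(timed_out)  # [(1,)] -- only the expired project comes back
```

Whether it's Django's annotate() or raw SQL, the principle is the same: push per-row arithmetic and filtering down to the database instead of looping over rows in application code.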
To an extent, and depending on the application you're working on, ORMs are powerful tools that can greatly simplify database operations and improve developer productivity. To get the efficiency you want, it's essential to understand their underlying mechanisms, learn the available optimization techniques, and continuously monitor and optimize your queries as your application grows. These days, chat models (ChatGPT, Claude, etc.) can provide insights and make this kind of research fast; use them as a guide or manual when a Google search isn't helping. The key is to explore and understand the tools at our disposal, ensuring we use them to their full potential. This way, we can avoid common pitfalls and create efficient, scalable applications.