FastAPI: async def vs def. Performance comparison
By Yaroslav K.
What is FastAPI?
According to the official FastAPI webpage, it is a modern, fast (high-performance), web framework for building APIs with Python 3.8+ based on standard Python type hints.
The key features are listed right on that page. Within the scope of this article, I would like to take a closer look at the very first one - the performance claim.
FastAPI claims to be a highly performant framework, comparing itself with NodeJS and Go, and provides a link to the TechEmpower benchmarks web page as evidence of that.
I’m not sure why exactly those two languages were chosen as the reference point, but it is probably related to their support for asynchronous code execution.
According to the “Composite Framework Scores” from that benchmark results repository, FastAPI took:
According to the latest available ranking (2023), it’s hard to say that FastAPI is still on par even with NodeJS and Go frameworks because, as can be seen from the picture, the Just framework (built for JS) is far ahead, as are quite a few other frameworks built for JS and Go.
But hey, let’s stay positive and focus on the good news: FastAPI is still a pretty decent web development framework and one of the fastest among those built for Python.
What is async and why does it matter?
The FastAPI documentation contains a very comprehensive explanation of what asynchronous code execution is, how it works, and how it can potentially benefit Python application performance. I don’t plan to dive deep into the theoretical details in this article, so anyone interested in a deeper understanding of this topic can turn to the mentioned source or conduct their own research.
This documentation mentions two paradigms related to the way code can be executed: concurrency and parallelism.
According to Oracle’s “Defining Multithreading Terms”:
The FastAPI documentation equates concurrency with asynchronous code execution. Also, as far as I know, it is entirely possible to implement concurrent (asynchronous) code execution using just a single thread, which is exactly what FastAPI does when path operation functions are declared with async def.
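To make the single-threaded part tangible, here is a tiny, framework-agnostic sketch using plain asyncio (nothing FastAPI-specific): two coroutines that “wait” are interleaved by one event loop running in a single thread.

import asyncio
import threading


async def fake_io(name: str, delay: float) -> str:
    # while this coroutine is "waiting", the event loop is free to run the others
    await asyncio.sleep(delay)
    return f"{name} handled in thread {threading.current_thread().name}"


async def main() -> None:
    # both "requests" finish in ~1 second total, not ~2, and on the same thread
    print(await asyncio.gather(fake_io("a", 1), fake_io("b", 1)))


asyncio.run(main())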
As per the mentioned docs:
Is concurrency better than parallelism?
Nope! That's not the moral of the story.
Concurrency is different than parallelism. And it is better for specific scenarios that involve a lot of waiting. Because of that, it generally is a lot better than parallelism for web application development. But not for everything.
Concurrency + Parallelism
With FastAPI you can take advantage of concurrency, which is very common for web development (the same main attraction of NodeJS).
But you can also exploit the benefits of parallelism and multiprocessing (having multiple processes running in parallel) for CPU-bound workloads like those in Machine Learning systems.
So, to put it short and simple: a FastAPI application that runs on the asynchronous but single-threaded Uvicorn ASGI web server can also be parallelized with the help of the Gunicorn process manager, which will spawn and manage the desired number of separate Uvicorn workers in parallel. In theory, this should boost the throughput of a web application built and deployed this way compared to a non-concurrent, single-worker one.
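For example, assuming the FastAPI instance lives in app/main.py as app (a hypothetical path - adjust it to your project layout), such a combination can be launched with something like:

gunicorn app.main:app \
    --worker-class uvicorn.workers.UvicornWorker \
    --workers 3 \
    --bind 0.0.0.0:8000

Gunicorn here only forks and supervises the processes; each worker is a regular single-threaded Uvicorn event loop.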
To check whether this is true, we would need to build and deploy several applications (or versions of one application) with the same functionality, combining synchronous/asynchronous and single-process/multi-process execution in different ways.
Because this article is focused on FastAPI, while working on it I experimented only with the different ways FastAPI handles asynchronous code execution (the way path operation functions are declared), plus a little bit of parallelization. So no fully synchronous implementation will be analyzed or compared here.
The simplified guidance for the path operation functions declaration from the FastAPI looks like this:
If you are using third-party libraries that tell you to call them with await, like:
results = await some_library()
Then, declare your path operation functions with async def like:
@app.get('/')
async def read_results():
    results = await some_library()
    return results
If you are using a third-party library that communicates with something (a database, an API, the file system, etc.) and doesn't have support for using await, (this is currently the case for most database libraries), then declare your path operation functions as normally, with just def, like:
@app.get('/')
def results():
    results = some_library()
    return results
If your application (somehow) doesn't have to communicate with anything else and wait for it to respond, use async def.
If you just don't know, use normal def.
Note: You can mix def and async def in your path operation functions as much as you need and define each using the best option. FastAPI will do the right thing with them.
Anyway, in any of the cases above, FastAPI will still work asynchronously and be extremely fast.
But by following the steps above, it will be able to do some performance optimizations.
Additional information on this topic from the same source:
When you declare a path operation function with normal def instead of async def, it is run in an external thread pool that is then awaited, instead of being called directly (as it would block the server).
If you are coming from another async framework that does not work in the way described above and you are used to defining trivial compute-only path operation functions with plain def for a tiny performance gain (about 100 nanoseconds), please note that in FastAPI the effect would be quite opposite. In these cases, it's better to use async def unless your path operation functions use code that performs blocking I/O.
Still, in both situations, chances are that FastAPI will still be faster than (or at least comparable to) your previous framework.
The same applies to dependencies. If a dependency is a standard def function instead of async def, it is run in the external thread pool.
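To make the quoted behavior a bit more concrete, here is a rough, simplified sketch of what effectively happens to a plain def endpoint (the surrounding handler is illustrative, but run_in_threadpool is the actual Starlette utility used under the hood):

import time

from starlette.concurrency import run_in_threadpool


def blocking_endpoint() -> dict:
    # a plain `def` path operation function, e.g. one calling a sync DB driver
    time.sleep(0.1)  # stands in for blocking I/O
    return {"Hello": "World"}


async def handle_request() -> dict:
    # the sync function is pushed to a worker thread and awaited,
    # so the event loop is never blocked while the call is in progress
    return await run_in_threadpool(blocking_endpoint)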
Let’s experiment with these approaches on our own in the next section.
Building basic Python web applications with FastAPI, PostgreSQL, and Docker
To find out which approach of declaring path operation functions works best in which situations I will create a very basic web application using FastAPI (0.109.0) as the main framework, as well as SQLAlchemy (2.0.25) as an ORM, and PostgreSQL (14.2) as DBMS. All of this will be containerized with the help of Docker (platform Linux/amd64).
Let’s start. To evaluate interaction with our web server that doesn’t involve any dependency calls, let’s declare a simple route that can be accessed with the GET method and just returns a vanilla {"Hello": "World"} JSON.
The second route involves interaction with the database and can be accessed via the PUT method.
Initially, I planned to go with POST, but creating a new resource on every call negatively affected the aggregation of the load testing results and made the load testing framework’s output harder to comprehend.
The PUT test route will utilize a repository pattern: with the help of SQLAlchemy it will first try to fetch the resource from the PostgreSQL database, and only if it’s missing will it do an insert (in other words, an UPSERT operation, but split into two separate repository functions).
To test how the async declaration of path operation and dependency functions influences application performance, I created all combinations of them. The needed combination is chosen automatically based on environment variable values. More about this in the next section.
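As an illustration, the relevant part of a docker-compose service definition could look roughly like this (the variable names match the code below; the concrete values are just an example):

environment:
  API_MODE: ASYNC        # async def vs def path operation functions
  ENGINE_MODE: ASYNC     # async vs sync SQLAlchemy engine
  POOL_SIZE: "50"        # SQLAlchemy connection pool size
  PGUSER: postgres
  PGPASSWORD: postgres
  PGHOST: test-service-postgres
  PGPORT: "5432"
  PGDATABASE: postgres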
I won’t provide step-by-step instructions on how I arrived at the final code (it was a long process) and will just share what I ended up with before switching to the Load testing section.
Test router implementations:
from os import environ as env

from app.controllers.models.tests import TestRowRequest, TestRowResponse
from app.repositories.tests import async_insert_test_row, insert_test_row
from fastapi import APIRouter

test_router = APIRouter(prefix='/tests', tags=['Test router'])

STATIC_RESPONSE = {"Hello": "World"}

match env['API_MODE']:
    case 'ASYNC':
        @test_router.put('/', response_model=TestRowRequest, status_code=201)
        async def add_test_row(request: TestRowRequest) -> TestRowResponse:
            match env['ENGINE_MODE']:
                case 'ASYNC':
                    from app.infra.db.engine import async_engine
                    async with async_engine.begin() as conn:
                        response = await async_insert_test_row(request.id, request.text, conn)
                case 'SYNC' | _:
                    from app.infra.db.engine import sync_engine
                    with sync_engine.begin() as conn:
                        response = insert_test_row(request.id, request.text, conn)
            return response

        @test_router.get("/")
        async def get_test():
            return STATIC_RESPONSE

    case 'SYNC' | _:
        from app.infra.db.engine import sync_engine

        @test_router.put('/', response_model=TestRowRequest, status_code=201)
        def add_test_row(request: TestRowRequest) -> TestRowResponse:
            with sync_engine.begin() as conn:
                response = insert_test_row(request.id, request.text, conn)
            return response

        @test_router.get("/")
        def get_test():
            return STATIC_RESPONSE
Repositories implementation:
from uuid import UUID

from app.controllers.models.tests import TestRowResponse
from app.infra.db.schema import tests
from fastapi import HTTPException
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.engine import Connection
from sqlalchemy.ext.asyncio import AsyncConnection


def get(id: UUID, conn: Connection) -> TestRowResponse:
    if test := conn.execute(tests.select().where(tests.c.id == id)).first():
        return TestRowResponse(**test._asdict())
    else:
        raise HTTPException(status_code=404, detail=f'Test row with ID: {id} is missing')


def insert_test_row(id: UUID, text: str, conn: Connection) -> TestRowResponse:
    conn.execute(insert(tests).values(id=id, text=text))
    result = get(id, conn)
    return result


async def async_get(id: UUID, conn: AsyncConnection) -> TestRowResponse:
    test_row = await conn.execute(tests.select().where(tests.c.id == id))
    if test := test_row.first():
        return TestRowResponse(**test._asdict())
    else:
        raise HTTPException(status_code=404, detail=f'Test row with ID: {id} is missing')


async def async_insert_test_row(id: UUID, text: str, conn: AsyncConnection) -> TestRowResponse:
    await conn.execute(insert(tests).values(id=id, text=text))
    result = await async_get(id, conn)
    return result
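The request/response models imported from app.controllers.models.tests are not shown in the article; a minimal sketch of what they could look like, assuming only the fields the router and repositories actually use, would be:

from uuid import UUID

from pydantic import BaseModel


class TestRowRequest(BaseModel):
    id: UUID
    text: str


class TestRowResponse(BaseModel):
    id: UUID
    text: str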
Engine implementation:
from os import environ as env

from sqlalchemy import MetaData, create_engine
from sqlalchemy.ext.asyncio import create_async_engine

USER, PASSWORD, HOST, PORT, DATABASE = \
    env['PGUSER'], env['PGPASSWORD'], env['PGHOST'], env['PGPORT'], env['PGDATABASE']
POOL_SIZE = int(env['POOL_SIZE'])

DB_URL = f"postgresql+asyncpg://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"

match env['ENGINE_MODE']:
    case 'ASYNC':
        DB_URL = f"postgresql+asyncpg://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"
        async_engine = create_async_engine(url=DB_URL, future=True, pool_size=POOL_SIZE)
    case 'SYNC' | _:
        DB_URL = f"postgresql://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"
        sync_engine = create_engine(url=DB_URL, future=True, pool_size=POOL_SIZE)

metadata = MetaData()
As you can see, to create an async connection to PostgreSQL you need to use the asyncpg driver. I used version 0.29.0 during testing.
Also, I parametrized the pool_size argument of the create_engine function to get a better understanding of how this option influences the app’s performance.
Note: There is a very good article by Thomas Aitken called “Setting up a FastAPI App with Async SQLAlchemy 2.0 & Pydantic V2” which provides a comprehensive guide on how to configure an async engine for a FastAPI application, and which I also used as a reference while working on this one.
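The table definition imported as app.infra.db.schema.tests isn’t shown above either; judging by how the repositories use it, a minimal sketch could be (the exact column types are my assumption):

from sqlalchemy import Column, String, Table
from sqlalchemy.dialects.postgresql import UUID

from app.infra.db.engine import metadata

tests = Table(
    'tests',
    metadata,
    Column('id', UUID(as_uuid=True), primary_key=True),
    Column('text', String, nullable=False),
)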
Load testing
First of all, I should mention that all testing happened on a single machine (MacBook Pro M1 with a 10-core CPU and 32 GB of RAM). This included spinning up the FastAPI application with docker-compose and running the testing framework against it. As the goal of the experiment was not to find the absolute maximum of the framework, but rather to compare different configurations, I think the results should still be representative enough.
To test our tiny application I will use the Locust framework (official web page).
I used a pretty basic master/worker setup utilizing different quantities of workers for different test case scenarios.
To see how the application behaves with multiple Uvicorn workers under the Gunicorn process manager and without it, I also tested both of those cases.
This is how my two test cases are defined:
1. locustfile_put.py
from __future__ import annotations

import random
import string
from uuid import uuid4

from locust import HttpUser, tag, task


def random_string(k: int = 1000) -> str:
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=k))


class TestAppTasks(HttpUser):
    @tag('Put test row')
    @task
    def put_test_row(self: TestAppTasks) -> None:
        self.client.put(
            url='tests', json={
                'id': str(uuid4()),
                'text': random_string()
            })
2. locustfile_get.py
from __future__ import annotations

from locust import HttpUser, tag, task


class TestAppTasks(HttpUser):
    @tag('Get hello')
    @task
    def post_test_row(self: TestAppTasks) -> None:
        self.client.get(url='tests/')
Here are the Locust configs:
1. master.conf
[master conf]
master = true
master-bind-host = 0.0.0.0
master-bind-port = 5557
expect-workers = 3

[run-time settings]
host = https://localhost:8000/
users = 1000
spawn-rate = 100
run-time = 2m
# locustfile = fastapi_app/app/tests/loadtests/locustfile_get.py
locustfile = fastapi_app/app/tests/loadtests/locustfile_put.py
2. worker.conf
[worker conf]
worker = true
master-host = 0.0.0.0
master-port = 5557
# locustfile = fastapi_app/app/tests/loadtests/locustfile_get.py
locustfile = fastapi_app/app/tests/loadtests/locustfile_put.py
As you can see, there is nothing fancy to brag about, but it does the job.
To run the master and workers, you first need to start the master with locust --config master.conf, and then start as many workers as configured in the master config with locust --config worker.conf.
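Put together, a local run looks roughly like this (each command in its own terminal, from the directory containing the config files):

locust --config master.conf   # terminal 1: the master
locust --config worker.conf   # terminals 2-4: one worker each,
locust --config worker.conf   # three in total to match expect-workers = 3
locust --config worker.conf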
I organized the test case scenarios by the route and the HTTP method used to call it.
Note: All tests were conducted as a series of 3 consecutive runs of 2 minutes each. Unfortunately, when Locust exports the test data through its report feature, it exports only the latest run. So all results presented below represent the 3rd run of each load test, which in almost all cases showed slightly worse results than the previous two, especially when the DB was involved. On the other hand, I believe the results are still representative because we always compare the 3rd runs with each other.
PUT test results:
Requests per second (RPS) - 371.3
Response time (95th percentile) - 350 ms (milliseconds)
Response time (99th percentile) - 390 ms (milliseconds)
Requests per second (RPS) - 530.7
Response time (95th percentile) - 220 ms (milliseconds)
Response time (99th percentile) - 240 ms (milliseconds)
Requests per second (RPS) - 518.1
Response time (95th percentile) - 230 ms (milliseconds)
Response time (99th percentile) - 240 ms (milliseconds)
Requests per second (RPS) - 475.5
Response time (95th percentile) - 240 ms (milliseconds)
Response time (99th percentile) - 260 ms (milliseconds)
Requests per second (RPS) - 437.8
Response time (95th percentile) - 460 ms (milliseconds)
Response time (99th percentile) - 630 ms (milliseconds)
Requests per second (RPS) - 651.1
Response time (95th percentile) - 180 ms (milliseconds)
Response time (99th percentile) - 190 ms (milliseconds)
Intermediate summary
The best result, by a clear margin, is shown by round VI: the path operation function declared as async, relying on the asynchronous DB engine and repository functions. An interesting fact is that this config only shines when the engine has a large enough connection pool; otherwise, it shows the worst result (round V).
Requests per second (RPS) - 368.1
Response time (95th percentile) - 2900 ms (milliseconds)
Response time (99th percentile) - 3000 ms (milliseconds)
Requests per second (RPS) - 477.2
Response time (95th percentile) - 2200 ms (milliseconds)
Response time (99th percentile) - 2200 ms (milliseconds)
Requests per second (RPS) - 474.6
Response time (95th percentile) - 2300 ms (milliseconds)
Response time (99th percentile) - 2400 ms (milliseconds)
Requests per second (RPS) - 463.7
Response time (95th percentile) - 2300 ms (milliseconds)
Response time (99th percentile) - 2500 ms (milliseconds)
Requests per second (RPS) - 388.5
Response time (95th percentile) - 5300 ms (milliseconds)
Response time (99th percentile) - 7700 ms (milliseconds)
Requests per second (RPS) - 525.3
Response time (95th percentile) - 2600 ms (milliseconds)
Response time (99th percentile) - 4300 ms (milliseconds)
Failures/s - 81.2
test-service-unicorn | asyncpg.exceptions.TooManyConnectionsError: sorry, too many clients already
Requests per second (RPS) - 559.6
Response time (95th percentile) - 3200 ms (milliseconds)
Response time (99th percentile) - 3500 ms (milliseconds)
Intermediate summary
These test case scenario results (with a 10 times larger number of users) are not as straightforward as the previous ones. First of all, we started observing application failures for the first time; they stem from the PostgreSQL server’s default limit of 100 simultaneous connections, which the async engine configuration exceeded. Because of this (which I didn’t expect at the beginning), I had to add an extra round VII of testing with a manageable pool of 50 connections, which didn’t produce any errors. On one hand, this last round wins in the RPS category. On the other hand, its 95th/99th percentile results are far from the best compared to the configurations with a sync engine, although it does better in the 50th-90th percentiles. It looks to me like the configuration from round VII is pretty efficient overall, but asynchronously managing a high number of connections is a much more complex process, so it can spike the response time from time to time while trying to keep them all in place. Visual confirmation of this can be found in the pictures presented below.
Requests per second (RPS) - 848.6
Response time (95th percentile) - 1700 ms (milliseconds)
Response time (99th percentile) - 1800 ms (milliseconds)
Requests per second (RPS) - 1104.6
Response time (95th percentile) - 1400 ms (milliseconds)
Response time (99th percentile) - 1500 ms (milliseconds)
Failures/s - 10.6
test-service-gunicorn | sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) connection to the server at "test-service-postgres" (172.21.0.2), port 5432 failed: FATAL: sorry, too many clients already
Requests per second (RPS) - 1229.7
Response time (95th percentile) - 990 ms (milliseconds)
Response time (99th percentile) - 1100 ms (milliseconds)
Requests per second (RPS) - 1216.6
Response time (95th percentile) - 1000 ms (milliseconds)
Response time (99th percentile) - 1100 ms (milliseconds)
Requests per second (RPS) - 987.8
Response time (95th percentile) - 2400 ms (milliseconds)
Response time (99th percentile) - 3600 ms (milliseconds)
Requests per second (RPS) - 853.7
Response time (95th percentile) - 2900 ms (milliseconds)
Response time (99th percentile) - 5700 ms (milliseconds)
Failures/s - 476.5
test-service-gunicorn | asyncpg.exceptions.TooManyConnectionsError: sorry, too many clients already
Requests per second (RPS) - 978.3
Response time (95th percentile) - 1600 ms (milliseconds)
Response time (99th percentile) - 2300 ms (milliseconds)
Failures/s - 244.0
test-service-gunicorn | asyncpg.exceptions.TooManyConnectionsError: sorry, too many clients already
Requests per second (RPS) - 1217.1
Response time (95th percentile) - 1200 ms (milliseconds)
Response time (99th percentile) - 1800 ms (milliseconds)
Intermediate summary
In this case, we managed to create a connection bottleneck, and thus got failures even from the sync engine, due to the high number of Uvicorn workers trying to access the DB simultaneously. According to the PostgreSQL documentation (I also rechecked this personally inside the Docker container with SHOW max_connections), the maximum number of connections one PostgreSQL process can manage is 100 by default. So, when you configure a multi-worker application runtime (one pod with several Uvicorn workers), you should take that into account to avoid the “too many clients” issue. Also, keep in mind that there is a max_overflow engine parameter that can open up to 10 extra connections by default. To address this issue I went with a value of 15 for the connection pool (just a magic number at that moment, and it helped).
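A quick back-of-the-envelope check of the connection budget (the numbers mirror the configuration discussed above; the snippet itself is mine, not part of the project):

MAX_CONNECTIONS = 100   # PostgreSQL default max_connections
WORKERS = 3             # Uvicorn workers under Gunicorn
POOL_SIZE = 15          # pool_size value I ended up using
MAX_OVERFLOW = 10       # SQLAlchemy's default max_overflow

peak = WORKERS * (POOL_SIZE + MAX_OVERFLOW)
assert peak <= MAX_CONNECTIONS, "risk of 'sorry, too many clients already'"
print(peak)  # 75 -> safely below the 100-connection limit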
Even after the issue had been addressed in round VII, the result with the async engine was not better, especially in the 99th percentile category, compared to the setups that didn’t utilize the async engine. Overall, it seems like the best option when multiple Uvicorn workers access the DB is a path operation function declared with async def combined with the sync engine. I don’t see a strong correlation between the number of connections in the pool and RPS/response time, so it’s probably better to just keep that setting within a safe range to avoid the mentioned error.
Note: Later I retested the same failing round II with the number of connections in the pool set to 23, plus up to 10 extra coming from the overflow’s default value, per worker. With 3 workers that gives 99 potential connections to the DB, which is as close as you can get to the maximum number of connections in this situation. With such a configuration the test ran without any errors.
Requests per second (RPS) - 1283.5
Response time (95th percentile) - 1000 ms (milliseconds)
Response time (99th percentile) - 1200 ms (milliseconds)
Kind of intermediate summary
I decided to add this additional case, once I realized that Locust exports only the last run, just to demonstrate how I did the consecutive testing. As an experiment, I also added 2 more Locust workers, but as you can see they didn’t bring much extra compared to the similar configuration from the previous test case scenario (round III), which has slightly lower RPS but slightly better 95th/99th percentile values.
GET test results:
Requests per second (RPS) - 2209.4
Response time (95th percentile) - 61 ms (milliseconds)
Response time (99th percentile) - 71 ms (milliseconds)
Requests per second (RPS) - 3251.2
Response time (95th percentile) - 34 ms (milliseconds)
Response time (99th percentile) - 41 ms (milliseconds)
Intermediate summary
For the route that has no dependency calls and returns static content, there is a huge gain both in RPS and response time when it’s declared with async def.
Requests per second (RPS) - 5299.1
Response time (95th percentile) - 260 ms (milliseconds)
Response time (99th percentile) - 310 ms (milliseconds)
Requests per second (RPS) - 9338.0
Response time (95th percentile) - 130 ms (milliseconds)
Response time (99th percentile) - 250 ms (milliseconds)
Intermediate summary
With the increased number of users and Uvicorn processes, the advantage of the async declaration of path operation functions shows up even more strongly than in the previous test case scenario.
3 Uvicorn processes under the Gunicorn process manager. 3 Locust workers running for 2 minutes with 3000 users, spawn rate 100.
Requests per second (RPS) - 5107.0
Response time (95th percentile) - 760 ms (milliseconds)
Response time (99th percentile) - 850 ms (milliseconds)
Requests per second (RPS) - 8032.7
Response time (95th percentile) - 460 ms (milliseconds)
Response time (99th percentile) - 690 ms (milliseconds)
Intermediate summary
The same trend as in the previous test case scenario, but it’s also apparent that with the increased number of users the response time grew several times larger and the RPS dropped by about 15%. I’m not sure what exactly bottlenecks the response serving: too much CPU pressure on my local machine, FastAPI/Locust limitations, or everything together.
Conclusions
In this section, I will try my best to summarize all the data and share my thoughts on how it corresponds to FastAPI’s guidance for choosing between async and regular declarations of path operation functions, as well as their dependencies.
NOTE: To make the data easier to comprehend, I visualized the load testing results with diagrams. Results with failures were excluded from the charts to make the comparison fair.
Part 1.
PUT requests aggregated results:
The first step of the guide says:
If you are using third-party libraries that tell you to call them with await, then, declare your path operation functions with async def.
The second one says:
If you are using a third-party library that communicates with something (a database, an API, the file system, etc.) and doesn't have support for using await, (this is currently the case for most database libraries), then declare your path operation functions as normally, with just def.
And finally regarding dependencies:
You can have multiple dependencies and sub-dependencies requiring each other (as parameters of the function definitions), some might be created with async def and some with normal def. It would still work, and the ones created with normal def would be called on an external thread (from the thread pool) instead of being "awaited".
In the case of our tiny application, SQLAlchemy is the only route dependency that interacts with the PostgreSQL database. The SQLAlchemy engine responsible for this interaction can be declared in both sync and async ways, thus giving us the opportunity to test both hints from above.
As only the PUT tests endpoint has this dependency, I think it makes sense to reference only the data from the load tests that utilize it.
With a relatively small number of “users” trying to access your application’s endpoint (built with FastAPI) while it runs as a single process, you will probably not see a huge difference between declaring your path operation function and the engine with def or async def. The response time for each request didn’t exceed 1 second in my case (with no network lag factor), and was under a quarter of a second in most cases. The most important part of the configuration turned out to be a connection pool big enough to sustain the data flow between your app and the DB.
The best result was achieved with the async route and async engine declaration, given a big enough connection pool (just don’t make it too big, to avoid application failures due to the “too many clients already” error). On the other hand, one of the worst results came from the same declaration pattern, but with a lack of available connections in the engine pool.
A tenfold increase in the number of “users” making simultaneous calls to the route somewhat decreased the RPS value, and the response time skyrocketed: it took about 10 times longer to get a response with similar configurations. Tripling the number of processes running our application helped mitigate this slowdown to some extent (roughly a twofold improvement both in RPS and response time). But in that case, you have to configure the engine connection pool even more carefully, because more workers/pods mean more concurrent connections. So always keep in mind that the default maximum number of connections for PostgreSQL is 100, and that the engine configuration has two parameters (pool_size and max_overflow) that determine the maximum pool size available per FastAPI application process. Otherwise, your application could start failing with the “too many clients already” error.
In the case of relatively high request pressure and a single FastAPI process, an async route with an async engine and enough connections in the pool still delivers the highest RPS. However, the difference from the other configurations is not as large, compared in particular to the async route/sync engine setup. When it comes to the multi-process FastAPI runtime, the difference between the top 3 configurations in terms of RPS becomes even smaller. But in both cases, the async route/async engine combination shows not the best consistency in terms of maximum response time.
Interestingly, providing more connections for the async route with the sync engine only decreases the RPS, while it helps a lot in all the other cases.
To summarize this part: when your FastAPI application deals with an external dependency like SQLAlchemy handling the interaction with your DB, you can expect a better RPS result if you declare your path operation function, as well as the engine creation, with async def, but only if the connection pool is big enough. On the other hand, in some extreme cases you can get similar or even slightly better results with the async route and sync engine declaration, gaining a more stable response time as a bonus.
Part 2.
GET requests aggregated results:
The third step of the guide suggests:
If your application (somehow) doesn't have to communicate with anything else and wait for it to respond, use async def.
In our case, only the GET tests endpoint has no external dependencies.
To be honest, I don’t think there is a need to go into extra detail here. My only recommendation is to simply follow the FastAPI documentation’s suggestion, and you will probably get up to a 100% boost in RPS and a similar decrease in response time.
Tags:
#python_frameworks
#web_development
#fastapi
#concurrency
#asynchronous
#backend