Why FastAPI?
FastAPI has rapidly gained popularity as a modern, high-performance web framework for building APIs with Python 3.7+. It offers several compelling advantages over traditional stacks like Flask + Gunicorn, particularly for applications that require asynchronous processing, such as machine learning model serving. Here's why you should consider FastAPI for your next project instead of the Flask + Gunicorn combination:
1. Built-in Asynchronous Support
FastAPI is designed from the ground up to support asynchronous request handling, making it inherently more suitable for IO-bound and high-concurrency applications. This is a critical advantage when serving machine learning models or handling any other tasks that involve waiting for IO operations, such as database queries or network requests. The asynchronous support in FastAPI allows for non-blocking request processing, which can significantly improve the performance and scalability of your applications.
2. Automatic Data Validation and Serialization
FastAPI leverages Pydantic and type hints to automatically validate incoming data and serialize outgoing data. You define your data models using standard Python type hints, and FastAPI handles the validation for you (see the sketch after this list). This eliminates a significant amount of boilerplate code, reduces the risk of errors, and keeps your API strictly typed. This feature alone can lead to more robust and secure applications compared to Flask, where you typically need additional libraries like Marshmallow to achieve similar functionality.
3. OpenAPI and Swagger Integration
FastAPI automatically generates documentation for your API using the OpenAPI standard. This is accessible through a built-in Swagger UI, making it easy for developers to test and interact with your API. This feature is incredibly useful for both development and production debugging, as it provides a clear, interactive interface for all your API endpoints. Flask can support OpenAPI and Swagger, but it requires additional extensions and configuration.
4. Performance
FastAPI is built on Starlette for the web parts and uses Pydantic for the data parts, making it one of the fastest Python web frameworks available. In benchmarks, FastAPI's performance is on par with NodeJS and Go, thanks to its asynchronous support and efficient parsing of request and response data. Flask, while flexible and lightweight, does not offer the same level of performance out of the box, especially in high-concurrency scenarios.
5. Modern Python Features
FastAPI encourages the use of modern Python features such as type hints, async/await, and data classes. This not only makes your code cleaner and more expressive but also improves development speed and reduces the chance of bugs. Flask is more unopinionated in this regard, allowing for greater flexibility but at the cost of potentially less structured and slower-to-develop code.
6. Simplified Concurrency Handling
The asynchronous nature of FastAPI simplifies the handling of concurrent operations, making it easier to write non-blocking code that's both efficient and easy to understand. With Flask and Gunicorn, achieving the same level of concurrency often requires a combination of third-party libraries and additional setup, such as using Gevent or Eventlet, which can complicate the development and deployment process.
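To make the first three points concrete, here is a minimal sketch of what this looks like in practice. The Item model and the /items route are made up purely for illustration; they are not part of any real application. The endpoint is declared with async def, the request body is validated against standard type hints, and the interactive docs come for free.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Hypothetical request model: FastAPI uses these type hints to validate the
# incoming JSON and automatically responds with a 422 error if it doesn't match
class Item(BaseModel):
    name: str
    price: float
    quantity: int = 1

@app.post("/items")
async def create_item(item: Item):
    # An async handler runs on the event loop and doesn't block other
    # requests while it awaits I/O
    return {"name": item.name, "total": item.price * item.quantity}

Run this with uvicorn (for example, uvicorn main:app --reload, assuming the file is saved as main.py) and the Swagger UI is served at /docs with no extra configuration.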
Let's take a closer look at the difference between sync and async tasks.
The big change with FastAPI is that it runs everything on an event loop. If you're not careful, a single task that takes too long can slow everything down.
Look at this simple code example:
import asyncio
import time

from fastapi import FastAPI, Request
from sentence_transformers import SentenceTransformer

app = FastAPI()
sbertmodel = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

def model_predict():
    return sbertmodel.encode('How big is London')

async def vector_search(vector):
    # Pretend this is a call to a database or another service
    await asyncio.sleep(0.005)

@app.get("/")
async def entrypoint(request: Request):
    ts = time.time()
    vector = model_predict()
    print(f"Model : {int((time.time() - ts) * 1000)}ms")
    ts = time.time()
    await vector_search(vector)
    print(f"io task: {int((time.time() - ts) * 1000)}ms")
    return "ok"
Here, we ask a BERT model for an embedding (a numeric vector representation of the text). Then we do a simulated I/O task, like querying a database, which we fake with a short pause.
When you run one request, it seems fine. But what if many requests come in at once? The model still runs fast, but the I/O tasks start taking way longer because the model calculations block the event loop, causing a backlog.
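One way to see this for yourself is to fire a burst of concurrent requests at the endpoint and watch the timings in the server logs. The sketch below assumes the app is running locally on port 8000 and uses httpx as the async HTTP client.

import asyncio
import httpx

async def call(client):
    await client.get("http://localhost:8000/")

async def main():
    # Send 50 requests at once; the "io task" timings in the server logs grow
    # as model calls queue up on the event loop
    async with httpx.AsyncClient(timeout=60) as client:
        await asyncio.gather(*(call(client) for _ in range(50)))

asyncio.run(main())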
To fix this, you can hand the model off to an executor so it doesn't block the event loop:
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

from fastapi import FastAPI, Request
from sentence_transformers import SentenceTransformer

# A pool with a single worker, so model predictions queue up instead of all
# trying to run at once and slowing things down
pool = ThreadPoolExecutor(max_workers=1)

app = FastAPI()
sbertmodel = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

def model_predict():
    ts = time.time()
    vector = sbertmodel.encode('How big is London')
    print(f"Inner model : {int((time.time() - ts) * 1000)}ms")
    return vector

async def vector_search(vector):
    await asyncio.sleep(0.005)

@app.get("/")
async def entrypoint(request: Request):
    loop = asyncio.get_event_loop()
    ts = time.time()
    vector = await loop.run_in_executor(pool, model_predict)
    print(f"Model : {int((time.time() - ts) * 1000)}ms")
    ts = time.time()
    await vector_search(vector)
    print(f"io task: {int((time.time() - ts) * 1000)}ms")
    return "ok"
This code moves the model's work to a separate thread, so it no longer blocks the event loop. But threads are a poor fit for CPU-heavy work like model inference: Python's Global Interpreter Lock (GIL) lets only one thread execute Python bytecode at a time, so heavy computation in a thread can still starve the rest of the application. The solution? Use a process instead of a thread:
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

from fastapi import FastAPI, Request
from sentence_transformers import SentenceTransformer

app = FastAPI()

# This function will load the model inside the separate worker process
def create_model():
    global sbertmodel
    sbertmodel = SentenceTransformer('multi-qa-MiniLM-L6-cos-v1')

pool = ProcessPoolExecutor(max_workers=1, initializer=create_model)

def model_predict():
    vector = sbertmodel.encode('How big is London')
    return vector

async def vector_search(vector):
    await asyncio.sleep(0.005)

@app.get("/")
async def entrypoint(request: Request):
    loop = asyncio.get_event_loop()
    ts = time.time()
    vector = await loop.run_in_executor(pool, model_predict)
    print(f"Model : {int((time.time() - ts) * 1000)}ms")
    ts = time.time()
    await vector_search(vector)
    print(f"io task: {int((time.time() - ts) * 1000)}ms")
    return "ok"
This setup uses processes, which are better for CPU-heavy tasks. It keeps our API responsive by not letting model predictions block other tasks. The result is faster and more efficient handling of both model predictions and I/O tasks, even under heavy load.
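If you want to try the final example yourself, one way to run it (a sketch, assuming the code above is saved as main.py) is to start uvicorn from a __main__ guard:

if __name__ == "__main__":
    import uvicorn

    # One uvicorn worker is enough here; the heavy model work runs in the
    # process pool, while the event loop stays free for I/O
    uvicorn.run(app, host="0.0.0.0", port=8000)

On platforms where new processes are spawned rather than forked, the __main__ guard also prevents the server from being started again when the pool's worker process re-imports the module.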
Conclusion
FastAPI offers a compelling package for developing high-performance, asynchronous web applications and APIs in Python. Its automatic validation, serialization, and documentation, combined with its high performance and modern Python features, make it an excellent choice for new projects, especially those that require asynchronous processing or aim to take full advantage of modern Python features. While Flask + Gunicorn is a proven and reliable choice for many types of applications, FastAPI provides a more modern, efficient, and feature-rich alternative for building APIs and web applications, particularly where performance and scalability are concerned.