Performance Testing an ML-Serving API with Locust!

Hello again, friends! Welcome back to another data science quick tip. When it comes to the full spectrum of data science work (discovery to production), this post definitely falls toward the production end. In fact, some companies might consider this the job of a machine learning engineer rather than a data scientist. As a machine learning engineer myself, I can verify that’s definitely true in my situation.

Still, I’m sure there are many data scientists out there who are responsible for deployment of their own machine learning models, and this post will hopefully shed some light on how to do easy performance testing with this neat tool called Locust.

Before we jump into performance testing, let’s address the API itself for a quick second. We’ll take the dummy model we created in one of our earlier posts and use Flask and Gunicorn to serve it behind an API endpoint. A user will POST appropriate JSON data to the endpoint and receive back the expected prediction from the machine learning model. Now, this post isn’t about creating an ML-serving API, so I quickly created one for our purposes. In order to use it, all you need to do is download my code from GitHub, navigate to the “api” folder, and run the following command in your terminal:

bash run.sh

What this will do is spin up your API behind a Gunicorn server with five workers at localhost:5001. If you see this screen, you’re on the right track.

[Screenshot: Gunicorn starting the API server with five workers at localhost:5001]
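By the way, if you’re curious what sits behind that endpoint without digging through the repo, here’s a rough sketch of what a Flask app serving a pickled model at /predict tends to look like. To be clear, this is my illustrative guess, not the exact code from the repo; names like app.py and model.pkl are assumptions.

# app.py -- a minimal, illustrative sketch of a Flask ML-serving endpoint
# (the actual code in the GitHub repo may differ; file and variable names are assumptions)
import pickle
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical path to the serialized dummy model from the earlier post
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods = ['POST'])
def predict():
    # Turn the incoming JSON body into a DataFrame and run it through the model
    data = pd.DataFrame(request.get_json())
    preds = model.predict(data)
    return jsonify({'predictions': preds.tolist()})

With something like that in place, run.sh presumably boils down to a Gunicorn command along the lines of gunicorn --workers 5 --bind 0.0.0.0:5001 app:app.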

Keep that tab open in your terminal and open a new one. Just to verify that the API is actually working, I created a separate little test that will quickly run 2 observations through the API. In the same code repository, navigate on over to the “test_data” directory and run the following command:

bash tests.sh


If the API is working as it should, you should see the following screen:

[Screenshot: successful output from the tests.sh run showing predictions returned by the API]
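If you’d rather poke at the endpoint by hand, the same check takes only a few lines of Python. This is just a sketch of what tests.sh is presumably doing under the hood, assuming you have the requests library installed and are running from the repo root:

# quick_check.py -- a hand-rolled version of the smoke test (illustrative sketch)
import json
import requests

# test_1.json is one of the sample payloads in the repo's test_data folder
with open('test_data/test_1.json') as f:
    test_data = json.load(f)

# Port and path per this post's setup; adjust if your API runs elsewhere
response = requests.post('http://localhost:5001/predict', json = test_data)
print(response.status_code, response.json())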

Alright, so we’re ready to move on to the meat of this post: performance testing! When a machine learning model is served behind an API in a production setting, it’s super important to make sure it can handle the expected request load. If you get too many users or too many requests at once, you could have some major problems. You don’t want to be the one to bring down production!

Fortunately, folks have made this nice tool called Locust (or Locust.io) in order to help with just this issue. At first, the code can look odd, but we’ll explain things simply here so you can get up and running in no time.

First things first, you’ll probably need to install Locust on your machine. Simple enough to do! Just run the following pip command to download Locust from PyPI:

pip install locust


Alrighty, now we’re ready to build our Locustfile! The Locustfile is a simple Python script that we will invoke to fire up Locust and its super handy user interface. By default, the Locust command line tool will look for a file called “locustfile.py”, but you can truly name it whatever you want (as long as you specify it with the -f flag). Keeping things easy on ourselves, we’ll simply go with that default name, locustfile.py. And here’s everything we’re going to put in it.

from locust import HttpUser, task, between
import json
# Loading the test JSON data
with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())
# Creating an API User class inheriting from Locust's HttpUser class
class APIUser(HttpUser):
    # Setting the host name and wait_time
    host = 'http://localhost:5001'
    wait_time = between(3, 5)
    # Defining the post task using the JSON test data
    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)


It’s a pretty small script, but it’ll do some powerful things for us! Now, the first time you see this syntax, it can be a little odd, so let’s break it down bit by bit so you understand what’s going on here. Starting off with this first bit…

from locust import HttpUser, task, between
import json
# Loading the test JSON data
with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())


We’re simply importing what we’ll need from Locust and JSON and loading in the test JSON data that I have already provided. So far, this is probably nothing you’re unfamiliar with. But here’s where things start to get a little tricky. We’ll go slow here.

# Creating an API User class inheriting from Locust's HttpUser class
class APIUser(HttpUser):


Alright, so you’re probably familiar with Python classes. This is creating a new class that inherits from Locust’s “HttpUser” parent class. I’m not going to go deeply into the attributes and methods of that class, but suffice it to say, this is what Locust is going to use when we spin up the user interface here shortly.

Moving along…

    # Setting the host name and wait_time
    host = 'http://localhost:5001'
    wait_time = between(3, 5)


The host is probably pretty straightforward here: we’re simply providing the base URL that the API is currently being served on. (Recall that I have my Gunicorn script serving at localhost:5001.) The “wait_time” piece is probably new to you. Together with the “between()” helper, it tells Locust how long each simulated user should pause between tasks. “between()” works in seconds, so in our example here, every user waits a random 3 to 5 seconds between consecutive requests.
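As a side note, between() isn’t the only wait-time helper Locust ships. If you wanted every user to pause a fixed amount of time instead, a variant might look roughly like this (just a sketch; our test sticks with between(3, 5)):

import json
from locust import HttpUser, constant, task

with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())

class SteadyAPIUser(HttpUser):
    host = 'http://localhost:5001'
    # Every simulated user pauses exactly 1 second between tasks
    # instead of a random 3 to 5 seconds
    wait_time = constant(1)

    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)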

And the last part of our script:

    # Defining the post task using the JSON test data
    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)


That “@task()” decorator is telling our APIUser class what action it needs to take when Locust fires up. You can actually have multiple tasks and even weight them appropriately, but that goes beyond our scope here. For our purposes, one task will do it. All we need our task to do is to call the “/predict” API endpoint and pass it the JSON test data we loaded in at the top of the script.
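That said, if you’re curious, a weighted version might look roughly like the sketch below. Note that the /health endpoint here is purely hypothetical and only there to illustrate the weighting; the API in this post only exposes /predict.

import json
from locust import HttpUser, between, task

with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())

class WeightedAPIUser(HttpUser):
    host = 'http://localhost:5001'
    wait_time = between(3, 5)

    # Weighted 3-to-1: predictions get hit roughly three times as often
    # as the (hypothetical) health check below
    @task(3)
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)

    @task(1)
    def health_check(self):
        # Hypothetical endpoint for illustration only; it doesn't exist in this post's API
        self.client.get('/health')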

Now comes the fun part! With our API still running, open a new tab in your terminal. Navigate on over to the directory with locustfile.py in it and run the following command:

locust


Remember, Locust by default looks for that locustfile.py file, which is why we don’t need to specify anything else on the command line. What you should see is something that looks like this.

[Screenshot: Locust startup log showing the address of the web interface]

This is telling us that Locust has started up a web user interface at a specific port number. In my case, you’ll notice the Locust UI is being served at localhost:8089. Open up your browser of choice and navigate on over there. You should be greeted with the following screen.

[Screenshot: Locust web UI start page with fields for number of users and spawn rate]

(When you first load the page, the number of users and spawn rate fields will actually be empty.) In this example, I’m specifying that I want to test with a total of 100 users at a spawn rate of 5 users per second. That means Locust will start testing the API with just 5 users and then add another 5 every second until it hits the total of 100. (The wait_time from our script controls how long each of those users pauses between its own requests, not how quickly new users spawn.) Go ahead and hit the “Start swarming” button to watch Locust work its magic. This is the first screen you’ll be greeted with.

[Screenshot: Locust statistics page partway through the test run]

You can see that, at the point I took this screenshot, Locust had already ramped up to the full 100 users and had sent 1866 requests to the API. You can also see that requests have a median response time of 26 milliseconds and that the API is handling roughly 25 requests per second (RPS). Neat! But are you a visual person? Navigate on over to the Charts tab!

[Screenshot: Locust charts of requests per second and response times]

As you can see from these charts, our API is performing at a very stable rate. Obviously the RPS was lower before we hit our 100-user cap, but once we topped out at 100 users, everything pretty much flatlined. (Probably one of the few cases where you actually want to see flatlining!) There’s actually a third chart on this page that graphs the number of users over time, too, but I ran out of screenshot room. We won’t cover the other tabs, but you can probably guess what they do.
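One more thing worth knowing before we wrap up: you don’t have to use the web UI at all. Locust can run the same test headlessly straight from the command line, which is handy for CI pipelines. The flags below are the standard ones from the Locust docs, though exact behavior can vary a bit by version:

locust -f locustfile.py --headless -u 100 -r 5 --run-time 2m --host http://localhost:5001

That should spawn 100 users at 5 per second, run for two minutes, and print a stats summary to the terminal when it finishes.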

And that’s all there is to Locust! If your performance is flatlining with the expected number of users, you’re good to go. If not, you might have to explore different deployment options. Perhaps you need to scale out more instances of the API (or give Gunicorn more workers), or perhaps you need to further optimize the model itself. Either way, at least you can rest easy knowing that you won’t cause a performance bottleneck when you push the final ML-serving API to production.
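On that first option, the cheapest lever is usually the Gunicorn worker count. Something like the command below is a common starting point; the app:app module path is an assumption about how the repo’s app is laid out, and the usual rule of thumb for workers is (2 x CPU cores) + 1:

# 9 workers assumes a 4-core machine per the (2 x cores) + 1 rule of thumb
gunicorn --workers 9 --bind 0.0.0.0:5001 app:app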

That’s it for this post, folks! Hope you enjoyed this one. Let me know what other things you’d like me to cover in future posts! Always love to hear your thoughts.
