Performance Testing an ML-Serving API with Locust!

Hello again, friends! Welcome back to another data science quick tip. When it comes to the full spectrum of data science work (discovery to production), this post definitely falls toward the production end. In fact, some companies might consider this the job of a machine learning engineer rather than a data scientist. As a machine learning engineer myself, I can verify that’s definitely true in my situation.

Still, I’m sure there are many data scientists out there who are responsible for deployment of their own machine learning models, and this post will hopefully shed some light on how to do easy performance testing with this neat tool called Locust.

Before we jump into performance testing, let’s address the API itself for a quick second. We’ll take the dummy model we created in one of our earlier posts and use Flask and Gunicorn to serve it behind an API endpoint. A user will POST appropriate JSON data to the endpoint and receive back the expected prediction from the machine learning model. Now, this post isn’t about creating an ML-serving API, so I quickly created one for our purposes. In order to use it, all you need to do is download my code from GitHub, navigate to the “api” folder, and run the following command in your terminal:

bash run.sh

What this will do is spin up your API behind a Gunicorn server with five workers at localhost:5001. If you see this screen, you’re on the right track.

[Screenshot: Gunicorn starting the API server with five workers at localhost:5001]
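By the way, if you’re curious what sits behind that endpoint without digging through the repo, here’s a rough sketch of what a Flask app serving a pickled model at /predict tends to look like. To be clear, this is my illustrative guess, not the exact code from the repo; names like app.py and model.pkl are assumptions.

# app.py -- a minimal, illustrative sketch of a Flask ML-serving endpoint
# (the actual code in the GitHub repo may differ; file and variable names are assumptions)
import pickle
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical path to the serialized dummy model from the earlier post
with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

@app.route('/predict', methods = ['POST'])
def predict():
    # Turn the incoming JSON body into a DataFrame and run it through the model
    data = pd.DataFrame(request.get_json())
    preds = model.predict(data)
    return jsonify({'predictions': preds.tolist()})

With something like that in place, run.sh presumably boils down to a Gunicorn command along the lines of gunicorn --workers 5 --bind 0.0.0.0:5001 app:app.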

Keep that tab open in your terminal and open a new one. Just to verify that the API is actually working, I created a separate little test that will quickly run 2 observations through the API. In the same code repository, navigate on over to the “test_data” directory and run the following command:

bash tests.sh


If the API is working as it should, you should see the following screen:

[Screenshot: successful output from the tests.sh run showing predictions returned by the API]
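If you’d rather poke at the endpoint by hand, the same check takes only a few lines of Python. This is just a sketch of what tests.sh is presumably doing under the hood, assuming you have the requests library installed and are running from the repo root:

# quick_check.py -- a hand-rolled version of the smoke test (illustrative sketch)
import json
import requests

# test_1.json is one of the sample payloads in the repo's test_data folder
with open('test_data/test_1.json') as f:
    test_data = json.load(f)

# Port and path per this post's setup; adjust if your API runs elsewhere
response = requests.post('http://localhost:5001/predict', json = test_data)
print(response.status_code, response.json())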

Alright, so we’re ready to move on to the meat of this post: performance testing! When a machine learning model is served behind an API in a production setting, it’s super important to make sure it can handle the expected request load. If you get too many users or too many requests at once, you could have some major problems. You don’t want to be the one to bring down production!

Fortunately, folks have made this nice tool called Locust (or Locust.io) in order to help with just this issue. At first, the code can look odd, but we’ll explain things simply here so you can get up and running in no time.

First things first, you’ll probably need to install Locust on your machine. Simple enough to do! Just run the following pip command to download Locust from PyPI:

pip install locust


Alrighty, now we’re ready to build our Locustfile! The Locustfile is a simple Python script that we will invoke to fire up Locust and its super handy user interface. By default, the Locust command line tool will look for a file called “locustfile.py”, but you can truly name it whatever you want (as long as you specify it with the -f flag). Keeping things easy on ourselves, we’ll simply go with that default name, locustfile.py. And here’s everything we’re going to put in it.

from locust import HttpUser, task, between
import json
# Loading the test JSON data
with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())
# Creating an API User class inheriting from Locust's HttpUser class
class APIUser(HttpUser):
    # Setting the host name and wait_time
    host = 'http://localhost:5001'
    wait_time = between(3, 5)
    # Defining the post task using the JSON test data
    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)


It’s a pretty small script, but it’ll do some powerful things for us! Now, the first time you see this syntax, it can be a little odd, so let’s break it down bit by bit so you understand what’s going on here. Starting off with this first bit…

from locust import HttpUser, task, between
import json
# Loading the test JSON data
with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())


We’re simply importing what we’ll need from Locust and JSON and loading in the test JSON data that I have already provided. So far, this is probably nothing you’re unfamiliar with. But here’s where things start to get a little tricky. We’ll go slow here.

# Creating an API User class inheriting from Locust's HttpUser class
class APIUser(HttpUser):


Alright, so you’re probably familiar with Python classes. This is creating a new class that inherits from Locust’s “HttpUser” parent class. I’m not going to go deeply into the attributes and methods of that class, but suffice it to say, this is what Locust is going to use when we spin up the user interface here shortly.

Moving along…

    # Setting the host name and wait_time
    host = 'http://localhost:5001'
    wait_time = between(3, 5)


The host is probably pretty straightforward here: we’re simply providing the base URL that the API is currently being served on. (Recall that I have my Gunicorn script serving at localhost:5001.) The “wait_time” piece is probably new to you. Together with the “between()” helper, it tells Locust how long each simulated user should pause between tasks. “between()” works in seconds, so in our example here, every user waits a random 3 to 5 seconds between consecutive requests.
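As a side note, between() isn’t the only wait-time helper Locust ships. If you wanted every user to pause a fixed amount of time instead, a variant might look roughly like this (just a sketch; our test sticks with between(3, 5)):

import json
from locust import HttpUser, constant, task

with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())

class SteadyAPIUser(HttpUser):
    host = 'http://localhost:5001'
    # Every simulated user pauses exactly 1 second between tasks
    # instead of a random 3 to 5 seconds
    wait_time = constant(1)

    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)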

And the last part of our script:

    # Defining the post task using the JSON test data
    @task()
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)


That “@task()” decorator is telling our APIUser class what action it needs to take when Locust fires up. You can actually have multiple tasks and even weight them appropriately, but that goes beyond our scope here. For our purposes, one task will do it. All we need our task to do is to call the “/predict” API endpoint and pass it the JSON test data we loaded in at the top of the script.
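That said, if you’re curious, a weighted version might look roughly like the sketch below. Note that the /health endpoint here is purely hypothetical and only there to illustrate the weighting; the API in this post only exposes /predict.

import json
from locust import HttpUser, between, task

with open('test_data/test_1.json') as f:
    test_data = json.loads(f.read())

class WeightedAPIUser(HttpUser):
    host = 'http://localhost:5001'
    wait_time = between(3, 5)

    # Weighted 3-to-1: predictions get hit roughly three times as often
    # as the (hypothetical) health check below
    @task(3)
    def predict_endpoint(self):
        self.client.post('/predict', json=test_data)

    @task(1)
    def health_check(self):
        # Hypothetical endpoint for illustration only; it doesn't exist in this post's API
        self.client.get('/health')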

Now comes the fun part! With our API still running, open a new tab in your terminal. Navigate on over to the directory with locustfile.py in it and run the following command:

locust


Remember, Locust by default looks for that locustfile.py file, which is why we don’t need to specify anything else on the command line. What you should see is something that looks like this.

[Screenshot: Locust startup log showing the address of the web interface]

This is telling us that Locust has started up a web user interface at a specific port number. In my case, you’ll notice the Locust UI is being served at localhost:8089. Open up your browser of choice and navigate on over there. You should be greeted with the following screen.

[Screenshot: Locust web UI start page with fields for number of users and spawn rate]

(When you first load the page, the number of users and spawn rate fields will actually be empty.) In this example, I’m specifying that I want to test with a total of 100 users at a spawn rate of 5 users per second. That means Locust will start testing the API with just 5 users and then add another 5 every second until it hits the total of 100. (The wait_time from our script controls how long each of those users pauses between its own requests, not how quickly new users spawn.) Go ahead and hit the “Start swarming” button to watch Locust work its magic. This is the first screen you’ll be greeted with.

[Screenshot: Locust statistics page partway through the test run]

You can see that, at the point I took this screenshot, Locust had already ramped up to the full 100 users and had sent 1866 requests to the API. You can also see that requests have a median response time of 26 milliseconds and that the API is handling roughly 25 requests per second (RPS). Neat! But are you a visual person? Navigate on over to the Charts tab!

[Screenshot: Locust charts of requests per second and response times]

As you can see from these charts, our API is performing at a very stable rate. Obviously the RPS was lower before we hit our 100-user cap, but once we topped out at 100 users, everything pretty much flatlined. (Probably one of the few cases where you actually want to see flatlining!) There’s actually a third chart on this page that graphs the number of users over time, too, but I ran out of screenshot room. We won’t cover the other tabs, but you can probably guess what they do.
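One more thing worth knowing before we wrap up: you don’t have to use the web UI at all. Locust can run the same test headlessly straight from the command line, which is handy for CI pipelines. The flags below are the standard ones from the Locust docs, though exact behavior can vary a bit by version:

locust -f locustfile.py --headless -u 100 -r 5 --run-time 2m --host http://localhost:5001

That should spawn 100 users at 5 per second, run for two minutes, and print a stats summary to the terminal when it finishes.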

And that’s all there is to Locust! If your performance is flatlining with the expected number of users, you’re good to go. If not, you might have to explore different deployment options. Perhaps you need to scale out more instances of the API (or give Gunicorn more workers), or perhaps you need to further optimize the model itself. Either way, at least you can rest easy knowing that you won’t cause a performance bottleneck when you push the final ML-serving API to production.
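On that first option, the cheapest lever is usually the Gunicorn worker count. Something like the command below is a common starting point; the app:app module path is an assumption about how the repo’s app is laid out, and the usual rule of thumb for workers is (2 x CPU cores) + 1:

# 9 workers assumes a 4-core machine per the (2 x cores) + 1 rule of thumb
gunicorn --workers 9 --bind 0.0.0.0:5001 app:app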

That’s it for this post, folks! Hope you enjoyed this one. Let me know what other things you’d like me to cover in future posts! Always love to hear your thoughts.
