Rate-Limiting Simplified With Redis
Introduction
For engineers, witnessing the application they’ve worked on scale from serving hundreds to hundreds of thousands (or even millions) of users is a truly significant achievement. However, this kind of growth brings with it another challenge: the need to ensure that the system, with its finite resources, can effectively handle the increased load while maintaining optimal performance and reliability. It therefore becomes essential to regulate the flow of incoming requests to ensure that the system doesn’t throw its hands up when faced with sudden spikes in traffic. This is where rate limiting comes into play.
Simply put, rate limiting is a technique used to control the rate of incoming requests to a server. It restricts the number of requests a user can make within a specific period. This ultimately reduces the chances of server overload and protects it against abuse (such as bot attacks).
Redis has proven to be an excellent choice for building custom rate-limiting functionality. Before we dive into Redis, however, let’s take a look at some common rate-limiting algorithms to better understand how rate limiting works.
Types of Rate Limiting Algorithms
Fixed Window
This is arguably the most primitive algorithm that can be implemented. Requests are tracked within discrete time windows or intervals (such as one second or one minute). If the number of requests exceeds the predefined limit within the window, all subsequent requests are rejected until the window resets.
Although this algorithm is simple to implement and predictable in behavior, it does not handle bursts of traffic efficiently. For example, with a limit of four requests per minute, I could make four requests at the very end of the first window and another four at the very beginning of the second window. Although I haven’t exceeded the rate limit for either window, I have sent eight requests within a very short period, which could potentially overload the server.
In practice, very few use cases can utilize these kinds of discrete windows (and such strict rate-limiting rules altogether). Ideally, we want something more adaptable to incoming requests.
Additionally, the behavior of this algorithm at the window boundaries can be quite abrupt, which can result in inconsistent rate limiting.
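To make the idea concrete, here is a minimal in-memory sketch of a fixed-window counter. The window length, limit, and function names are illustrative, not part of the original article, and a production version would also clean up counters for old windows.

```javascript
// Minimal in-memory fixed-window counter (illustrative values).
const WINDOW_MS = 60_000; // one-minute windows
const LIMIT = 4;          // max requests per window

const counters = new Map(); // "userId:windowStart" -> request count

function isAllowedFixedWindow(userId, now = Date.now()) {
  // Every request inside the same discrete window maps to the same key.
  const windowStart = Math.floor(now / WINDOW_MS) * WINDOW_MS;
  const key = `${userId}:${windowStart}`;
  const count = (counters.get(key) ?? 0) + 1;
  counters.set(key, count);
  return count <= LIMIT; // reject once the limit for this window is exceeded
}
```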
Sliding Window
This algorithm can be thought of as a refinement of the fixed window algorithm. Instead of discrete windows of fixed intervals, we now use a window that slides or moves as time progresses. New requests are accepted or rejected based on the number of previous requests that lie within the sliding window.
The sliding window algorithm provides a lot more granular control, and the abruptness that we may see with the fixed window algorithm is also addressed. Although this algorithm is better at handling bursts of traffic, it may still allow for bursty behavior depending on the configuration. Not to mention the additional complexity and overhead that is introduced since we need to track requests continuously.
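A common way to implement this is a sliding-window log: keep a timestamp for each request and count only the ones that still fall inside the window. The sketch below is an illustrative in-memory version; the values and names are assumptions rather than anything from the original article.

```javascript
// Minimal in-memory sliding-window log (illustrative values).
const WINDOW_MS = 60_000;
const LIMIT = 4;

const requestLog = new Map(); // userId -> array of request timestamps

function isAllowedSlidingWindow(userId, now = Date.now()) {
  // Keep only the requests that still fall inside the sliding window.
  const timestamps = (requestLog.get(userId) ?? []).filter(
    (t) => now - t < WINDOW_MS
  );
  if (timestamps.length >= LIMIT) return false;
  timestamps.push(now);
  requestLog.set(userId, timestamps);
  return true;
}
```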
Token Bucket
The token bucket algorithm uses a theoretical “bucket” which is refilled with “tokens” on a fixed interval. This interval typically corresponds to the rate limit itself. For example, if the rate limit is four requests per minute, then tokens will be refilled at a rate of four tokens per minute too. Once the “capacity” of the bucket is exceeded, any additional tokens “overflow” and are discarded.
Each request uses or consumes one token. If no tokens are available in the bucket, then the request is either rejected or throttled until tokens become available.
The capacity of the bucket itself represents the burst limit. For example, if the bucket capacity is ten, then a user can “accumulate” ten tokens (during periods of low activity) and use them in bursts (during periods of high activity).
This algorithm excels at handling bursts of traffic effectively (and it provides a lot of granular control when doing so) and is quite predictable in its behavior. This makes it a good choice for use cases where bursty behavior can be tolerated, such as video streaming.
However, implementing it correctly and efficiently can be quite challenging — and failure to do so can often result in unreasonable request delays or rejections. Additionally, the token refill logic is tricky to optimize (and may have a lot of overhead), which means that preventing token wastage (which would negatively affect efficiency) is another problem that has to be solved.
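The refill logic usually boils down to tracking when the bucket was last updated and crediting tokens for the time that has elapsed since then. Here is a compact in-memory sketch of that idea (the Redis-backed version we build later follows the same pattern); the capacity and refill rate shown are illustrative.

```javascript
// Conceptual in-memory token bucket (the Redis-backed version comes later).
const CAPACITY = 10;              // burst limit
const REFILL_PER_MS = 4 / 60_000; // four tokens per minute

const buckets = new Map(); // userId -> { tokens, lastUpdated }

function isAllowedTokenBucket(userId, now = Date.now()) {
  const bucket = buckets.get(userId) ?? { tokens: CAPACITY, lastUpdated: now };
  // Refill based on elapsed time, never exceeding the bucket capacity.
  const refill = (now - bucket.lastUpdated) * REFILL_PER_MS;
  bucket.tokens = Math.min(CAPACITY, bucket.tokens + refill);
  bucket.lastUpdated = now;
  if (bucket.tokens < 1) {
    buckets.set(userId, bucket);
    return false; // no tokens left, reject or throttle
  }
  bucket.tokens -= 1; // each request consumes one token
  buckets.set(userId, bucket);
  return true;
}
```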
Leaky Bucket
The leaky bucket algorithm uses the “bucket” metaphor once again, but it flips things around a bit. Each request is now considered a “token” that fills up the bucket. The catch here is that tokens “leak” (i.e., are processed) from the bucket at a constant rate. Incoming requests can be bursty, but the rate at which they are processed is always smoothed out. This makes leaky bucket rate limiting a good fit for use cases where we need a continuous flow of data. Plus, it’s relatively simple to implement (compared to something like the token bucket algorithm).
The drawbacks of using this algorithm include the fact that it does not handle bursts of traffic well. Additionally, if the bucket’s capacity is too small or the “leak rate” is too low, we may run into a case of request starvation. Generally speaking, the leaky bucket algorithm lacks the flexibility to adapt to changing traffic patterns.
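For illustration, here is a minimal in-memory leaky bucket in which queued requests are drained at a fixed rate. The capacity, interval, and function names are assumptions made for this sketch.

```javascript
// Minimal in-memory leaky bucket: requests queue up and "leak" at a fixed rate.
const BUCKET_CAPACITY = 10;   // max queued requests
const LEAK_INTERVAL_MS = 250; // one request processed every 250 ms

const queue = [];

function enqueueRequest(handler) {
  if (queue.length >= BUCKET_CAPACITY) return false; // bucket is full, reject
  queue.push(handler);
  return true;
}

// Drain the bucket at a constant rate, regardless of how bursty arrivals are.
setInterval(() => {
  const handler = queue.shift();
  if (handler) handler();
}, LEAK_INTERVAL_MS);
```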
Why is Redis a Good Solution?
One word: in-memory.
For those unacquainted, Redis (which stands for Remote Dictionary Server) is an in-memory data structure store. In-memory simply means that data is stored in the system’s main memory (i.e., the RAM) rather than on secondary storage devices (which is what your regular databases use).
Redis is a good solution for rate limiting because:
- It is in-memory, so reads and writes happen with sub-millisecond latency, keeping the overhead of checking the limit on every request negligible.
- It supports atomic operations, which helps keep counters consistent even when many requests arrive concurrently.
- Keys can be given an expiry (TTL), making it easy to reset counters or clean up stale rate-limiting state automatically.
- Its simple key-value model maps naturally onto per-user counters and buckets.
Getting Our Hands Dirty
Let’s transport ourselves to JavaScript land and try to build a simple rate limiter with Redis, using the token bucket algorithm. To start, let’s create a simple API route using Node.js and Express that fetches some dummy data.
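The original snippet isn’t reproduced here, so the following is a minimal sketch of what such a server might look like, assuming a /api/data route that returns placeholder data on port 3000.

```javascript
// server.js - a minimal Express server with a dummy data route.
const express = require('express');

const app = express();

app.get('/api/data', (req, res) => {
  // Return some placeholder data for the demo.
  res.json({ message: 'Here is some dummy data!', timestamp: Date.now() });
});

app.listen(3000, () => console.log('Server listening on port 3000'));
```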
Next, let’s create a simple script that simulates GET requests to the /api/data route. We’ll add a delay between requests and error handling as well.
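A simple version of that script could look like the sketch below. It assumes Node 18+ (for the global fetch API); the request count and delay are illustrative values.

```javascript
// simulate.js - fires a series of GET requests at /api/data with a small delay.
const TOTAL_REQUESTS = 15;
const DELAY_MS = 200;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  for (let i = 1; i <= TOTAL_REQUESTS; i++) {
    try {
      const response = await fetch('http://localhost:3000/api/data');
      console.log(`Request ${i}: ${response.status} ${response.statusText}`);
    } catch (err) {
      console.error(`Request ${i} failed:`, err.message);
    }
    await sleep(DELAY_MS);
  }
})();
```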
If we run this script, the result (as expected) is predictable. Each request is simply processed one after the other.
Let’s now introduce Redis into the picture. A couple of prerequisites first, however:
- Redis must be installed, with a redis-server instance running locally (on the default port 6379).
- Our Node.js project needs a Redis client library, such as the node-redis package (npm install redis).
Let’s now create the Redis client and connect to the redis-server that is running locally.
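A minimal sketch of this step, assuming the node-redis (v4) client, might look like the following. With no arguments, createClient connects to redis://localhost:6379, i.e., the locally running redis-server.

```javascript
// redisClient.js - create and connect a Redis client (node-redis v4 style).
const { createClient } = require('redis');

// Defaults to redis://localhost:6379, i.e., the local redis-server.
const redisClient = createClient();

redisClient.on('error', (err) => console.error('Redis client error:', err));

(async () => {
  await redisClient.connect();
})();

module.exports = redisClient;
```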
Next, let’s add some middleware that allows or restricts access to our /api/data API route based on our rate-limiting parameters. We’re also going to define these rate-limiting parameters within this middleware. To be specific, we need to:
- define the bucket capacity (bucketCapacity), which doubles as our burst limit;
- define the number of tokens a newly created bucket starts with (initialNumTokens);
- define the refill rate, i.e., how many tokens are added back to the bucket per unit of time; and
- derive a unique bucketKey for each client (for example, from the client’s IP address or user ID) so that every client gets its own bucket.

A full sketch of this middleware, putting all of the following steps together, appears after the decision logic below.
With our rate limiting parameters defined, let’s retrieve our bucket from Redis using the unique bucketKey.
Note: since Redis follows the key-value data model, our bucket is simply going to be a (stringified) JSON object stored against the bucketKey.
If the bucket does not exist, we’re simply going to create one and add initialNumTokens number of tokens to it. We’ll also maintain a property (lastUpdated) that lets us determine when the bucket was last updated.
If the bucket already exists, we need to apply our token refill logic first. To calculate the number of tokens that need to be added to the bucket, we need to:
- work out how much time has elapsed since the bucket was last updated (using the lastUpdated property);
- multiply that elapsed time by our refill rate to get the number of new tokens; and
- cap the resulting token count at the bucket’s capacity, so that any excess tokens overflow and are discarded.
We’re also going to update the time that the bucket was last updated.
Finally, we need to determine whether the current request needs to be rate-limited. This part is simple: if the bucket is not empty, we’ll process the request (and update the values of the bucket on Redis too). Otherwise, we’ll send an HTTP 429 response back.
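Putting the steps above together, here is one possible version of the middleware. The property and parameter names (bucketCapacity, initialNumTokens, lastUpdated, bucketKey) follow the ones mentioned in the article, but the concrete values, the use of the client IP as the key, and the fail-open error handling are assumptions made for this sketch.

```javascript
// rateLimiter.js - token bucket middleware backed by Redis.
const redisClient = require('./redisClient'); // the client created earlier

// Rate-limiting parameters (illustrative values).
const bucketCapacity = 10;          // burst limit
const initialNumTokens = 10;        // tokens a brand-new bucket starts with
const refillRatePerMs = 4 / 60_000; // four tokens per minute

async function rateLimiter(req, res, next) {
  // One bucket per client; the client's IP address is used here for simplicity.
  const bucketKey = `rate-limit:${req.ip}`;

  try {
    const now = Date.now();
    const stored = await redisClient.get(bucketKey);
    let bucket;

    if (!stored) {
      // No bucket yet: create one with the initial number of tokens.
      bucket = { tokens: initialNumTokens, lastUpdated: now };
    } else {
      // Bucket exists: refill it based on the time elapsed since the last update.
      bucket = JSON.parse(stored);
      const elapsedMs = now - bucket.lastUpdated;
      bucket.tokens = Math.min(bucketCapacity, bucket.tokens + elapsedMs * refillRatePerMs);
      bucket.lastUpdated = now;
    }

    if (bucket.tokens >= 1) {
      // Consume a token, persist the bucket, and let the request through.
      bucket.tokens -= 1;
      await redisClient.set(bucketKey, JSON.stringify(bucket));
      return next();
    }

    // Bucket is empty: persist the refreshed state and reject the request.
    await redisClient.set(bucketKey, JSON.stringify(bucket));
    return res.status(429).json({ error: 'Too many requests. Please try again later.' });
  } catch (err) {
    // If Redis is unavailable, fail open rather than blocking all traffic.
    console.error('Rate limiter error:', err);
    return next();
  }
}

module.exports = rateLimiter;
```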
Before we can test our rate limiter with our script, we need to make sure that our /api/data API route is configured to use this middleware that we just created.
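One way to wire this up (assuming the file names used in the sketches above) is to pass the middleware to the route handler directly:

```javascript
// Apply the rate limiter to the /api/data route only.
const rateLimiter = require('./rateLimiter');

app.get('/api/data', rateLimiter, (req, res) => {
  res.json({ message: 'Here is some dummy data!', timestamp: Date.now() });
});
```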
If we run our script now, we can see our rate limiter in action. Our first ten requests are successful because our bucket initially has ten tokens. As we start consuming these tokens, we’re eventually rate-limited by the time we hit our eleventh or twelfth request (and our requests start getting rejected). Our burst limit is capped at ten requests (and is represented by the bucketCapacity), and we can see that here.
And with that, we can call it a day. Simple, right?
Concluding Thoughts
Rate limiting plays a pivotal role in managing the flow of incoming traffic to our applications, thereby preventing server overload and protecting it against abuse. By understanding how different rate-limiting algorithms work and leveraging something like Redis to develop a simple yet effective rate limiter, engineers can ensure that their applications scale without compromising on performance and reliability.