System Design: Rate Limiter


Thanks to the original creator: https://medium.com/geekculture/system-design-basics-rate-limiter-351c09a57d14

What is a Rate Limiter?

Rate limiting refers to preventing the frequency of an operation from exceeding a defined limit. In large-scale systems, rate limiting is commonly used to protect underlying services and resources. It is generally used as a defensive mechanism in distributed systems, so that shared resources can maintain availability.

Rate limiting protects your APIs from unintended or malicious overuse by limiting the number of requests that can reach your API in a given period of time. Without rate limiting, any user can bombard your server with requests, causing spikes that starve other users.


Why Rate limiting?

  1. Preventing Resource Starvation: The most common reason for rate limiting is to improve the availability of API-based services by avoiding resource starvation. Load-based denial of service (DoS) attacks can be prevented if rate limiting is applied. Other users are not starved even when one user bombards the API with a flood of requests.
  2. Security: Rate limiting prevents brute-forcing of security-sensitive functionality such as login and promo-code redemption. The number of requests to these features is limited per user, so brute-force attacks do not work in these scenarios.
  3. Controlling Operational Costs: With auto-scaling resources on a pay-per-use model, rate limiting helps control operational costs by putting a virtual cap on the scaling of resources. Without it, resources might scale out of proportion, leading to exponential bills.

Rate Limiting Strategies

Rate limiting can be applied on the following parameters:

  1. User: A limit is applied on the number of requests allowed for a user in a given period of time. User-based rate limiting is one of the most common and intuitive forms of rate limiting.


2. Concurrency: Here the limit is placed on the number of parallel sessions allowed for a user in a given timeframe. A limit on the number of parallel connections also helps mitigate DDoS attacks.

3. Location/ID: This helps in running location-based or demography-centric campaigns. Requests not originating from the target demography can be rate limited so as to increase availability in the target regions.

4. Server: Server-based rate limiting is a niche strategy. It is generally employed when specific servers need to handle most of the requests, i.e. when servers are strongly coupled to specific functions.

Rate Limiting Algorithms

  1. Leaky Bucket: Leaky Bucket is a simple, intuitive algorithm. It creates a queue with a finite capacity. All requests in a given time frame beyond the capacity of the queue are spilled off.

The advantage of this algorithm is that it smooths out bursts of requests and processes them at a constant rate. It is also easy to implement on a load balancer and is memory efficient per user. A constant, near-uniform flow is maintained to the server irrespective of the number of incoming requests.


The downside of this algorithm is that a burst of requests can fill up the bucket, starving newer requests. It also provides no guarantee that requests are completed within a given amount of time.
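As an illustration, here is a minimal single-process sketch of the idea: a bounded queue plus a background drainer that forwards requests at a constant rate. The class and method names are my own, not from the article.

```python
import time
from collections import deque
from threading import Lock

class LeakyBucket:
    """Queue incoming requests and drain them at a constant rate."""

    def __init__(self, capacity, drain_rate_per_sec):
        self.capacity = capacity                  # maximum requests the bucket can hold
        self.drain_interval = 1.0 / drain_rate_per_sec
        self.queue = deque()
        self.lock = Lock()

    def allow(self, request):
        """Enqueue the request if there is room; otherwise spill it."""
        with self.lock:
            if len(self.queue) < self.capacity:
                self.queue.append(request)
                return True
            return False                          # bucket full: request is dropped

    def drain_forever(self, handler):
        """Forward queued requests to the server at a near-uniform rate."""
        while True:
            with self.lock:
                request = self.queue.popleft() if self.queue else None
            if request is not None:
                handler(request)
            time.sleep(self.drain_interval)
```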

2. Token Bucket: Token Bucket is similar to Leaky Bucket. Here we assign tokens at the user level. For a given time duration d, the number of requests r that a user can make is defined. Every time a new request arrives at the server, two operations happen:

  • Fetch token: The current token count for that user is fetched. If it is greater than the defined limit, the request is dropped.
  • Update token: If the fetched count is less than the limit for the time duration d, the request is accepted and the token count is incremented.

This algorithm is memory efficient, as we store only a small amount of data per user. The problem is that it can cause race conditions in a distributed environment: two requests from two different application servers may try to fetch and update the token count at the same time.
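The article describes the fetch/update steps in terms of a per-user count; a common single-node formulation refills tokens continuously over the duration d, sketched below. The continuous-refill detail and the names are assumptions of this sketch, not from the article.

```python
import time

class TokenBucket:
    """Allow up to `rate` requests per `per_seconds` for one user, refilling continuously."""

    def __init__(self, rate, per_seconds):
        self.capacity = rate                      # maximum tokens the user can hold
        self.tokens = float(rate)                 # start with a full bucket
        self.refill_per_sec = rate / per_seconds
        self.last_refill = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill in proportion to the time elapsed since the last check.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_per_sec)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                      # update step: spend one token
            return True
        return False                              # no tokens left: drop the request
```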


3. Fixed Window Counter: Fixed window is one of the most basic rate-limiting mechanisms. We keep a counter for a given duration of time and increment it for every request we get. Once the limit is reached, we drop all further requests until the window resets.

The advantage here is that it ensures the most recent requests are served without being starved by old ones. However, a single burst of traffic right at the boundary of a window can consume all the available slots for both the current and the next window. Consumers might bombard the server at the boundary in an attempt to maximise the number of requests served.
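A minimal sketch of a per-user fixed window counter, keyed by user and window index (the names and the in-memory dictionary are illustrative assumptions):

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    """Allow at most `limit` requests per user within each fixed window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counters = defaultdict(int)          # (user_id, window_index) -> count

    def allow(self, user_id):
        window_index = int(time.time() // self.window)
        key = (user_id, window_index)
        if self.counters[key] >= self.limit:
            return False                          # limit reached: drop until the window resets
        self.counters[key] += 1
        return True
```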


4. Sliding Log: The sliding log algorithm maintains a timestamped log of requests at the user level. The system keeps these requests time-sorted in a set or a table and discards all requests with timestamps older than a threshold. Periodically, older requests are filtered out, and the number of remaining log entries determines the request rate. If a new request would exceed the threshold rate, it is held; otherwise it is served.

The advantage of this algorithm is that it does not suffer from the boundary conditions of fixed windows. Enforcement of the rate limit will remain precise. Since the system tracks the sliding log for each consumer, you don’t have the stampede effect that challenges fixed windows.

However, it can be costly to store an unlimited number of logs for every request. It’s also expensive to compute because each request requires calculating a summation over the consumer’s prior requests, potentially across a cluster of servers. As a result, it does not scale well to handle large bursts of traffic or denial of service attacks.
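A minimal in-memory sketch of the sliding log, assuming we only log accepted requests and prune entries older than the window on each call (names are illustrative):

```python
import time
from collections import defaultdict, deque

class SlidingLog:
    """Keep a timestamped log per user and allow at most `limit` requests per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.logs = defaultdict(deque)            # user_id -> timestamps of recent requests

    def allow(self, user_id):
        now = time.time()
        log = self.logs[user_id]
        # Filter out requests that have aged beyond the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False                          # rate would be exceeded: hold/reject
        log.append(now)
        return True
```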

5. Sliding Window: This is similar to the Sliding Log algorithm, but more memory efficient. It combines the fixed window algorithm's low processing cost with the sliding log's improved boundary behaviour.

We keep a list/table of time-sorted entries, each entry being a hybrid that contains a timestamp and the number of requests at that point. We maintain a sliding window of our time duration and only serve requests that fit within the window at the given rate. If the sum of the counters exceeds the limiter's rate, we serve only the first entries whose counts add up to the rate limit.

The Sliding Window approach is the best of the lot because it gives the flexibility to scale rate limiting with good performance. Rate windows are an intuitive way to present rate-limit data to API consumers. It also avoids the starvation problem of the leaky bucket and the bursting problems of fixed window implementations.
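One widely used way to realise this hybrid is to keep one counter per fixed window and weight the previous window's count by how much of it still overlaps the sliding window. This weighted-counter variant is an assumption of the sketch below, not something the article spells out:

```python
import time
from collections import defaultdict

class SlidingWindowCounter:
    """Approximate a sliding window by weighting the previous fixed window's count."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)            # (user_id, window_index) -> request count

    def allow(self, user_id):
        now = time.time()
        window_index = int(now // self.window)
        elapsed_fraction = (now % self.window) / self.window
        current = self.counts[(user_id, window_index)]
        previous = self.counts[(user_id, window_index - 1)]
        # Weight the previous window by how much of it still overlaps the sliding window.
        estimated = previous * (1 - elapsed_fraction) + current
        if estimated >= self.limit:
            return False
        self.counts[(user_id, window_index)] += 1
        return True
```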

Rate Limiting in Distributed Systems

The above algorithms work very well for single-server applications, but the problem becomes much more complicated when a distributed system with multiple nodes or app servers is involved. It becomes even more complicated if there are multiple rate-limited services distributed across different server regions. The two broad problems that come up in these situations are Inconsistency and Race Conditions.

Inconsistency

For complex systems with multiple app servers distributed across different regions, each with its own rate limiter, we need to define a global rate limiter.

A consumer could surpass the global rate limit if it sends a lot of requests in a small time frame and those requests land on different nodes, each enforcing only its local limit. The greater the number of nodes, the more likely the user is to exceed the global limit.

There are two ways to solve these problems:

  • Sticky Session: Configure sticky sessions in your load balancer so that each consumer is always sent to exactly one node. The downsides include a lack of fault tolerance and scaling problems when nodes get overloaded.
  • Centralized Data Store: Use a centralized data store such as Redis or Cassandra to handle the counts for each window and consumer (see the sketch below). The added latency is a problem, but the flexibility it provides makes it an elegant solution.
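A minimal sketch of the centralized-store approach, assuming Redis with the redis-py client and a fixed window per consumer. Because INCR and EXPIRE run inside Redis, all app servers share one counter; the key naming and defaults here are illustrative:

```python
import time
import redis  # assumes the redis-py client and a reachable Redis instance

r = redis.Redis(host="localhost", port=6379)

def allow(user_id, limit=100, window_seconds=60):
    """Shared fixed-window counter: every app server increments the same Redis key."""
    window_index = int(time.time() // window_seconds)
    key = f"rate:{user_id}:{window_index}"
    count = r.incr(key)                           # INCR is atomic inside Redis
    if count == 1:
        r.expire(key, window_seconds)             # let stale window keys expire on their own
    return count <= limit
```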

Race Conditions

Race conditions happen with a get-then-set approach under high concurrency. Each request reads the value of the counter and then tries to increment it, but by the time the write completes, several other requests have already read the old (now incorrect) value. As a result, far more requests get through than intended. This can be mitigated by using locks around the read-write operation, making it atomic, but that comes at a performance cost: the lock becomes a bottleneck and adds latency.
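One way to remove the race without an application-level lock is to push the check-and-increment into the data store as a single atomic step. A rough sketch assuming Redis, where Lua scripts execute atomically (the script body, key names, and defaults are illustrative):

```python
import redis  # assumes the redis-py client and a reachable Redis instance

r = redis.Redis(host="localhost", port=6379)

# Redis runs Lua scripts atomically, so the read, compare and increment
# cannot interleave with a concurrent request from another app server.
CHECK_AND_INCR = """
local current = tonumber(redis.call('GET', KEYS[1]) or '0')
if current >= tonumber(ARGV[1]) then
    return 0
end
redis.call('INCR', KEYS[1])
if current == 0 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1
"""

check_and_incr = r.register_script(CHECK_AND_INCR)

def allow(user_id, limit=100, window_seconds=60):
    allowed = check_and_incr(keys=[f"rate:{user_id}"], args=[limit, window_seconds])
    return allowed == 1
```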

Throttling

Throttling is the process of controlling the usage of APIs by customers during a given period. Throttling can be defined at the application level and/or the API level. When a throttle limit is crossed, the server returns HTTP status 429 Too Many Requests.

Types of Throttling:

  1. Hard Throttling: The number of API requests cannot exceed the throttle limit.
  2. Soft Throttling: Here, the API request limit can be exceeded by a certain percentage. For example, with a rate limit of 100 messages a minute and a 10% exceed limit, our rate limiter will allow up to 110 messages per minute (see the sketch after this list).
  3. Elastic or Dynamic Throttling: Under Elastic throttling, the number of requests can go beyond the threshold if the system has some resources available. For example, if a user is allowed only 100 messages a minute, we can let the user send more than 100 messages a minute when there are free resources available in the system.
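A small sketch of the hard vs. soft checks using the numbers from the examples above (100 messages per minute, 10% soft allowance). It assumes the caller already tracks how many requests fell in the current window; the function names are illustrative:

```python
def hard_throttle_allows(count_in_window, limit=100):
    """Hard throttling: the number of requests can never exceed the limit."""
    return count_in_window < limit

def soft_throttle_allows(count_in_window, limit=100, exceed_pct=10):
    """Soft throttling: the limit may be exceeded by a percentage (100 + 10% -> 110)."""
    return count_in_window < limit * (1 + exceed_pct / 100)
```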






