What are Load Balancers?

Over the years, load balancers have become an essential pillar of system architecture, serving as the front-line controllers that intelligently distribute incoming network traffic across multiple servers.

Similar to an air traffic controller coordinating the safe landing of planes by assigning them to available runways, load balancers regulate data flow within a system, ensuring horizontal scalability and optimal resource utilization.

Before we dive deeper, here's a quick cheat sheet summarizing today’s article:

Figure 1: Load Balancers cheat sheet!


Let’s understand each component in detail:

Functions of a load balancer

  1. Distribute Traffic: Load balancers evenly distribute incoming requests across multiple servers to prevent overloading any single server.
  2. Fault Tolerance: They detect server failures and redirect traffic to healthy servers, ensuring continuous service availability.
  3. Rate Limiting: Load balancers can control traffic flow by limiting the number of requests sent to servers, preventing overload.
  4. Session Persistence: They maintain session data by consistently routing a user’s requests to the same server for the duration of a session.
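
To make the first three of these functions concrete, here's a minimal Python sketch of a front-line balancer that rotates across servers, skips any that fail health checks, and caps the request rate. All names are illustrative, not any real product's API; session persistence is covered under Sticky Round Robin below.

    import time
    from collections import deque

    class FrontDoorLB:
        """Toy balancer: distribution, fault tolerance, rate limiting."""

        def __init__(self, servers, max_rps=100):
            self.servers = list(servers)
            self.healthy = set(servers)   # kept current by health checks
            self.max_rps = max_rps
            self.recent = deque()         # timestamps of recent requests
            self.i = 0                    # rotation cursor

        def allow(self):
            # Rate limiting: reject once the last second holds max_rps requests.
            now = time.monotonic()
            while self.recent and now - self.recent[0] > 1.0:
                self.recent.popleft()
            if len(self.recent) >= self.max_rps:
                return False
            self.recent.append(now)
            return True

        def route(self):
            # Distribution + fault tolerance: rotate, skipping unhealthy servers.
            for _ in range(len(self.servers)):
                server = self.servers[self.i % len(self.servers)]
                self.i += 1
                if server in self.healthy:
                    return server
            raise RuntimeError("no healthy servers available")

    lb = FrontDoorLB(["server1", "server2", "server3"])
    lb.healthy.discard("server2")   # a health check marks server2 down
    print([lb.route() for _ in range(4) if lb.allow()])
    # ['server1', 'server3', 'server1', 'server3'] -- server2 is skipped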


Types of Load Balancers

  1. Layer 4: Operates at the transport layer, distributing traffic based on IP addresses and TCP/UDP ports, e.g., HAProxy, AWS ELB.
  2. Layer 7: Works at the application layer, routing traffic based on content such as HTTP headers and URLs, e.g., Nginx, Traefik, AWS ALB.
  3. Cloud-Based: Load balancers hosted in the cloud, offering scalable, on-demand traffic management without hardware overhead, e.g., Cloudflare, AWS Route 53.
  4. Hardware: Physical load balancers deployed on-premises, providing high-performance traffic distribution with dedicated infrastructure, e.g., F5, A10, Citrix ADC.


Types of load balancing algorithms

1. Round Robin

How does it work?

Requests are distributed sequentially across servers, looping back to the first server after reaching the last one.

In Figure 2, requests 1, 2, 3 and 4 go to servers 1, 2, 3 and 1 respectively.

Figure 2: Round Robin

Pros & Cons

✅ Simple to implement and effective for servers with equal capacity.

❌ Doesn't account for server load or processing power.

Example: Default algorithm used by the Kubernetes load balancer for routing traffic to service pods.
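
For intuition, the whole algorithm fits in a few lines; here's a minimal Python sketch reproducing the rotation from Figure 2:

    import itertools

    servers = ["server1", "server2", "server3"]
    rotation = itertools.cycle(servers)   # loops back after the last server

    for request_id in range(1, 5):
        print(request_id, "->", next(rotation))
    # 1 -> server1, 2 -> server2, 3 -> server3, 4 -> server1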

2. Sticky Round Robin

How does it work?

Each client’s first request is assigned a server (in rotation or at random), and all subsequent requests from that client stick to that server unless it becomes unavailable.

In Figure 3, requests 1 & 2 go to server 1 while requests 3 & 4 go to server 2.

Figure 3: Sticky Round Robin

Pros & Cons

✅ Ensures session persistence, ideal for applications requiring user state retention.

❌ Not suitable for applications with short-lived sessions.

❌ If a server goes down, session data can be lost unless state is stored centrally.

Example: Proxies in front of Redis can use sticky routing to direct a client’s requests to the same Redis node, ensuring consistent access to cached data.
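
A minimal Python sketch of the stickiness table. Here the first pick rotates rather than being random, and the client_id key is an assumption (real balancers typically key on a cookie or the source IP); a production version would also reassign clients whose pinned server goes down:

    import itertools

    rotation = itertools.cycle(["server1", "server2", "server3"])
    assignments = {}   # client_id -> pinned server

    def route(client_id):
        # First request picks the next server; later requests stick to it.
        if client_id not in assignments:
            assignments[client_id] = next(rotation)
        return assignments[client_id]

    print(route("client_a"), route("client_a"))   # server1 server1
    print(route("client_b"), route("client_b"))   # server2 server2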

3. Weighted Round Robin

How does it work?

Servers are assigned weights based on their capacity, with higher-weight servers handling more requests.

In Figure 4, out of 4 requests, 3 go to server 1 (0.8 × 4 = 3.2) and 1 goes to server 2 (0.1 × 4 = 0.4).

Figure 4: Weighted Round Robin

Pros & Cons

✅ Balances load based on server capacity.

✅ More efficient for systems with servers of varying strength.

❌ Doesn’t factor in real-time load (active connections).

Example: AWS Elastic Load Balancer uses this to direct more traffic to EC2 instances with higher capacity and computing power.
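
One well-known implementation is "smooth" weighted round robin, the interleaving variant Nginx uses for its upstreams; here is a minimal Python sketch:

    def weighted_round_robin(weights, n):
        # Each turn: grow every server's credit by its weight, pick the
        # largest, then charge the winner the total weight. High-weight
        # servers win proportionally more often, without long streaks.
        credit = {s: 0 for s in weights}
        total = sum(weights.values())
        picks = []
        for _ in range(n):
            for s, w in weights.items():
                credit[s] += w
            best = max(credit, key=credit.get)
            credit[best] -= total
            picks.append(best)
        return picks

    # Weights mirror Figure 4: server1 is expected to take ~80% of traffic.
    print(weighted_round_robin({"server1": 8, "server2": 1, "server3": 1}, 10))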

4. IP Hash

How does it work?

Calculates a hash value from the client’s IP address and uses it to determine which server the request is routed to.

In Figure 5, requests 1 & 2 come from the same IP in China and go to server 1, where hash(IP) = 0, while requests 3 & 4 come from the same IP in India and go to server 3, where hash(IP) = 2.

Figure 5: IP Hash

Pros & Cons

✅ Ensures consistent routing of requests from the same client to the same server.

✅ Good for session persistence without needing cookies.

❌ Can lead to uneven load distribution if many clients share similar IP addresses.

❌ IP changes can break persistence.

Example: E-commerce platforms like Amazon use this to ensure that the same server handles all cart requests for a user, such as adding, viewing, or removing items from the cart.
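
A minimal Python sketch (the hash function and addresses are arbitrary illustrative choices):

    import hashlib

    servers = ["server1", "server2", "server3"]

    def route(client_ip):
        # The same IP always hashes to the same index, so routing is stable.
        digest = hashlib.md5(client_ip.encode()).hexdigest()
        return servers[int(digest, 16) % len(servers)]

    print(route("203.0.113.7"), route("203.0.113.7"))   # same server twice
    print(route("198.51.100.42"))                       # may land elsewhere

Note that plain modulo hashing remaps most clients whenever a server is added or removed; production systems often use consistent hashing to keep that churn small.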

5. Least connections

How does it work?

Incoming requests are directed to the server with the fewest active connections.

In Figure 6, all requests go to server 2 because it has the fewest active client connections (50).

Figure 6: Least Connections

Pros & Cons

✅ Balances real-time traffic load efficiently.

✅ Prevents servers from becoming overloaded.

❌ Doesn’t account for server capacity or request complexity.

Example: PostgreSQL setups often combine a pooler like PgBouncer with least-connections routing across instances, preventing overload on any single instance.
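
A minimal Python sketch using the connection counts from Figure 6:

    # Active connections per server, as in Figure 6.
    active = {"server1": 100, "server2": 50, "server3": 110}

    def route():
        # Pick the server with the fewest active connections right now.
        server = min(active, key=active.get)
        active[server] += 1   # the new request opens a connection
        return server

    def release(server):
        active[server] -= 1   # called when the connection closes

    print(route())   # server2 -- only 50 active connections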

6. Least Response Time

How does it work?

Incoming requests are routed to the server with the fastest response time.

In Figure 7, all requests go to server 1 because it has the lowest current response time (1 ms).

Figure 7: Least Response Time

Pros & Cons

✅ Reduces latency for time-sensitive applications.

❌ May lead to instability if servers experience temporary performance spikes.

Example: Booking platforms (e.g., Expedia) use this at the API/ingress gateway level to route travel booking requests to the fastest available server.
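
A minimal Python sketch; the exponentially weighted moving average (EWMA) is an added assumption here, a common way to damp the instability noted above:

    # Smoothed response time per server, in milliseconds (as in Figure 7).
    latency = {"server1": 1.0, "server2": 4.0, "server3": 9.0}
    ALPHA = 0.2   # smoothing factor: higher reacts faster, lower is steadier

    def route():
        # Send the request to the server with the lowest smoothed latency.
        return min(latency, key=latency.get)

    def record(server, observed_ms):
        # One slow response nudges the average instead of flipping routing.
        latency[server] = ALPHA * observed_ms + (1 - ALPHA) * latency[server]

    print(route())            # server1 (1 ms)
    record("server1", 20.0)   # a slow response raises server1's average
    print(latency["server1"]) # 4.8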

Thanks for reading! If you liked the article, please subscribe to my System Design Newsletter (https://systemdesignnewsletter.substack.com/) to keep receiving more such breakdowns and diagrams.
