
Failproof micro-service: Retry Strategy for intermittent failures

Navjot Bansal

Building Computer Vision Systems @Oracle | Software Architecture | System Design | ICPC Regionalist

Published: February 3, 2023

This post is a continuation of Creating a Failure Resilient Application. I highly recommend reading that article before continuing here.

Microservices are built to handle high load. But, like humans, every service has its limit.

Suppose we are managing the servers for IRCTC, and the Auth Service is at its limit because of the Tatkaal window. Any new user trying to log in waits in the request queue and then fails with a timeout.

You hit the login button again. It fails again.

Before returning an error, every service retries to the best of its capabilities until it hits its retry limit, as shown in the diagram below.

Simple Retry

In situations like this, it is essential to work out how we can make people's lives simpler. Retrying indefinitely would not only disappoint the user but could also bring down the whole server by exhausting its resources.

We can start by controlling congestion in the request queue.

Intuition

"Slow down, give it some rest, leave it for some time".

We really do this.

To achieve this, we use a simple algorithm called exponential backoff.

Exponential-backoff

Overview

After every failed retry, we multiply the wait time before the next attempt by a constant factor, generally 2.

We do not retry indefinitely; a maximum retry count limits the number of attempts, as shown below.
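The overview above can be sketched as a small retry wrapper. This is a minimal illustration, not a production implementation: the function name `retry_with_backoff`, its parameters, and their defaults are assumptions made for this example, and the `sleep` argument exists only so the waiting can be swapped out in tests.

```python
import time


def retry_with_backoff(operation, max_retries=5, base_delay=1.0, factor=2.0,
                       sleep=time.sleep):
    """Call operation(); on failure wait, grow the wait by `factor`, and retry.

    All names and default values here are illustrative assumptions.
    """
    delay = base_delay
    # One initial attempt plus at most `max_retries` retries.
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # retry budget exhausted: surface the last error
            sleep(delay)     # give the overloaded service some rest
            delay *= factor  # wait longer before the next attempt
```

A caller would wrap the failing call, for example `retry_with_backoff(lambda: login(user))`, where `login` stands in for whatever request is timing out. In practice you would likely catch only errors that are known to be transient rather than every `Exception`.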

Algorithm

Consider exponential backoff with jitter, using a base time of 1 second, a multiplier of 2, and a maximum wait time of 30 seconds between calls.

The wait time grows exponentially through 1, 2, 4, 8, and 16 seconds, and is then capped at 30 seconds until the retry threshold is reached.
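The schedule described above (before any jitter is applied) can be reproduced with a few lines of Python. The helper name `backoff_delays` and the choice of seven retries are assumptions made for this sketch; the base, multiplier, and cap are the values from the text.

```python
def backoff_delays(base=1.0, factor=2.0, cap=30.0, max_retries=7):
    """Return the wait time in seconds before each retry, capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_retries)]


print(backoff_delays())
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

The sixth uncapped value would be 32 seconds, so from that point on every wait is clamped to the 30-second maximum.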

Problems with Exponential Backoff? Adding Jittering

What if requests from multiple users fail at the same time?

It would mean that all of those users' retries happen at the same instant.

Again we hit the same problem, with only two possible solutions:

Drop the failing connection permanently
Modify exponential backoff to add randomness between calls

The idea of adding randomness to the exponential backoff sleep time is called jittering.

After adding jittering, our density graph would look something like this:

When we compare the two graphs, it is obvious that the request density has been spread out. A side-by-side comparison is shown below.

The diagrams above compare exponential backoff with and without jittering for multiple users calling the microservice simultaneously. The key observation is that exponential backoff with jittering gives the users distinct retry times, so their retries no longer collide.
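One common way to add the randomness is to pick each wait uniformly between zero and the capped exponential value, a variant often called "full jitter". The sketch below assumes that variant; the article does not say which jitter formula its diagrams use, and the name `jittered_delay` and its parameters are hypothetical.

```python
import random


def jittered_delay(attempt, base=1.0, factor=2.0, cap=30.0, rng=random):
    """Return a random wait in [0, capped exponential delay] for this attempt.

    "Full jitter" is an assumed variant; other jitter formulas exist.
    """
    ceiling = min(cap, base * factor ** attempt)
    return rng.uniform(0, ceiling)


# Three users whose requests failed at the same moment now retry at
# different times instead of all waiting exactly 4 seconds.
for user in ("A", "B", "C"):
    print(user, round(jittered_delay(attempt=2), 2))
```

The trade-off is that an individual user may occasionally retry sooner than plain exponential backoff would allow, but the retries as a whole are spread across the window instead of arriving together.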

Of all the implementations I have seen, the default mechanism provided is almost always exponential backoff with jitter.
