Circuit Breaker/Throttle Pattern
"We did not expect such a spike from the upstream server"
OR
"Downstream service was very slow to respond which led to exhaustion of all our threads causing this major outage"
How often do you hear statements like these in an RCA (root cause analysis)? If the answer is "often", you need to look seriously at the circuit breaker and throttle patterns. Together, these resiliency patterns help ensure that upstream and downstream systems won't be able to take your application down.
Circuit Breaker (CB) Pattern
When an application needs to defend itself against failures of its downstream services, primarily slow ones, it leverages the CB pattern. The implementation is agnostic to the business logic: any time you make a call to a second party (another service), you annotate your code with the CB library method and the library takes care of managing the resiliency. You can tune properties such as the error-rate threshold, the response-time threshold, the cool-down period, and other parameters to give the best experience to your upstream services.
To be resilient, fail fast and escalate faster!
A circuit breaker is modeled as a finite state machine (FSM) in which the service defines the conditions for changing state.
- Circuit Open: calls are failing, so no calls are made to the downstream service
- Circuit Closed: all is good, calls flow through normally
- Circuit Half-open: the circuit had tripped and is now testing whether calls can go through again
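The three states above can be sketched in a few lines of Python. This is only a minimal illustration, not how Hystrix, Resilience4j, or any other library implements it. The names `CircuitBreaker`, `failure_threshold`, `cooldown_seconds`, and `CircuitOpenError` are hypothetical, the breaker trips on a count of consecutive failures rather than an error rate, and it is not thread-safe.

```python
import time
from enum import Enum


class State(Enum):
    CLOSED = "closed"        # all is good, calls flow through
    OPEN = "open"            # calls are rejected without reaching downstream
    HALF_OPEN = "half_open"  # a trial call is allowed through


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    # All parameter names here are assumptions made for this sketch.
    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so the cool-down can be tested
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func, *args, **kwargs):
        if self.state is State.OPEN:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                # Fail fast: do not touch the downstream service at all.
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down has elapsed: let this call through as a trial.
            self.state = State.HALF_OPEN
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self):
        self._failures = 0
        self.state = State.CLOSED

    def _on_failure(self):
        self._failures += 1
        # A failed trial call, or too many consecutive failures, opens the circuit.
        if (self.state is State.HALF_OPEN
                or self._failures >= self.failure_threshold):
            self.state = State.OPEN
            self._opened_at = self._clock()
```

A caller would wrap each downstream call, for example `breaker.call(fetch_profile, user_id)` where `fetch_profile` is a placeholder for your own client function, and treat `CircuitOpenError` as the signal to return a fallback or escalate.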
Throttle Pattern
When an application defends itself against incoming traffic (as opposed to the outbound calls in CB), it leverages the throttle pattern. Throttling is typically combined with autoscaling to minimize downtime and remove the need for human intervention.
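One common way to throttle incoming traffic is a token bucket, sketched below. The article does not prescribe a specific algorithm, so treat this choice as an assumption. `TokenBucket`, `rate_per_second`, `capacity`, and `handle_request` are hypothetical names, and the sketch is single-threaded.

```python
import time


class TokenBucket:
    # Tokens refill at rate_per_second, up to capacity; each request spends one.
    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock  # injectable so refill can be tested
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False


def handle_request(bucket, handler):
    # Reject excess traffic cheaply instead of letting it exhaust threads.
    if not bucket.allow():
        return 429, "Too Many Requests"
    return 200, handler()
```

Here `capacity` bounds the burst the application will absorb and `rate_per_second` bounds the sustained load, so a spike from an upstream caller is turned away with a 429 rather than queued.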
Visual representation of Throttle and CB together.
Tools to implement
- Hystrix
- Resilience4j
- Akka
- Istio
- Polly (for .NET lovers)
Please share your experience with us, along with any other libraries you have tried and how they worked for you.
I used https://sketchboard.me/ and https://creately.com/ to create the illustrations.
Engineering Leader, Identity Platform at Microsoft
6 years ago
Nice post! This pattern is a great enabler in the micro-service world. I was introduced to it in my last project when we moved a monolith banking application to an event-driven CQRS system. This was one of the major questions: how do you handle it when the vendor is down, or the communication to them is broken, etc.? One additional viewpoint around throttling is that the app should be able to defend itself and also defend its downstream service.
Founding Engineer at Momento | AWS Community Builder
6 years ago
When I was on TTO Platform last year we used Hystrix and I loved it. Not only for circuit breaker and throttling but also for the added benefit of Servo, which we used to replace Wiley. Huge fan of Netflix Java tooling. It was one of the major factors that helped us move to AWS and is still in there today running great. Great write up! Love the diagrams.