Circuit Breaker Pattern in Elixir
A circuit breaker is used to detect failures and to encapsulate the logic of preventing a failure from constantly recurring during maintenance, temporary external system failure, or unexpected system difficulties.
In the age of microservices, we are more than likely to have services that are calling and dependent on external services outside of our control.
Remote services can hang, fail or become unresponsive. How can we prevent those failures from cascading through the system and from taking up critical resources?
Enter the circuit breaker pattern. The pattern was popularized by Michael Nygard's book Release It! and by thought leaders like Martin Fowler.
The idea behind this pattern is very simple: Failures are inevitable, and trying to prevent them altogether is not realistic.
A way to handle these failures is by wrapping these operations into some kind of proxy. This proxy is responsible for monitoring recent failures, and using this information to decide whether to allow the operation to proceed or return an early failure instead.
This proxy is typically implemented as a state machine that mimics a physical circuit breaker and has three states: closed, open, and half-open.
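The three states (closed, open, half-open) and the moves between them can be sketched as a pure function. This is only an illustration: the module name BreakerStates and the event names (:success, :failure, :threshold_exceeded, :timer_elapsed) are assumptions made for this sketch and are not part of the implementation built later in the article.

```elixir
defmodule BreakerStates do
  @type state :: :closed | :open | :half_open
  @type event :: :success | :failure | :threshold_exceeded | :timer_elapsed

  # Closed: calls pass through until too many failures accumulate.
  @spec next(state(), event()) :: state()
  def next(:closed, :threshold_exceeded), do: :open
  def next(:closed, _event), do: :closed

  # Open: calls fail fast until a timer lets one trial call through.
  def next(:open, :timer_elapsed), do: :half_open
  def next(:open, _event), do: :open

  # Half-open: a single trial call decides whether to close or re-open.
  def next(:half_open, :success), do: :closed
  def next(:half_open, :failure), do: :open
  def next(:half_open, _event), do: :half_open
end
```

Keeping the transitions in one place like this makes it easy to check that every state has a way out: open always leads to half-open, and half-open always resolves to either closed or open.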
The Circuit Breaker pattern offers a few key advantages worth noting:
For our example, let’s imagine that we have the following scenario:
We are running a job board aggregator that will consume job postings from GitHub and other sources. However, since we are consuming a few different APIs, we run the risk of hitting a request limit or of an API being down.
Let’s start by creating an example API connector to GitHub Jobs that retrieves the latest 50 jobs posted:
defmodule CircuitBreaker.Api.GithubJobs do
  # url/0 and parse_fields/1 are private helpers not shown in this excerpt.
  @spec get_positions() :: {:ok, list()} | {:error, term()}
  def get_positions do
    case HTTPoison.get(url()) do
      {:ok, response} -> {:ok, parse_fields(response.body)}
      {:error, %HTTPoison.Error{reason: reason}} -> {:error, reason}
    end
  end
end
All this connector does is make a request to retrieve the JSON, parse it, and return the list of jobs. If we want to test this, we can manually call get_positions in our console:
iex(1)> CircuitBreaker.Api.GithubJobs.get_positions
{:ok,
 ["Software Engineer", "Backend Engineer (w/m/d)",
  "Senior Frontend Engineer (f/m/d)", ...]}
Circuit Breaker Switch
Now that we have the ability to make calls to get the job postings, we need to build our circuit breaker to wrap around the API adapter. Let’s take a look at a skeleton for our switch.
defmodule CircuitBreaker.Api.Switch do
  use GenStateMachine, callback_mode: :state_functions

  @name :circuit_breaker_switch
  @error_count_limit 8
  @time_to_half_open_delay 8000

  def start_link do
    GenStateMachine.start_link(__MODULE__, {:closed, %{error_count: 0}}, name: @name)
  end

  def get_positions do
    GenStateMachine.call(@name, :get_positions)
  end
end
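To implement our circuit breaker we could use the gen_statem behaviour directly; in this case we leverage the GenStateMachine package, which wraps it, gives us tracking and error reporting, and works with the supervision tree.
The first two functions we added are start_link, which starts the state machine in the closed state with an error count of 0, and get_positions, which sends a :get_positions call to the state machine.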
An important thing to note here is the first line:
use GenStateMachine, callback_mode: :state_functions
In this callback mode, every time you do a call/3 or a cast/2, the message will be handled by the state_name/3 function which is named the same as the current state. In this case our state_name functions will be open, closed, half_open.
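To see the :state_functions callback mode in isolation, here is a minimal sketch of a two-state toggle. It uses Erlang's :gen_statem directly (the behaviour that GenStateMachine wraps) so that it runs without extra dependencies; the Toggle module and its flip/1 function are made up for this illustration.

```elixir
defmodule Toggle do
  @behaviour :gen_statem

  def start_link, do: :gen_statem.start_link(__MODULE__, :ok, [])
  def flip(pid), do: :gen_statem.call(pid, :flip)

  @impl true
  def callback_mode, do: :state_functions

  # Start in the :off state; the data is a count of flips so far.
  @impl true
  def init(:ok), do: {:ok, :off, 0}

  # Called only while the current state is :off.
  def off({:call, from}, :flip, count) do
    {:next_state, :on, count + 1, [{:reply, from, :on}]}
  end

  # Called only while the current state is :on.
  def on({:call, from}, :flip, count) do
    {:next_state, :off, count + 1, [{:reply, from, :off}]}
  end
end
```

The same :flip call is routed to off/3 or on/3 depending on the current state, which is the mechanism the circuit breaker relies on for its closed, open, and half_open functions.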
Let’s go ahead and start by adding our closed state code:
def closed({:call, from}, :get_positions, data) do
  case CircuitBreaker.Api.GithubJobs.get_positions() do
    {:ok, positions} ->
      # A successful call resets the consecutive-error count.
      {:keep_state, %{error_count: 0}, {:reply, from, {:ok, positions}}}
    {:error, reason} ->
      handle_error(reason, from, %{data | error_count: data.error_count + 1})
  end
end
All we are doing is calling the API adapter get_positions and, depending on the results, we are either returning the positions list or handling the error.
Let’s go ahead and jump into the terminal and try to get the list of positions through our circuit breaker:
iex(1)> CircuitBreaker.Api.Switch.start_link
{:ok, #PID<0.231.0>}
iex(2)> CircuitBreaker.Api.Switch.get_positions
{:ok,
 ["Software Engineer", "Backend Engineer (w/m/d)",
  "Senior Frontend Engineer (f/m/d)", ...]}
Let’s add the functions for the other two states and review how the circuit state change works.
def half_open({:call, from}, :get_positions, data) do
  case CircuitBreaker.Api.GithubJobs.get_positions() do
    {:ok, positions} ->
      # Reset the same error_count key that the closed state updates.
      {:next_state, :closed, %{error_count: 0}, {:reply, from, {:ok, positions}}}
    {:error, reason} ->
      open_circuit(from, data, reason, @time_to_half_open_delay)
  end
end

# While open, fail fast without calling the remote API.
def open({:call, from}, :get_positions, data) do
  {:keep_state, data, {:reply, from, {:error, :circuit_open}}}
end

def open(:info, :to_half_open, data) do
  {:next_state, :half_open, data}
end
And let’s add a couple of private utility functions:
defp handle_error(reason, from, data = %{error_count: error_count}) when error_count > @error_count_limit do
  open_circuit(from, data, reason, @time_to_half_open_delay)
end

defp handle_error(reason, from, data) do
  {:keep_state, data, {:reply, from, {:error, reason}}}
end

defp open_circuit(from, data, reason, delay) do
  Process.send_after(@name, :to_half_open, delay)
  {:next_state, :open, data, {:reply, from, {:error, reason}}}
end
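One detail of the guard in handle_error is easy to miss: because it checks error_count > @error_count_limit, a limit of 8 means the circuit opens on the ninth consecutive failure, and any success in between resets the count. The sketch below replays that counting logic over a list of outcomes as a pure function; the ThresholdSketch module and its run/1 function are hypothetical helpers written for this illustration, not part of the switch.

```elixir
defmodule ThresholdSketch do
  @error_count_limit 8

  # Folds over a list of :ok / :error outcomes. Returns {:open, n}, where n is
  # the 1-based index of the call that trips the breaker, or {:closed, count}
  # with the current consecutive-error count if the breaker never opens.
  def run(outcomes) do
    outcomes
    |> Enum.with_index(1)
    |> Enum.reduce_while({:closed, 0}, fn
      {:ok, _index}, _acc ->
        # A success resets the count, as in the closed state.
        {:cont, {:closed, 0}}

      {:error, index}, {:closed, count} when count + 1 > @error_count_limit ->
        {:halt, {:open, index}}

      {:error, _index}, {:closed, count} ->
        {:cont, {:closed, count + 1}}
    end)
  end
end
```

If the intent is to open after exactly 8 failures, the guard would need >= instead of >; which one is right depends on how the limit is meant to be read.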
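Most of the magic happens in the open_circuit function, where we do two things: schedule a :to_half_open message to be sent to the state machine after the given delay, and move to the open state while replying to the caller with the error.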
After 8000 milliseconds, the circuit breaker, now in the open state, will receive our scheduled message and change the state to half_open.
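As an alternative to Process.send_after, gen_statem has built-in state timeouts: a {:state_timeout, delay, content} action delivers an event of type :state_timeout to the current state and is cancelled automatically if the state changes first. The sketch below is not the approach used in this article; it uses plain :gen_statem to stay dependency-free, and the StateTimeoutSketch module and its state/1 helper are invented for the illustration.

```elixir
defmodule StateTimeoutSketch do
  @behaviour :gen_statem

  def start_link(delay_ms), do: :gen_statem.start_link(__MODULE__, delay_ms, [])
  def state(pid), do: :gen_statem.call(pid, :state)

  @impl true
  def callback_mode, do: :state_functions

  # Start already open and arm the state timeout for the half-open transition.
  @impl true
  def init(delay_ms) do
    {:ok, :open, %{}, [{:state_timeout, delay_ms, :to_half_open}]}
  end

  # The timeout arrives as a :state_timeout event rather than an :info message.
  def open(:state_timeout, :to_half_open, data), do: {:next_state, :half_open, data}

  def open({:call, from}, :state, _data) do
    {:keep_state_and_data, [{:reply, from, :open}]}
  end

  def half_open({:call, from}, :state, _data) do
    {:keep_state_and_data, [{:reply, from, :half_open}]}
  end
end
```

With this approach the timer lives and dies with the state it was set in, so there is no stray message to worry about if the machine leaves the open state for another reason.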
Finally, in the half_open state, the next call is passed through to the API endpoint. If it succeeds, the circuit closes and the error count is reset; if it fails, the circuit switches back to open and another half_open attempt is scheduled.
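To try the whole lifecycle without hitting a real API, the pieces above can be combined into one self-contained sketch. Several things here are assumptions made for the sake of a runnable example: it uses plain :gen_statem instead of the GenStateMachine package, the wrapped operation is injected as a zero-arity function (fetch_fun) instead of calling the GitHub Jobs adapter, the process is addressed by pid instead of a registered name, and the limit and delay are shrunk to 2 and 100 ms so a demo finishes quickly.

```elixir
defmodule BreakerSketch do
  @behaviour :gen_statem

  # Hypothetical demo values; the article uses 8 and 8000.
  @error_count_limit 2
  @half_open_delay 100

  # fetch_fun is a zero-arity function returning {:ok, value} or {:error, reason}.
  def start_link(fetch_fun), do: :gen_statem.start_link(__MODULE__, fetch_fun, [])
  def get(pid), do: :gen_statem.call(pid, :get)

  @impl true
  def callback_mode, do: :state_functions

  @impl true
  def init(fetch_fun), do: {:ok, :closed, %{fetch: fetch_fun, error_count: 0}}

  def closed({:call, from}, :get, data) do
    case data.fetch.() do
      {:ok, value} ->
        {:keep_state, %{data | error_count: 0}, [{:reply, from, {:ok, value}}]}

      {:error, reason} ->
        handle_error(reason, from, %{data | error_count: data.error_count + 1})
    end
  end

  # While open, fail fast without calling the wrapped function.
  def open({:call, from}, :get, _data) do
    {:keep_state_and_data, [{:reply, from, {:error, :circuit_open}}]}
  end

  def open(:info, :to_half_open, data), do: {:next_state, :half_open, data}

  # Half-open: one trial call decides whether to close or re-open.
  def half_open({:call, from}, :get, data) do
    case data.fetch.() do
      {:ok, value} ->
        {:next_state, :closed, %{data | error_count: 0}, [{:reply, from, {:ok, value}}]}

      {:error, reason} ->
        open_circuit(from, data, reason)
    end
  end

  defp handle_error(reason, from, %{error_count: count} = data) when count > @error_count_limit do
    open_circuit(from, data, reason)
  end

  defp handle_error(reason, from, data) do
    {:keep_state, data, [{:reply, from, {:error, reason}}]}
  end

  defp open_circuit(from, data, reason) do
    # self() is the state machine process, since callbacks run inside it.
    Process.send_after(self(), :to_half_open, @half_open_delay)
    {:next_state, :open, data, [{:reply, from, {:error, reason}}]}
  end
end
```

Starting it with a function that always returns {:error, :timeout} and calling BreakerSketch.get/1 repeatedly should return the original error for the first three calls and {:error, :circuit_open} after that, until the delay elapses and a trial call is allowed through.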
Circuit breakers are a valuable pattern to have in our arsenal, as they can help increase system stability and give us a more reliable way of handling errors with remote services.
This example just scratched the surface of what you can do with circuit breakers. There are plenty of opportunities to expand this pattern further, such as:
Finally, as with any pattern, it is important to keep the use case in mind and decide whether this kind of behavior is desired.
The full code for this example can be found in circuit_breaker_example
Article originally published in:
Software engineering