Building Resilient APIs with Node.js

In distributed systems and microservices architectures, intermittent failures are inevitable. Network issues, external service outages, or high latency can directly impact user experience and system reliability. Implementing resilience patterns in APIs with Node.js ensures that failures are handled predictably, keeping the system robust and functional. In this article, we will cover three fundamental patterns for building resilient APIs: Retry, Circuit Breaker, and Bulkhead.


Retry Pattern

The Retry Pattern is a simple technique that attempts to execute an operation multiple times in case of failure, hoping the problem is temporary and will resolve in subsequent attempts. This is useful for calls to external APIs that may intermittently fail.

Node.js Implementation with axios-retry: For illustration, we’ll use the axios library for HTTP requests and axios-retry to implement the Retry pattern.

import axios from 'axios';
import axiosRetry from 'axios-retry';

axiosRetry(axios, { retries: 3, retryDelay: axiosRetry.exponentialDelay });

const fetchDataWithRetry = async () => {
  try {
    const { data } = await axios.get('https://api.example.com/data');
    return data;
  } catch (error) {
    console.error('Failed to fetch data after retries', error.message);
  }
};

In this example, we configure axios-retry to retry up to three times, applying an exponential delay between attempts. This prevents the system from giving up on the first failure, increasing resilience in cases of intermittent failures.
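Under the hood, axios-retry simply re-issues the request with a growing delay between attempts. The following is a minimal, dependency-free sketch of the same idea; retryWithBackoff is a hypothetical helper shown only to illustrate the mechanics, not part of axios-retry's API:

```javascript
// Retry an async operation with exponential backoff.
// Illustrative sketch of what axios-retry does for us behind the scenes.
const retryWithBackoff = async (operation, retries = 3, baseDelayMs = 100) => {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= retries) throw error; // attempts exhausted, give up
      const delayMs = baseDelayMs * 2 ** attempt; // 100ms, 200ms, 400ms, ...
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
};
```

Doubling the delay on each attempt gives a transient problem progressively more time to clear before the next try.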


Circuit Breaker Pattern

The Circuit Breaker Pattern works like an electrical circuit breaker: it monitors failures and, upon reaching a threshold, “opens” the circuit to prevent further calls to an already faulty service. This stops the system from sending requests that are likely to fail, reducing overload and protecting other services.

Node.js Implementation with opossum: opossum is a Circuit Breaker library that simplifies implementing this pattern in Node.js.

import CircuitBreaker from 'opossum';
import axios from 'axios';

const getExternalData = async () => {
  const { data } = await axios.get('https://api.example.com/data');
  return data;
};

const breaker = new CircuitBreaker(getExternalData, {
  timeout: 3000, // Max operation time
  errorThresholdPercentage: 50, // Failure threshold to "open" the circuit
  resetTimeout: 5000 // Time before attempting to "close" the circuit again
});

breaker.fallback(() => 'Service temporarily unavailable');

breaker
  .fire()
  .then(console.log)
  .catch(console.error);

In this example, requests that take longer than 3 seconds count as failures (the timeout), and once 50% of requests fail, the circuit opens. After 5 seconds the breaker moves to a half-open state and allows a trial request through to test whether the service has recovered. We also define a fallback message to be returned while the circuit is open.
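To make the closed/open/half-open lifecycle concrete, here is a minimal, dependency-free breaker sketch. opossum implements this state machine for us (plus timeouts, rolling statistics, and events); SimpleCircuitBreaker below is purely illustrative:

```javascript
// Minimal circuit breaker state machine: CLOSED -> OPEN -> HALF_OPEN.
// Illustrative only; use a battle-tested library like opossum in practice.
class SimpleCircuitBreaker {
  constructor(action, { failureThreshold = 3, resetTimeoutMs = 5000 } = {}) {
    this.action = action;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';
    this.openedAt = 0;
  }

  async fire(...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit is open'); // fail fast, no call made
      }
      this.state = 'HALF_OPEN'; // let one trial request through
    }
    try {
      const result = await this.action(...args);
      this.failures = 0;
      this.state = 'CLOSED'; // success closes the circuit
      return result;
    } catch (error) {
      this.failures++;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN'; // stop calling the failing service
        this.openedAt = Date.now();
      }
      throw error;
    }
  }
}
```

Note the key property: while open, the breaker rejects calls immediately instead of letting them pile up against a service that is already struggling.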


Bulkhead Pattern

The Bulkhead Pattern isolates different parts of an application to limit the impact of a failure. Imagine you have a critical service and a less critical one: by using this pattern, you can allocate resources separately for each, preventing one service’s failure from overloading the other.

Implementation with Connection Pooling: A simple Bulkhead approach in Node.js is to limit the number of simultaneous connections. We’ll use the async library’s queue to cap the number of parallel requests.

import async from 'async';
import axios from 'axios';

// Limit to 5 simultaneous requests
const queue = async.queue(async ({ url }) => {
  try {
    const { data } = await axios.get(url);
    console.log(data);
  } catch (error) {
    console.error('Request failed:', error.message);
  }
}, 5);

const fetchMultipleUrls = (urls) => {
  urls.forEach(url => queue.push({ url }));
};

fetchMultipleUrls(['https://api.example1.com', 'https://api.example2.com']);        

In this example, we cap the number of concurrent requests at five, isolating the system from overload and ensuring it can keep processing other operations even if one of the APIs is slow.
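async.queue is effectively acting as a semaphore here. For illustration only, this is a dependency-free sketch of that concurrency cap; createBulkhead is a hypothetical helper, not part of any library:

```javascript
// A tiny semaphore that caps how many async operations may run at once.
// Illustrative sketch of the role async.queue plays in the example above.
const createBulkhead = (limit) => {
  let active = 0;
  const waiting = [];
  const release = () => {
    const next = waiting.shift();
    if (next) next(); // hand the slot directly to a waiting caller
    else active--;
  };
  return async (operation) => {
    if (active >= limit) {
      // At capacity: park until a running operation hands over its slot.
      await new Promise(resolve => waiting.push(resolve));
    } else {
      active++;
    }
    try {
      return await operation();
    } finally {
      release();
    }
  };
};
```

A caller wraps each operation, e.g. `const run = createBulkhead(5); run(() => axios.get(url));`, and the sixth concurrent call simply waits its turn instead of competing for resources.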


Practical Example: Combining Patterns

To demonstrate these resilience patterns in action, let’s build a complete example with two applications:

  1. Service API: This service simulates a problematic API with intermittent errors and delays.
  2. Resilient Client: The main application that makes requests to the unstable service and applies Retry, Circuit Breaker, and Bulkhead patterns to handle the failures.

1. Service API (Simulated Unstable Service)

This API simulates a service that responds inconsistently. Using Express, it returns errors intermittently or responses with random delays.

Dependencies Installation:

mkdir service-api
cd service-api
npm init -y
npm install express        

Service API Code (service-api/index.js):

import express from 'express';

const app = express();
const PORT = 3001;

// Simulate intermittent failures
app.get('/unstable', (req, res) => {
  const randomFail = Math.random() < 0.5; // 50% chance of failure
  const randomDelay = Math.floor(Math.random() * 4000); // Up to 4 seconds delay

  if (randomFail) {
    return res.status(500).json({ error: 'Intermittent server error' });
  } else {
    setTimeout(() => {
      res.json({ message: 'Request succeeded after delay' });
    }, randomDelay);
  }
});

app.listen(PORT, () => {
  console.log(`Service API running at http://localhost:${PORT}`);
});

This API exposes an /unstable endpoint that:

  • Returns an error (500 status) 50% of the time.
  • In case of success, responds after a random delay of up to 4 seconds.

2. Resilient Client (Main Application)

Now, let’s build the main application that uses Retry, Circuit Breaker, and Bulkhead patterns to make requests to the Service API’s unstable endpoint.

Dependencies Installation:

mkdir resilient-client
cd resilient-client
npm init -y
npm install axios axios-retry opossum async        

Resilient Client Code (resilient-client/index.js):

import axios from 'axios';
import axiosRetry from 'axios-retry';
import CircuitBreaker from 'opossum';
import async from 'async';

// Retry configuration with axios-retry
axiosRetry(axios, { retries: 3, retryDelay: axiosRetry.exponentialDelay });

const fetchDataWithRetry = async (url) => {
  try {
    const { data } = await axios.get(url);
    return data;
  } catch (error) {
    console.error('Failed to fetch data after retries', error.message);
  }
};

// Circuit Breaker configuration
const breaker = new CircuitBreaker(fetchDataWithRetry, {
  timeout: 3000, // Max operation time
  errorThresholdPercentage: 50, // Failure threshold to "open" the circuit
  resetTimeout: 5000 // Time before attempting to "close" the circuit again
});

breaker.fallback(() => 'Service temporarily unavailable');

// Bulkhead configuration using async.queue to limit 5 simultaneous requests
const queue = async.queue(async ({ url }) => {
  try {
    const result = await breaker.fire(url);
    console.log(result);
  } catch (error) {
    console.error('Error fetching data:', error.message);
  }
}, 5);

// Function to fire multiple requests and observe Retry, Circuit Breaker, and Bulkhead in action
const fetchMultipleUrls = (urls) => {
  urls.forEach(url => queue.push({ url }));
};

// Test URLs to simulate calls
fetchMultipleUrls(Array(10).fill('http://localhost:3001/unstable'));

Code Explanation:

  1. Retry Pattern: We use axios-retry to retry the same request up to three times in case of failure, with an exponential delay between attempts.
  2. Circuit Breaker Pattern: We configure a circuit breaker with opossum that "opens" the circuit after consecutive failures, preventing further attempts and reducing service load. It "closes" automatically after 5 seconds, allowing retries.
  3. Bulkhead Pattern: We use async.queue to limit the number of concurrent requests to five, isolating and controlling the number of active calls.

3. Running the Example

To see these patterns in action, follow these steps:

Start the Service API:

In the service-api folder, run:

node index.js        

Run the Resilient Client:

In another terminal window, in the resilient-client folder, run:

node index.js        

Check the logs in the Resilient Client terminal to observe the following behaviors:

  • Retry: When a request fails (500 error), the application will retry up to three times.
  • Circuit Breaker: After consecutive failures reaching the threshold, the circuit will “open,” stopping further requests for 5 seconds and showing a fallback message.
  • Bulkhead: Limits simultaneous requests to five, ensuring the system remains responsive even under high load.

These logs will show when each pattern is activated and how they work together to handle failures from the unstable API effectively.

Best Practices and Final Considerations

Implementing resilience patterns is a key step toward building robust, high-availability systems. However, these patterns need careful tuning to avoid unintended side effects. Here are some best practices and considerations:

  1. Set Sensible Limits for Retries: Excessive retries can overload the system and make network issues worse. Configure retry attempts and delay intervals carefully to avoid worsening traffic spikes during outages.
  2. Monitor Circuit Breaker States: Logging the status of the circuit breaker (open, half-open, closed) helps detect persistent failures and understand if the current thresholds are appropriate. This can also help identify root causes in case of system-wide issues.
  3. Adjust Bulkhead Limits Based on Resource Availability: Bulkhead limits should align with the resources available (e.g., CPU, memory) to avoid oversubscription and allow critical services to remain available. Simulating stress tests can help determine the optimal concurrency limits for each service.
  4. Fallbacks and Graceful Degradation: Fallback responses, such as cached data or simplified content, let services keep functioning in a degraded mode during failures. This graceful degradation preserves a basic level of service instead of a hard outage.
  5. Test Resilience Patterns Regularly: Simulating network failures, API downtimes, and high traffic loads ensures that resilience patterns are configured effectively. Continuous testing helps identify areas for improvement and prepares the system for real-world issues.
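For point 1, a common refinement is adding jitter to the backoff so that many clients recovering from the same outage don't retry in synchronized waves. The sketch below shows the "full jitter" variant; note that axios-retry's exponentialDelay already applies a degree of randomization, and backoffWithJitter here is a hypothetical helper for illustration:

```javascript
// "Full jitter" backoff: pick a random delay between 0 and the exponential cap.
// Spreads retries from many clients over time instead of in synchronized bursts.
const backoffWithJitter = (attempt, baseMs = 100, capMs = 10000) => {
  const exponentialMs = Math.min(capMs, baseMs * 2 ** attempt); // capped growth
  return Math.random() * exponentialMs; // anywhere in [0, exponentialMs)
};
```

The cap (capMs) keeps late attempts from waiting unreasonably long, while the randomization ensures two clients that failed at the same instant are unlikely to retry at the same instant.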

Conclusion

Building resilient APIs is essential in today’s distributed systems, where microservices, third-party APIs, and network issues present unique reliability challenges. The Retry, Circuit Breaker, and Bulkhead patterns offer powerful solutions to mitigate these issues by ensuring that failures are handled gracefully, services remain responsive, and resources are optimally allocated.

In this article, we demonstrated a practical approach to implementing these patterns in Node.js, simulating a faulty service to showcase how each pattern works in real-time. By carefully configuring these patterns, monitoring them, and applying best practices, you can significantly enhance your application’s resilience and ensure a smoother experience for users — even under adverse conditions.

These patterns provide a foundation for building robust APIs, and by adopting them, you can create systems that not only survive failures but continue to deliver reliable service.

Visit the repository here.
