Building highly available (HA) and resilient microservices using Istio Service Mesh

What is High Availability in microservices

High availability systems are designed to provide continuous, uninterrupted service to end customers by running redundant software that performs the same function. In highly available microservices, all hosts must point to the same storage so that, if one host fails, its workload can fail over to another host without downtime. The redundant software can be installed on another virtual machine (VM) or in Kubernetes clusters across multiple clouds or a hybrid cloud.

In this blog we will talk about how Tetrate helps platform architects configure Istio service mesh to enable automatic failover and achieve high availability.

Why do IT organizations need high availability in microservices?

Organizations today follow service-oriented approaches, using microservices architectures to build distributed systems that span multiple workloads or multiple clouds. One of the main challenges of microservices is that services communicate with each other over the network through APIs.

Communication between these services in a distributed system can fail due to many reasons, which include:

  1. Unreliable network
  2. High latency / slow speed
  3. Limited bandwidth
  4. Insecure network
  5. Changing topology
  6. Internal and external security threats
  7. High transport costs
  8. Heterogeneous network

Read the fallacies of distributed computing to see a list of assumptions an architect must consider to mitigate service outages in distributed systems.

For the above reasons, public clouds have failed multiple times. Public clouds like AWS, Azure, and Google Cloud provide a service level agreement (SLA) commitment of 99.99% uptime, which allows just under one hour of downtime a year (about 52.6 minutes/year).
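The downtime figure follows directly from the uptime percentage. As a minimal sketch of that arithmetic (the function name is illustrative, and an average year of 365.25 days is assumed):

```python
# Assumption: an average year of 365.25 days, which yields the 52.6-minute figure.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for a given uptime SLA."""
    return MINUTES_PER_YEAR * (1 - uptime_percent / 100)


if __name__ == "__main__":
    # Compare a few common SLA levels side by side.
    for sla in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is why moving from 99.9% to 99.99% is a significant engineering commitment.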

When they fail to meet SLAs, cloud providers offer service credits, but this does not prevent their customers, and consumers, from getting frustrated when their transactions cannot be completed, or when they’re unable to access applications, leading to a loss of business.

AWS, the leading public cloud player, recently experienced multiple cloud failures. Users were locked out of messaging platforms, gaming applications, and social media sites. The AWS east region outage in 2021 brought down Disney+, Netflix, and many other services. Separately, router misconfigurations brought down WhatsApp, Instagram, and Facebook globally in 2021. There have been many other outages.

From an infrastructure perspective, a platform architect or enterprise architect should engineer a highly reliable and available system with redundant microservices using a service mesh, which is a communication and security layer in your microservice setup. The idea is that each service has a proxy (often implemented using Envoy as the proxy software), and all requests and replies to and from each service go through the Envoy proxy.

If you are new to Envoy Proxy, learn what Envoy is in 5 minutes. Or, if you are interested, you can learn why Envoy-based service mesh is an integral part of cloud native applications.

In the image below (Figure A), the Envoy proxy is used both as the load balancer and as the sidecar proxy for two services (Application A and B). In case of failure, the service mesh can be configured to automatically redirect requests to the redundant instance of the microservice.


Figure A: HA microservices with Envoy proxy
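To make the redirect behavior concrete, here is a small simulation of the idea, not of Envoy or Istio themselves. It assumes a priority-ordered list of redundant instances and routes each request to the first healthy one; the `Endpoint` type, the `pick_endpoint` function, and the instance names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    """A redundant instance of a service (hypothetical model, not an Envoy type)."""
    name: str
    region: str
    healthy: bool = True


def pick_endpoint(endpoints: List[Endpoint]) -> Endpoint:
    """Return the first healthy endpoint, treating list order as priority."""
    for endpoint in endpoints:
        if endpoint.healthy:
            return endpoint
    raise RuntimeError("no healthy endpoint available")


if __name__ == "__main__":
    app_b = [
        Endpoint("app-b-primary", "region-1"),
        Endpoint("app-b-replica", "region-2"),
    ]
    print("normal operation:", pick_endpoint(app_b).name)

    # Simulate a failure of the primary instance; traffic moves to the replica.
    app_b[0].healthy = False
    print("after failure:", pick_endpoint(app_b).name)
```

In a real mesh the proxy makes this decision per request, based on health information it gathers itself, so the calling application does not need any failover logic of its own.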

Four steps to achieve fault-tolerant and highly available microservices using a service mesh

If you are designing a microservices application, then high availability can be achieved in 4 logical steps:

  1. Create redundant hosts in multi-region or multi-cloud using a service mesh
  2. Ensure constant monitoring of traffic in the service mesh, with detection of site failures
  3. Plan for automatic multisite failover of the application
  4. Restart/Debug the lost host with minimal effort
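Steps 2 and 3 above can be sketched as an Istio DestinationRule that combines outlier detection (failure detection) with locality failover. The snippet below only builds the manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The host name, rule name, regions, and threshold values are hypothetical, and the field names reflect the DestinationRule API as commonly documented, so they should be checked against the Istio version in use.

```python
import json


def failover_destination_rule(host: str, from_region: str, to_region: str) -> dict:
    """Build a DestinationRule manifest that fails traffic over between regions.

    All concrete values here are illustrative assumptions, not recommendations.
    """
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": "app-b-failover"},  # hypothetical rule name
        "spec": {
            "host": host,
            "trafficPolicy": {
                "loadBalancer": {
                    "localityLbSetting": {
                        "enabled": True,
                        # Step 3: send traffic to to_region when from_region is unhealthy.
                        "failover": [{"from": from_region, "to": to_region}],
                    }
                },
                # Step 2: eject hosts that keep returning 5xx errors.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "30s",
                    "baseEjectionTime": "1m",
                },
            },
        },
    }


if __name__ == "__main__":
    rule = failover_destination_rule(
        host="app-b.default.svc.cluster.local",  # hypothetical service host
        from_region="us-east1",
        to_region="us-west1",
    )
    print(json.dumps(rule, indent=2))
```

Outlier detection is included alongside the failover setting because the proxy needs a signal that hosts in the local region are unhealthy before it shifts traffic elsewhere.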


Continue reading on Tetrate Blog

