Canary across multi-cluster with Anthos Service Mesh Part 3/4
Naveen S.R
Heading the AirAsia Move Flights Engineering Team | Engineering Leader | Travel Anchored SuperApp
Authored by Nayana Madhav, Rahul Prajapati, Tushar Bhattacharya and Naveen S.R
In our previous articles, we explored the reasons behind our strategic shift towards GKE Autopilot,
followed by how to scale efficiently and design for spare capacity by defining balloon pods.
In this edition, we look at an essential launch strategy: the ability to canary across clusters using ASM.
To configure Anthos Service Mesh on your GKE clusters, ensure the following prerequisites are met:
These prerequisites ensure the successful configuration of Anthos Service Mesh on your GKE clusters.
Set up a multi-cluster mesh on GKE with ASM
The following steps are fully automated with CI
Define required Variables:
Create runtime variables to use as weights in the VirtualService config for the services that are directly accessible through ingress. In this example, we created them for one service only.
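As a rough sketch of how such weight variables could feed a VirtualService (the original manifest is not shown here), the snippet below validates two weights and prints a manifest. The variable names `PRIMARY_WEIGHT` and `CANARY_WEIGHT`, the service `zodiac-api`, the hostname, the gateway, and the `primary`/`canary` subsets are all hypothetical; the subsets are assumed to be defined in a separate DestinationRule.

```shell
#!/bin/sh
# Hypothetical runtime variables; in CI these would come from pipeline inputs.
PRIMARY_WEIGHT="${PRIMARY_WEIGHT:-95}"   # share for the primary cluster
CANARY_WEIGHT="${CANARY_WEIGHT:-5}"      # share for the secondary cluster

# Print a VirtualService that splits ingress traffic by weight.
# Fails unless both weights are non-negative integers that sum to 100.
render_virtual_service() (
  primary="$1"; canary="$2"
  for w in "$primary" "$canary"; do
    case "$w" in
      ''|*[!0-9]*) echo "weights must be non-negative integers" >&2; exit 1 ;;
    esac
  done
  if [ $((primary + canary)) -ne 100 ]; then
    echo "weights must sum to 100" >&2
    exit 1
  fi
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: zodiac-api
  namespace: zodiac
spec:
  hosts:
  - "zodiac.example.com"
  gateways:
  - istio-system/ingress-gateway
  http:
  - route:
    - destination:
        host: zodiac-api.zodiac.svc.cluster.local
        subset: primary
      weight: ${primary}
    - destination:
        host: zodiac-api.zodiac.svc.cluster.local
        subset: canary
      weight: ${canary}
EOF
)

# Print the manifest; pipe it to "kubectl apply -f -" to apply it.
render_virtual_service "$PRIMARY_WEIGHT" "$CANARY_WEIGHT"
```

Rejecting weights that do not sum to 100 in the pipeline keeps a mistyped variable from reaching the mesh.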
Stage 1: Deploy-Workloads
As part of the on-demand GKE cluster and ASM setup, all the services are deployed on the secondary GKE cluster before the multi-cluster mesh is configured. The demo here covers the services that are part of the Zodiac namespace.
Stage 2: Rollout Canary
Get GKE cluster context:
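A minimal sketch of this step, assuming regional clusters and hypothetical project, region, and cluster names. It runs in dry-run mode by default and only prints the `gcloud` command; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Fetch kubeconfig credentials for one cluster and print its context name.
# GKE names kubeconfig contexts gke_<project>_<location>_<cluster>.
get_cluster_context() (
  project="$1"; location="$2"; cluster="$3"
  # Command output goes to stderr so stdout carries only the context name.
  run gcloud container clusters get-credentials "$cluster" \
    --region "$location" --project "$project" >&2
  echo "gke_${project}_${location}_${cluster}"
)

# Hypothetical project, region, and cluster names.
CTX_PRIMARY="$(get_cluster_context my-project asia-southeast1 primary-autopilot)"
CTX_SECONDARY="$(get_cluster_context my-project asia-southeast1 secondary-standard)"
echo "primary=$CTX_PRIMARY secondary=$CTX_SECONDARY"
```

For a zonal Standard cluster, `--zone` would replace `--region`.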
Create a Firewall Rule:
To enable cross-cluster traffic in Anthos Service Mesh, you may need to create a firewall rule in the following cases:
By following these instructions and setting up the appropriate firewall rules, you can ensure proper cross-cluster communication in Anthos Service Mesh.
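The rule creation could look roughly like the sketch below. The rule name, network, CIDR ranges, and node tags are hypothetical placeholders, and the protocol list and priority follow the pattern commonly used for multi-cluster mesh setups rather than our exact rule. It prints the command by default; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Create an ingress rule that lets pods of every cluster reach the nodes of
# the other clusters.
#   $1 rule name   $2 VPC network
#   $3 comma-separated pod CIDRs of all clusters
#   $4 comma-separated network tags of all cluster nodes
# Pod CIDRs can be listed with:
#   gcloud container clusters list --format='value(clusterIpv4Cidr)'
create_mesh_firewall_rule() (
  name="$1"; network="$2"; pod_cidrs="$3"; node_tags="$4"
  run gcloud compute firewall-rules create "$name" \
    --network="$network" \
    --direction=INGRESS \
    --priority=900 \
    --allow=tcp,udp,icmp,esp,ah,sctp \
    --source-ranges="$pod_cidrs" \
    --target-tags="$node_tags" \
    --quiet
)

# Hypothetical values for illustration only.
create_mesh_firewall_rule istio-multicluster-pods default \
  "10.0.0.0/14,10.4.0.0/14" "gke-primary-node,gke-secondary-node"
```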
Keeping Traffic in Cluster:
In some cases the default cross-cluster load balancing behavior is not desirable. To keep traffic "cluster-local" (i.e. traffic sent from cluster-a will only reach destinations in cluster-a), mark hostnames or wildcards as clusterLocal using MeshConfig.serviceSettings.
Suppose the same services are deployed on both clusters. Once traffic enters a cluster, it should call only the services within that same cluster.
You can enforce cluster-local traffic for an individual service, for all services in a particular namespace, or globally for all services in the mesh.
NOTE: This will be applicable across both Autopilot and Standard Clusters
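To illustrate the three scopes, here is a small sketch that prints a mesh-config ConfigMap with `serviceSettings` entries. The ConfigMap name `istio-asm-managed` is an assumption (on managed ASM it depends on the release channel), and the `zodiac` hosts are hypothetical.

```shell
#!/bin/sh
# Print a ConfigMap whose mesh config marks the given hosts as cluster-local.
#   $1 ConfigMap name; remaining arguments are host patterns.
render_cluster_local_configmap() (
  name="$1"; shift
  [ "$#" -gt 0 ] || { echo "at least one host pattern is required" >&2; exit 1; }
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: ${name}
  namespace: istio-system
data:
  mesh: |-
    serviceSettings:
    - settings:
        clusterLocal: true
      hosts:
EOF
  # One list item per host pattern, indented to sit under "hosts:".
  for host in "$@"; do
    printf '      - "%s"\n' "$host"
  done
)

# Scope examples (hypothetical hosts):
#   one service:      mysvc.zodiac.svc.cluster.local
#   whole namespace:  *.zodiac.svc.cluster.local
#   entire mesh:      *
render_cluster_local_configmap istio-asm-managed "*.zodiac.svc.cluster.local"
```

Applying this output as-is would replace any other settings already present under `mesh`, so in practice it should be merged with the existing ConfigMap content.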
Many pages reference the IstioOperator CRD, where you create a custom resource and let Istio translate it for you. Unfortunately, in a managed control plane, this operator is hidden and not usable.
This page covers how to configure optional features on Managed Anthos Service Mesh. It has very few configuration examples and aligns only loosely with the Istio reference, so some translation of the configuration schema is required.
The migration tool is available as part of the asmcli script. You must download the script to use this tool.
configmap-clusterlocal.yaml
After converting, apply it to both GKE clusters.
Configure Endpoint Discovery:
Configure remote secrets so that each cluster's Anthos Service Mesh control plane can access the other cluster's API server. The commands depend on your Anthos Service Mesh type (either in-cluster or managed); here we are using managed ASM.
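One way to do this is with `asmcli create-mesh`, sketched below under the assumption that the `asmcli` script has been downloaded to the working directory. The fleet project and cluster paths are hypothetical. The command is printed by default; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Register the clusters to the fleet and set up remote secrets between them.
#   $1 fleet project ID
#   remaining arguments: clusters as PROJECT_ID/LOCATION/CLUSTER_NAME
configure_endpoint_discovery() (
  fleet_project="$1"; shift
  [ "$#" -ge 2 ] || { echo "need at least two clusters" >&2; exit 1; }
  run ./asmcli create-mesh "$fleet_project" "$@"
)

# Hypothetical project, locations, and cluster names.
configure_endpoint_discovery my-project \
  my-project/asia-southeast1/primary-autopilot \
  my-project/asia-southeast1/secondary-standard
```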
The following shows the complete flow of setting up the multi-cluster canary:
Stage 3: Split Traffic across both GKE Clusters
The following changes are required to get the desired traffic split:
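The exact manifests are not reproduced here. As one possible approach, the Istio multi-cluster documentation describes partitioning a service into per-cluster subsets using the `topology.istio.io/cluster` label, which a weighted VirtualService can then target. The sketch below prints such a DestinationRule; the host, namespace, rule name, and cluster IDs are hypothetical, and the label values must match the cluster IDs the mesh actually assigns.

```shell
#!/bin/sh
# Print a DestinationRule with one subset per cluster.
#   $1 service host   $2 namespace
#   $3 cluster ID of the primary cluster   $4 cluster ID of the canary cluster
render_per_cluster_destination_rule() (
  host="$1"; namespace="$2"; primary_cluster="$3"; canary_cluster="$4"
  cat <<EOF
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: per-cluster-subsets
  namespace: ${namespace}
spec:
  host: ${host}
  subsets:
  - name: primary
    labels:
      topology.istio.io/cluster: ${primary_cluster}
  - name: canary
    labels:
      topology.istio.io/cluster: ${canary_cluster}
EOF
)

# Hypothetical host and cluster IDs.
render_per_cluster_destination_rule \
  zodiac-api.zodiac.svc.cluster.local zodiac cluster-a cluster-b
```

With this approach, the ingress-facing host would need to stay reachable across clusters (not marked clusterLocal) for the canary subset to have endpoints.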
Anthos Service Mesh Dashboard
Before the canary: 100% of traffic was pointed to the primary GKE Autopilot cluster
Splitting 95% at GKE Autopilot and 5% at GKE Standard
Splitting 75% at GKE Autopilot and 25% at GKE Standard
The following examples show how traffic to downstream services sticks to the same cluster, with traffic split equally for inter-service communication
Stage 4: Rollback Canary Setup
Point 100% of incoming traffic to the primary GKE Autopilot cluster
Cleaning up the Endpoint Discovery
This is a critical step when rolling back a canary setup. It ensures that cross-cluster communication between the canary and primary deployments no longer occurs. The step is necessary to revert to the previous stable state and to prevent any unintended interactions between the secondary cluster and the primary GKE Autopilot cluster. By cleaning up the Endpoint Discovery, you effectively isolate the clusters and restore the normal communication patterns within each cluster.
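A hedged sketch of this cleanup, assuming the remote secrets carry the `istio/multiCluster=true` label that Istio puts on them. The context names are hypothetical. It is worth listing the matching secrets first before deleting anything. The commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Remove every remote secret from one cluster so its control plane stops
# discovering endpoints in the other clusters.
#   $1 kubeconfig context of the cluster to clean up
remove_remote_secrets() (
  context="$1"
  run kubectl --context="$context" -n istio-system delete secret \
    -l istio/multiCluster=true
)

# Hypothetical context names; clean up both sides of the mesh.
remove_remote_secrets gke_my-project_asia-southeast1_primary-autopilot
remove_remote_secrets gke_my-project_asia-southeast1_secondary-standard
```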
Delete Firewall Rule
To disable cross-cluster communication, delete the firewall rule that was created earlier. This rule was responsible for allowing traffic between the clusters, so deleting it effectively blocks cross-cluster traffic.
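For example, assuming the rule was created under the hypothetical name `istio-multicluster-pods` in project `my-project`, the deletion could be sketched as follows. The command is printed by default; set `DRY_RUN=0` to execute it.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Delete the cross-cluster firewall rule without an interactive prompt.
#   $1 rule name   $2 project ID
delete_mesh_firewall_rule() (
  name="$1"; project="$2"
  run gcloud compute firewall-rules delete "$name" \
    --project="$project" --quiet
)

# Hypothetical rule and project names.
delete_mesh_firewall_rule istio-multicluster-pods my-project
```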
Unregister Secondary GKE Cluster from Fleet
Following these steps ensures that Anthos Service Mesh is fully removed and the cluster is unregistered from the fleet. You can then safely proceed with deleting the secondary GKE Standard cluster.
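The unregister and delete steps might look like the sketch below. The membership, project, location, and cluster names are hypothetical, and the mesh removal steps themselves are not covered. The commands are printed by default; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Print the command instead of running it unless DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Unregister a GKE cluster from the fleet, then delete the cluster.
#   $1 fleet project ID   $2 membership name
#   $3 cluster location   $4 cluster name
unregister_and_delete_cluster() (
  project="$1"; membership="$2"; location="$3"; cluster="$4"
  run gcloud container fleet memberships unregister "$membership" \
    --gke-cluster="${location}/${cluster}" --project="$project"
  run gcloud container clusters delete "$cluster" \
    --region "$location" --project "$project" --quiet
)

# Hypothetical names for the secondary GKE Standard cluster.
unregister_and_delete_cluster my-project secondary-standard \
  asia-southeast1 secondary-standard
```

Unregistering first keeps a stale membership from lingering in the fleet after the cluster is gone.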
Conclusion
The fourth and final part of this series, on the key facepalm moments in our journey, is captured here.