Cisco ACI Multi-Pod Part-1 || Overview

To understand how ACI Multi-Pod works and how it provides a fault-tolerant fabric, we first need to understand its control plane and how it works under the hood.

The Cisco ACI Multi-Pod control plane protocols run independently in each pod, as follows:

  • Intermediate System-to-Intermediate System (IS-IS): for infra tunnel endpoint (TEP) reachability within a pod.

If IS-IS stops working in one pod, it does not affect IS-IS running in the other pod, because IS-IS runs only between the spine and leaf switches inside each pod.

For TEP reachability toward nodes in other pods, the spines learn the TEP pool (range) of the remote pods, rather than individual TEP IPs, via OSPF from the Inter-Pod Network (IPN); we will discuss the IPN later.

IS-IS then advertises that range locally within the pod to all leaf switches.
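To picture how this summarization works, here is a minimal Python sketch; the TEP pool prefixes and addresses below are made up for illustration, not taken from a real fabric. The point is that one summary prefix per remote pod is enough to cover every individual TEP in that pod, so the local leaf switches never need host routes for remote TEPs.

```python
import ipaddress

# Hypothetical TEP pools, one per pod (real pools are assigned at pod bring-up).
pod_tep_pools = {
    "pod-1": ipaddress.ip_network("10.1.0.0/16"),
    "pod-2": ipaddress.ip_network("10.2.0.0/16"),
}

# Hypothetical individual TEP addresses of nodes in Pod 2.
remote_teps = [ipaddress.ip_address("10.2.0.34"), ipaddress.ip_address("10.2.1.7")]

# A Pod 1 leaf only needs the Pod 2 summary (learned by the spines via OSPF from
# the IPN and redistributed into the local IS-IS) to reach any Pod 2 TEP.
for tep in remote_teps:
    print(f"{tep} covered by summary {pod_tep_pools['pod-2']}: {tep in pod_tep_pools['pod-2']}")
```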

  • Council of Oracle Protocol (COOP): for endpoint information learned within a pod. It is not affected if COOP stops working in the other pod, because COOP runs only between the leaf and spine switches inside each pod.

Endpoint information learned in one pod is shared with the other pod, and stored in that pod's COOP database, via MP-BGP EVPN sessions established between the spine switches of each pod across the IPN.
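As a rough illustration of that behavior, the sketch below models a pod's COOP view of its endpoints; the addresses and the entry format are hypothetical and far simpler than the real COOP and EVPN records. Locally learned endpoints point at the leaf TEP that reported them, while endpoints imported from the other pod over MP-BGP EVPN point toward the remote pod rather than at an individual remote leaf.

```python
from dataclasses import dataclass

@dataclass
class CoopEntry:
    endpoint_ip: str   # the endpoint's IP address
    next_hop_tep: str  # TEP toward which traffic for this endpoint is tunneled
    source: str        # "local-coop" or "remote-evpn"

# Pod 1's COOP view (hypothetical values).
pod1_coop_db = [
    # Learned by a Pod 1 leaf and reported to the local spines via COOP.
    CoopEntry("192.168.10.11", next_hop_tep="10.1.0.34", source="local-coop"),
    # Imported from the Pod 2 spines over MP-BGP EVPN across the IPN; traffic is
    # sent toward Pod 2 rather than toward a specific remote leaf.
    CoopEntry("192.168.20.22", next_hop_tep="10.2.255.1", source="remote-evpn"),
]

for entry in pod1_coop_db:
    print(f"{entry.endpoint_ip} -> TEP {entry.next_hop_tep} ({entry.source})")
```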

  • MP-BGP VPNv4/VPNv6: for L3Out route distribution within a pod.

If MP-BGP stops working in one pod, it does not affect MP-BGP in the other pod, because MP-BGP runs and establishes neighborships between the spine route reflectors and the leaf switches inside each pod to distribute L3Out routes within that pod.

On top of the MP-BGP sessions within a pod, Multi-Pod establishes additional MP-BGP VPNv4/VPNv6 sessions between the spine switches of each pod across the IPN, to share L3Out routes learned in one pod with the other pod.
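To keep the two BGP layers apart, the small sketch below (with hypothetical node names) lists the intra-pod route-reflector sessions and the inter-pod spine-to-spine sessions separately, and shows that breaking MP-BGP on every node of one pod leaves the other pod's intra-pod sessions untouched.

```python
# Hypothetical node names; each entry is (peer_a, peer_b, description).
bgp_sessions = [
    ("pod1-spine1", "pod1-leaf1", "intra-pod VPNv4/v6, spine RR to leaf"),
    ("pod1-spine1", "pod1-leaf2", "intra-pod VPNv4/v6, spine RR to leaf"),
    ("pod2-spine1", "pod2-leaf1", "intra-pod VPNv4/v6, spine RR to leaf"),
    ("pod1-spine1", "pod2-spine1", "inter-pod VPNv4/v6 and EVPN over the IPN"),
]

def unaffected_sessions(failed_pod: str):
    """Sessions that do not involve any node in the pod whose MP-BGP failed."""
    return [s for s in bgp_sessions
            if not (s[0].startswith(failed_pod) or s[1].startswith(failed_pod))]

# If MP-BGP breaks in pod1, pod2's intra-pod L3Out distribution keeps working.
for session in unaffected_sessions("pod1"):
    print(session)
```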

ACI Multi-Pod

Failure Scenarios

Since communication inside ACI depends on the COOP database, and since the database used by the Cisco APIC is split into several database units (shards), where each shard is replicated three times with each copy assigned to a specific Cisco APIC, a Multi-Pod fabric may face different failure scenarios depending on how the APIC nodes are positioned across the pods.

With a 3-node APIC cluster, each database shard is replicated on every APIC node in the cluster.

With a 5-node cluster, each database shard is replicated on only three of the five nodes, so certain failures can leave some shards without a majority of their replicas.
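The toy placement function below illustrates the difference; it is only a sketch, not the actual APIC shard-layout algorithm. With a 3-node cluster every shard ends up on all three APICs, while with a 5-node cluster each shard's three replicas land on only a subset of the nodes.

```python
REPLICAS = 3  # each APIC database shard is replicated three times

def toy_shard_placement(cluster_size: int, num_shards: int):
    """Toy round-robin placement, for illustration only; not the real APIC algorithm."""
    nodes = [f"APIC-{i + 1}" for i in range(cluster_size)]
    return {
        f"shard-{s}": [nodes[(s + r) % cluster_size] for r in range(REPLICAS)]
        for s in range(num_shards)
    }

# 3-node cluster: every shard has a replica on all three APICs.
print(toy_shard_placement(cluster_size=3, num_shards=4))
# 5-node cluster: each shard lives on only 3 of the 5 APICs.
print(toy_shard_placement(cluster_size=5, num_shards=4))
```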

Split-Brain Failure Scenario:

Multi-Pod Split-brain Scenario

A split-brain failure scenario happens when the connectivity between the pods is interrupted.

In this scenario, all the APIC cluster nodes are up, but the connectivity between the pods is down, so there is no communication between the APIC nodes in Pod 1 and the APIC node in Pod 2. With a 3-node cluster (in this example, two APICs in Pod 1 and one in Pod 2), read/write configuration in Pod 1 is unaffected because the majority of the APIC nodes are in Pod 1, while the APIC node in Pod 2 goes into read-only mode and can no longer perform any configuration changes for its local pod.

If the APIC cluster has five nodes (for example, 3 APICs in Pod 1 and 2 APICs in Pod 2), whether a given object is in read/write or read-only mode becomes indeterministic because of the shard replica distribution (three replicas of each object). The replicas of some objects may sit on one APIC in Pod 1 and two APICs in Pod 2, in which case the Pod 2 APICs hold the majority and are in read/write mode for those objects, while other objects may have their majority in Pod 1, where the Pod 1 APICs are in read/write mode.
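The per-shard majority calculation can be sketched like this (APIC names and shard placements are hypothetical): during a split, whichever pod holds at least two of a shard's three replicas keeps read/write access to that shard.

```python
# Hypothetical 5-node layout: 3 APICs in Pod 1, 2 APICs in Pod 2.
apic_pod = {"APIC-1": "pod-1", "APIC-2": "pod-1", "APIC-3": "pod-1",
            "APIC-4": "pod-2", "APIC-5": "pod-2"}

# Hypothetical replica placements (three replicas per shard).
shard_replicas = {
    "shard-A": ["APIC-1", "APIC-2", "APIC-4"],  # two replicas in Pod 1
    "shard-B": ["APIC-3", "APIC-4", "APIC-5"],  # two replicas in Pod 2
}

def writable_pod(replicas):
    """During a split, the pod holding at least 2 of the 3 replicas keeps read/write."""
    counts = {}
    for apic in replicas:
        counts[apic_pod[apic]] = counts.get(apic_pod[apic], 0) + 1
    return max(counts, key=counts.get)

for shard, replicas in shard_replicas.items():
    print(f"{shard}: read/write only from {writable_pod(replicas)} during the split")
# shard-A stays writable from Pod 1, shard-B from Pod 2 -> indeterministic per object.
```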

So it is very important to keep the connectivity between the two pods up, and to restore it as soon as possible if it fails.

Pod Failure Scenario


In the case of a pod failure or a disaster affecting one of the data centers, let's assume a 3-node APIC cluster where the failure hits the pod hosting the majority of the APIC nodes (and where each database shard is replicated across all three nodes). In this case, the pod with the single APIC node will go into read-only mode.

In such a case, we can add one standby APIC in Pod 2 and promote it to active, which re-establishes the quorum of the Cisco APIC cluster.

In the scenario of a 5-node APIC cluster (3 APICs in Pod 1 and 2 APICs in Pod 2), if the failure hits Pod 1, which has the majority of the APICs, the same thing happens: the APICs in Pod 2 go into read-only mode, and we can add a standby controller to restore cluster majority and re-establish quorum.

The only difference here is that, even with a standby controller, this failure may lead to the loss of information for the shards whose three replicas all resided on the nodes in Pod 1 (the failed pod).
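Reusing the same hypothetical 5-node layout, the sketch below checks which shards lose all of their replicas when Pod 1 fails; those are the shards that can only be recovered from a backup.

```python
# Same hypothetical layout as before: 3 APICs in Pod 1, 2 APICs in Pod 2.
apic_pod = {"APIC-1": "pod-1", "APIC-2": "pod-1", "APIC-3": "pod-1",
            "APIC-4": "pod-2", "APIC-5": "pod-2"}

shard_replicas = {
    "shard-A": ["APIC-1", "APIC-2", "APIC-3"],  # all three replicas in Pod 1
    "shard-B": ["APIC-3", "APIC-4", "APIC-5"],  # one replica survives in Pod 2
}

failed_pod = "pod-1"

lost = [shard for shard, replicas in shard_replicas.items()
        if all(apic_pod[apic] == failed_pod for apic in replicas)]

print(f"Shards lost with {failed_pod} down: {lost}")  # -> ['shard-A']
# These shards have to be recovered by restoring the fabric from a configuration backup.
```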

In such a scenario, we can recover the fabric from a configuration backup.


In the next article, we will discuss the Inter-Pod Network (IPN), how it works, and its design considerations.

