Why does EVPN play a smaller role in Cisco ACI?
Vahid Nazari
DC Consulting Engineer · Cisco ACI · VXLAN · Hybrid & On-Prem Infra · End-to-End Integrated Solutions
Ethernet VPN, or EVPN, is one of the best-known protocols in both service provider and data center fabrics. It extends BGP with an L2VPN address family, making it possible to carry endpoint reachability information such as MAC and IP addresses. This combination of BGP and EVPN provides a strong control plane for VXLAN, which is why the comprehensive solution is called "VXLAN MP-BGP EVPN". BGP EVPN brings significant enhancements to VXLAN, such as ARP suppression, a distributed IP anycast gateway, endpoint mobility, Virtual Port-Channel (vPC), and more.
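To make the idea concrete, here is a minimal Python sketch of how a leaf's EVPN table could model the BGP EVPN Route Type 2 (MAC/IP advertisement) and use it for ARP suppression. The class and field names are illustrative assumptions, not the actual BGP NLRI encoding or any vendor API:

```python
from dataclasses import dataclass

# Hypothetical model of a BGP EVPN Route Type 2 (MAC/IP advertisement).
# Field names are illustrative, not the real BGP NLRI encoding.
@dataclass(frozen=True)
class EvpnType2Route:
    mac: str            # endpoint MAC address
    ip: str             # endpoint IP address
    l2vni: int          # VXLAN network identifier
    next_hop_vtep: str  # VTEP IP of the advertising leaf

class EvpnTable:
    """Per-leaf table populated by BGP EVPN updates from all other leaves."""
    def __init__(self):
        self._by_ip = {}

    def learn(self, route: EvpnType2Route):
        self._by_ip[(route.l2vni, route.ip)] = route

    def arp_suppress(self, l2vni: int, target_ip: str):
        """ARP suppression: answer an ARP request locally instead of
        flooding it, if the target is already known from a Type-2 route."""
        route = self._by_ip.get((l2vni, target_ip))
        return route.mac if route else None  # None -> must flood the ARP

table = EvpnTable()
table.learn(EvpnType2Route("00:50:56:aa:bb:01", "10.1.1.10", 30001, "192.168.0.11"))
print(table.arp_suppress(30001, "10.1.1.10"))  # known endpoint -> MAC returned
print(table.arp_suppress(30001, "10.1.1.99"))  # unknown -> None, flood
```

The point of the sketch is simply that once Type-2 routes populate every leaf, features like ARP suppression become a local table lookup rather than a broadcast.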
Although Cisco ACI leverages VXLAN to build its infrastructure, when you dig into the technology you realize that some roles have changed, which may seem strange at first. Note that EVPN is still used as part of the overlay control plane in some ACI solutions to exchange endpoint reachability information across pods or sites (for instance, ACI Multi-Pod, ACI Multi-Site, and ACI Remote Leaf). However, it is no longer utilized within any individual pod. At first, you might think it has simply been replaced by another protocol, but the story goes deeper than that: endpoint learning in ACI has fundamentally taken a different path compared with VXLAN BGP EVPN. We've heard a lot about BGP EVPN's features, so why did these changes happen?
To make a long story short, Cisco ACI is not supposed to do the same thing that VXLAN does. Rather, this technology was born out of larger ambitions.
Generally speaking, comparing Cisco ACI with VXLAN BGP EVPN is fundamentally misleading, since they are not meant to do the same job. Naturally, some enhancements were required to accommodate new features and bigger goals, and those enhancements are not limited to EVPN.
How is endpoint learning done in VXLAN BGP EVPN?
In VXLAN BGP EVPN, every leaf node within a fabric advertises, learns, and stores all endpoint information, even if no endpoint behind a given switch ever needs that data. This is one of the behaviors Cisco ACI deliberately avoids.
The resulting hardware resource savings are a huge advantage for a scalable fabric.
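A rough back-of-the-envelope sketch makes the difference in table state visible. All numbers here are illustrative assumptions, not platform scale limits, and the 10% "active conversations" ratio is purely hypothetical:

```python
# Rough endpoint-table state comparison (illustrative numbers only).
leaves, endpoints_per_leaf = 100, 500
total_endpoints = leaves * endpoints_per_leaf  # 50,000 endpoints fabric-wide

# VXLAN BGP EVPN: every leaf learns every endpoint in the fabric.
evpn_entries_per_leaf = total_endpoints

# Cisco ACI: a leaf keeps its local endpoints plus the remote endpoints
# it is actively conversing with (assume 10% of the fabric, a made-up
# ratio for illustration); the spine COOP database holds the full set.
aci_entries_per_leaf = endpoints_per_leaf + int(0.10 * total_endpoints)

print(evpn_entries_per_leaf)  # 50000 entries on each EVPN leaf
print(aci_entries_per_leaf)   # 5500 entries on each ACI leaf
```

Under these toy assumptions, each ACI leaf carries roughly a tenth of the state an EVPN leaf would, which is where the scalability headroom comes from.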
Furthermore, VXLAN BGP EVPN has no truly appropriate option for a stretched fabric. There is, of course, VXLAN Multipod, along with the Multi-Fabric and Multi-Site solutions. But due to some critical shortcomings in VXLAN Multipod (I'll go over its issues later), it is essentially not recommended for multi-pod or active-active data centers that are geographically dispersed. This means that even for multi-data-center environments that would naturally form a single stretched-fabric topology, we have to choose the VXLAN Multi-Site solution, which assumes several separate VXLAN fabrics interconnected together.
Of course, VXLAN Multi-Site is recognized as a brilliant technology that provides both Layer 2 and Layer 3 interconnection for completely independent VXLAN fabrics, but its main use case is DCI. There is no end-to-end VXLAN tunnel in this solution; for a single inter-DC packet and its reply, VXLAN encapsulation is performed six times (three in each direction). This is potentially problematic for mission-critical applications such as high-frequency trading, virtual reality over networks, peak banking-transaction loads, and so on.
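The six-encapsulation count can be sketched as a simple walk along the Multi-Site path. The hop names are illustrative, but the encap/decap sequence follows the BGW re-encapsulation behavior described here:

```python
# Toy walk of a one-way packet across a VXLAN Multi-Site path.
# Each hop that pushes a (new) VXLAN header counts as one encapsulation.
PATH = [
    ("source leaf", "encap"),    # leaf VTEP encapsulates toward the site-1 BGW
    ("site-1 BGW",  "reencap"),  # BGW decapsulates, re-encapsulates toward site-2 BGW
    ("site-2 BGW",  "reencap"),  # BGW decapsulates, re-encapsulates toward dest leaf
    ("dest leaf",   "decap"),    # destination leaf strips the final header
]

def encap_count(path):
    """Count how many times a VXLAN header is pushed along the path."""
    return sum(1 for _, action in path if action in ("encap", "reencap"))

one_way = encap_count(PATH)
print(one_way)       # 3 encapsulations per direction
print(one_way * 2)   # 6 for a request plus its reply
```

Each re-encapsulation adds lookup and rewrite work on the BGWs, which is why latency-sensitive traffic crossing sites pays a measurable tax.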
This is why Cisco has attempted to provide a single, well-fitting ACI solution for every scenario and requirement.
How is endpoint learning done in Cisco ACI?
In contrast, Cisco ACI learns endpoint information in the data plane during packet forwarding, so there is no MP-BGP+EVPN up and running inside each ACI Pod.
Keep in mind, though, that plain MP-BGP with the VPNv4 address family still runs in the overlay-1 VRF inside the infra tenant. It is used to distribute external routes from the border leafs to the other leaf switches.
Cisco ACI relies on the resources of the spine switches, rather than the leafs, to collect and store all endpoint information. That sounds more efficient.
ACI actually uses the Council Of Oracle Protocol (COOP) database located on each spine switch, where each spine is known as an "Oracle". Since hosts are directly connected to leafs, each leaf, known as a "citizen", is responsible for reporting its local endpoints to the COOP database. As a result, all endpoint information in the ACI fabric is stored in the spine COOP database. Consequently, a leaf switch doesn't need to hold remote endpoint information in advance; it can simply forward packets to the spine whenever it doesn't know about a particular remote endpoint. This forwarding behavior is called "hardware proxy" or "spine proxy".
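The citizen/Oracle division of labor can be sketched as two lookup functions. All table contents and return strings below are hypothetical, invented purely to illustrate the forwarding logic:

```python
# Sketch of ACI spine-proxy forwarding (all names/values are illustrative).
COOP_DB = {  # spine "Oracle": full endpoint table reported by all leaf "citizens"
    "10.1.1.10": "leaf-101",
    "10.2.2.20": "leaf-205",
}

LOCAL_EP_TABLE = {"10.1.1.10": "eth1/5"}  # what this particular leaf knows

def leaf_forward(dst_ip):
    """A leaf never queries the spine and waits for an answer; it simply
    forwards unknown-destination traffic to the spine-proxy anycast VTEP."""
    if dst_ip in LOCAL_EP_TABLE:
        return f"forward out {LOCAL_EP_TABLE[dst_ip]}"
    return "encapsulate to spine-proxy VTEP"  # hardware proxy: keep forwarding

def spine_proxy(dst_ip):
    """The spine consults COOP and rewrites the outer destination in line."""
    owner = COOP_DB.get(dst_ip)
    return f"re-encapsulate toward {owner}" if owner else "drop (or glean)"

print(leaf_forward("10.2.2.20"))  # unknown locally -> send to the spine
print(spine_proxy("10.2.2.20"))   # spine rewrites outer destination to leaf-205
```

Note that the leaf's decision is a single, unconditional forwarding action; the packet is never held while the leaf waits for a lookup response.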
A key question: in VXLAN BGP EVPN a leaf already has the endpoint information thanks to BGP EVPN, so is sending traffic to a spine somehow a drawback? Is something special going on with the packet? No. Hardware proxy essentially says: "Hey, lovely leaf. You have no idea about the destination? It's OK, don't bother yourself. Just keep forwarding! I know what to do." Some engineers I've spoken with believed that leaf switches query the spine, fetch the information, and then forward the traffic. That's incorrect; hardware proxy doesn't work like that!
Don't worry about silent hosts, endpoint mobility, or even the movement of an IP address to a new MAC. The solutions are already provided.
Now, let's drill down into the VXLAN Multipod solution and go over its issues.
VXLAN BGP EVPN Multi-Pod
The first solution for extending a VXLAN fabric across more than one infrastructure, illustrated below, is known as VXLAN Multipod (by now effectively deprecated). The variant shown is the most practical way of implementing it, with the control plane protocols isolated between the two pods.
Even though the control plane protocols, including the underlay IGP and the overlay BGP, are separated from each other, the same VXLAN EVPN fabric is extended across different locations, which makes the whole infrastructure function like a single VXLAN fabric. In this situation, all endpoint information, including MAC and IP addresses, is shared and advertised between the two pods. As a result, every movement of an endpoint across leaf nodes in Pod 1 triggers a new control plane update toward Pod 2, since that is the default behavior of BGP EVPN within a single VXLAN fabric (end-to-end EVPN updates). The failure domain stays stretched across all the pods, and the scalability limits remain those of a single VXLAN fabric, because that is exactly what it still is. And, as mentioned before, all leaf switches have to learn all the endpoint information. Because of these shortcomings, this solution is not recommended for active/active geographically dispersed sites.
Cisco ACI Multi-Pod, meanwhile, is a completely different story!
Why not outsource the responsibility for keeping endpoint information to the spines? Don't forget that the COOP database within each pod already contains all of that pod's endpoint information, and the databases are synchronized across pods through MP-BGP EVPN.
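The pod-to-pod synchronization can be sketched as a merge of per-pod COOP tables. The dictionaries and the merge function below are a hypothetical simplification; in reality the spines advertise EVPN routes to each other over the Inter-Pod Network (IPN):

```python
# Sketch: in ACI Multi-Pod, spines exchange endpoint reachability via
# MP-BGP EVPN so each pod's COOP database gains the full fabric view.
pod1_coop = {"10.1.1.10": "pod1/leaf-101"}  # endpoints learned in pod 1
pod2_coop = {"10.2.2.20": "pod2/leaf-205"}  # endpoints learned in pod 2

def sync_via_evpn(*pods):
    """Merge each pod's locally learned entries into one fabric-wide view.
    (A stand-in for EVPN route exchange between spines over the IPN.)"""
    merged = {}
    for coop in pods:
        merged.update(coop)
    return merged

full_view = sync_via_evpn(pod1_coop, pod2_coop)
pod1_coop, pod2_coop = dict(full_view), dict(full_view)
print(sorted(pod1_coop))  # both pods now know all endpoints
```

The key contrast with VXLAN Multipod is where this state lives: only the spines exchange and hold the merged view, while the leafs stay lean.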
VXLAN Multipod has been replaced by more effective solutions, such as VXLAN Multi-Fabric and, especially, VXLAN Multi-Site, which is one of the most appropriate solutions for traditional multi-data-center environments.
VXLAN BGP EVPN Multi-Site
EVPN Multi-Site technology is based on the IETF draft "draft-sharma-multi-site-evpn". In this solution, two or more completely independent VXLAN fabrics are interconnected through a VXLAN BGP EVPN Layer 2 and Layer 3 overlay. This overlay network is also known as the "site-external network". Thus, unlike the Multipod architecture above, there is neither a shared EVPN fabric nor an extended underlay across the sites.
VXLAN EVPN Multi-Site can be used for scaling up a large intra-DC network, for Data Center Interconnect (DCI), and for integrating with legacy networks. As you can see in the picture above, the key functional components of this architecture are the Border Gateways (BGWs).
In brief, BGWs separate the fabric side from the site-external network and mask the site-internal VTEPs. What does this mean? As shown in the following picture, a border gateway re-encapsulates traffic, rewriting the outer source and outer destination addresses. That is, for each packet sent and its reply received, VXLAN encapsulation is performed six times.
Of course, the forwarding enhancements in ACI are not limited to EVPN: no multicast PIM runs inside the ACI fabric to handle BUM traffic either. Instead, ACI relies on the FTAG mechanism to build multicast-tree-like distribution paths among the leaf and spine nodes.
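A toy model of FTAG-style BUM distribution, under stated assumptions: each tree is rooted at a spine and fans out to every leaf, so a broadcast from one leaf reaches all others without PIM. The topology, node names, and the one-tree-per-spine simplification are all illustrative:

```python
# Toy FTAG-style distribution trees for BUM traffic (illustrative only).
# In ACI, multiple FTAG trees are computed, each rooted at a spine, so
# BUM traffic follows a loop-free tree instead of relying on PIM multicast.
SPINES = ["spine-1", "spine-2"]
LEAVES = ["leaf-101", "leaf-102", "leaf-103"]

def build_ftag_trees(spines, leaves):
    """One tree per FTAG, rooted at a spine, branching to every leaf."""
    return {ftag: {"root": spine, "branches": list(leaves)}
            for ftag, spine in enumerate(spines)}

def flood(trees, ftag, ingress_leaf):
    """Ingress leaf sends the BUM frame up to the tree root; the root
    replicates it down to every other leaf on that tree."""
    tree = trees[ftag]
    return [leaf for leaf in tree["branches"] if leaf != ingress_leaf]

trees = build_ftag_trees(SPINES, LEAVES)
print(flood(trees, 0, "leaf-101"))  # ['leaf-102', 'leaf-103']
```

Having several trees (one per FTAG) also lets the fabric load-balance BUM traffic across spines rather than pinning it all to one root.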