
ACI Multi-Site Part#3 || Packet Flow

Shehab Wagdy Nagy

Cloud Enthusiast: AWS | CCIE | SDN Solutions | ACI | Network Automation Enthusiast

Published: May 3, 2024

ACI Multi-Site deployment uses different overlay control and data plane functionalities for connecting endpoints that are deployed across different sites.

MP-BGP EVPN is used as a control plane between spine switches for exchanging host information for discovered endpoints that are part of separate fabrics to allow east-west communication between these endpoints.

Once that endpoint information is exchanged, the VXLAN data plane is used to allow intersite communication.

Let's look at how the control plane and data plane are handled in detail.

Multi-Site Overlay Control Plane

As we have seen so far, for endpoints in different sites to communicate with each other, their endpoint identifier (EID) information needs to be shared between sites. ACI Multi-Site uses MP-BGP EVPN between the spine switches across sites for this purpose.

For endpoint information to be shared with other sites, Cisco NDO needs to indicate which EPG is stretched across sites, or the endpoint must belong to a non-stretched EPG that has a contract allowing communication with an EPG in a different site. There are two scenarios here:

1- If the endpoint has an IP address, it is shared across sites via MP-BGP EVPN.

2- If the endpoint has no IP address, it is shared only when Layer 2 Stretch is enabled in NDO.
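The two scenarios can be written down as a small decision function. This is only an illustrative sketch in Python: the `Endpoint` class, the `advertised_across_sites` function, and its parameter names are hypothetical and do not correspond to any ACI or NDO API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Endpoint:
    mac: str
    ip: Optional[str] = None  # None models an endpoint without an IP address


def advertised_across_sites(ep: Endpoint, intersite_policy: bool, l2_stretch: bool) -> bool:
    """Return True if the endpoint would be shared with remote sites via MP-BGP EVPN.

    intersite_policy: the EPG is stretched, or a contract allows communication
                      with an EPG in another site (assumed input).
    l2_stretch:       Layer 2 Stretch is enabled for the bridge domain in NDO.
    """
    if not intersite_policy:
        # No policy requires intersite communication, so nothing is shared.
        return False
    if ep.ip is not None:
        # Scenario 1: the endpoint has an IP address.
        return True
    # Scenario 2: an endpoint without an IP address is shared only with Layer 2 Stretch.
    return l2_stretch
```

For example, `advertised_across_sites(Endpoint("00:50:56:aa:bb:cc"), True, False)` evaluates to `False`, because the endpoint has no IP address and Layer 2 Stretch is off.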

Before moving on to the overlay data plane, let us look at how MP-BGP EVPN is used to share endpoint information across sites:

EP1 and EP2 are connected to Site 1 and Site 2, respectively.
Each endpoint is learned locally on its leaf, and a COOP control-plane message is generated toward the spine nodes of that site for each endpoint.
The spine nodes in Site 1 learn about EP1 behind its local leaf node, and the same happens for EP2 in Site 2. At this point no information has been exchanged between the sites, because no policy yet indicates that these endpoints need to communicate.
An intersite policy is defined in Cisco Nexus Dashboard Orchestrator and is then pushed to and rendered in the two sites.
Once the intersite policy is configured, a Type-2 EVPN update is triggered across sites to exchange information about EP1 and EP2. The endpoint information is associated with the O-UTEP address, which identifies the site where the endpoint was discovered.
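The exchange described above can be mimicked with a toy model in Python. This is a simplified sketch, not how the spines are actually implemented: the `Site` class, its method names, and all IP addresses are made up for illustration, and the COOP database is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Site:
    name: str
    o_utep: str  # anycast overlay unicast TEP that identifies this site
    coop: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def learn_local(self, ep_ip: str, leaf: str) -> None:
        # The leaf reports a locally learned endpoint to the spines via COOP.
        self.coop[ep_ip] = ("local", leaf)

    def evpn_type2_updates(self) -> List[Tuple[str, str]]:
        # One update per local endpoint, associated with this site's O-UTEP.
        return [(ip, self.o_utep) for ip, (kind, _) in self.coop.items() if kind == "local"]

    def receive_evpn(self, updates: List[Tuple[str, str]]) -> None:
        # Remote endpoints are stored as reachable behind the remote O-UTEP.
        for ip, remote_o_utep in updates:
            self.coop[ip] = ("remote", remote_o_utep)


def apply_intersite_policy(a: Site, b: Site) -> None:
    # Build both update sets first so that only local endpoints are advertised.
    to_b, to_a = a.evpn_type2_updates(), b.evpn_type2_updates()
    b.receive_evpn(to_b)
    a.receive_evpn(to_a)


site1 = Site("site-1", o_utep="172.16.1.1")
site2 = Site("site-2", o_utep="172.16.2.1")
site1.learn_local("10.1.1.10", "leaf-101")  # EP1
site2.learn_local("10.1.1.20", "leaf-201")  # EP2
apply_intersite_policy(site1, site2)
```

Before `apply_intersite_policy` runs, each site knows only its own endpoint. Afterwards, Site 1 holds an entry for EP2 pointing at Site 2's O-UTEP, and vice versa.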

Multi-Site Overlay Data Plane

After endpoint information is exchanged across sites, the VXLAN data plane is used to allow intersite Layer 2 and Layer 3 communication. Let's look at how ACI Multi-Site handles the different traffic scenarios (BUM and unicast traffic).

BUM Traffic between sites:

The deployment of VXLAN allows the use of a logical abstraction so that endpoints separated by multiple Layer 3 hops can communicate as if they were part of the same logical Layer 2 domain.

ACI Multi-Site uses ingress replication (also known as head-end replication) for BUM traffic: the source VXLAN TEP (VTEP) devices create a unicast copy of each BUM frame for every remote VTEP behind which endpoints in the same Layer 2 domain are connected. Once the BUM frame, encapsulated toward the unicast O-MTEP address, reaches the destination site, that site replicates the frame and floods it within the site.

BUM traffic is forwarded to other sites only when "Intersite BUM Traffic Allow" is enabled on the bridge domain, so that flooded traffic can reach the other sites as well.

There are three different types of Layer 2 BUM traffic:

Layer 2 broadcast frames (B): These frames are always forwarded across sites when "Intersite BUM Traffic Allow" is enabled for the bridge domain.
Layer 2 unknown unicast frames (U): These frames are flooded only when Layer 2 Unknown Unicast is set to flood in the bridge domain, regardless of Multi-Site.
Layer 2 multicast frames (M): Whether it is Layer 3 or Layer 2 multicast, the traffic is forwarded across the sites where the bridge domain is stretched with "Intersite BUM Traffic Allow" enabled.
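The three cases above can be summarized as a small predicate. This is a minimal sketch under the assumptions stated in the list. The `BridgeDomain` fields and the `flooded_across_sites` function are hypothetical names chosen for illustration, not ACI object attributes.

```python
from dataclasses import dataclass


@dataclass
class BridgeDomain:
    stretched: bool                 # bridge domain is stretched across sites
    intersite_bum_allow: bool       # "Intersite BUM Traffic Allow" setting
    l2_unknown_unicast_flood: bool  # Layer 2 Unknown Unicast set to flood


def flooded_across_sites(frame_type: str, bd: BridgeDomain) -> bool:
    """Return True if a BUM frame of the given type is flooded to remote sites."""
    if frame_type not in ("broadcast", "unknown_unicast", "multicast"):
        raise ValueError(f"not a BUM frame type: {frame_type}")
    if not (bd.stretched and bd.intersite_bum_allow):
        # Without intersite BUM forwarding, no flooded traffic leaves the site.
        return False
    if frame_type == "unknown_unicast":
        # Unknown unicast is flooded only when the bridge domain is set to flood.
        return bd.l2_unknown_unicast_flood
    # Broadcast and multicast follow the intersite BUM setting.
    return True
```

With `BridgeDomain(stretched=True, intersite_bum_allow=True, l2_unknown_unicast_flood=False)`, broadcast and multicast frames are flooded across sites while unknown unicast frames are not.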

The example below walks through how Layer 2 BUM traffic is forwarded:

EP1 belongs to a bridge domain (BD) and generates a Layer 2 BUM frame.
The frame is VXLAN-encapsulated and sent to the specific multicast group (Group IP address outer [GIPo]) associated with the bridge domain within the fabric along one of the specific multidestination trees associated to that GIPo, so it can reach all the other leaf and spine nodes in the same site.
One of the spine nodes connected to the ISN is elected as the designated forwarder for that specific bridge domain (this election is held between the spine nodes using IS-IS protocol exchanges). The designated forwarder is responsible for replicating each BUM frame for that bridge domain to all the remote sites with the same stretched bridge domain.
The designated forwarder makes a copy of the BUM frame and sends it to the remote sites. The destination IP address used when the packet is encapsulated with VXLAN is the special IP address (O-MTEP) identifying each remote site and is used specifically for the transmission of BUM traffic across sites. The source IP address for the VXLAN-encapsulated packet is instead the anycast O-UTEP address deployed on all the local spine nodes connected to the ISN.
One of the remote spine nodes receives the packet, translates the VNID value contained in the header to the locally significant VNID value associated with the same bridge domain, and sends the traffic within the site along one of the local multidestination trees for the bridge domain.
The traffic is forwarded within the site and reaches all the spine and leaf nodes with endpoints that are actively connected to the specific bridge domain.
The receiving leaf nodes use the information that is contained in the VXLAN header to learn the site location for endpoint EP1 that sourced the BUM frame. They also send the BUM frame to all (or some of) the local interfaces that are associated with the bridge domain, so that endpoint EP2 (in this example) can receive it.
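The replication and translation steps above can be sketched in Python as follows. This is a simplified model for illustration only: the `VxlanPacket` class, both function names, and all addresses and VNID values are assumptions, and keeping the source IP unchanged on the remote spine is a modeling choice so that the receiving leaf can still identify the source site.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class VxlanPacket:
    src_ip: str   # outer source IP
    dst_ip: str   # outer destination IP
    vnid: int     # VXLAN network identifier of the bridge domain
    payload: bytes


def replicate_to_remote_sites(frame: bytes, bd_vnid: int, local_o_utep: str,
                              remote_o_mteps: List[str]) -> List[VxlanPacket]:
    """Designated forwarder: one unicast copy of the BUM frame per remote site."""
    return [VxlanPacket(local_o_utep, o_mtep, bd_vnid, frame) for o_mtep in remote_o_mteps]


def receive_at_remote_spine(pkt: VxlanPacket, vnid_translation: Dict[int, int],
                            local_gipo: str) -> VxlanPacket:
    """Remote spine: translate the VNID and flood along the local tree (GIPo)."""
    local_vnid = vnid_translation[pkt.vnid]
    return VxlanPacket(pkt.src_ip, local_gipo, local_vnid, pkt.payload)


copies = replicate_to_remote_sites(
    b"bum-frame", bd_vnid=16001, local_o_utep="172.16.1.1",
    remote_o_mteps=["172.16.2.2", "172.16.3.2"],
)
flooded = receive_at_remote_spine(copies[0], {16001: 15002}, local_gipo="225.1.0.16")
```

Here `copies` holds two packets, one per remote site, and `flooded` carries the locally significant VNID 15002 toward the hypothetical local GIPo address.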

Depending on the number of configured bridge domains, the same GIPo address may be associated with different bridge domains. Thus, when flooding for one of those bridge domains is enabled across sites, BUM traffic for the other bridge domains using the same GIPo address is also sent across the sites and is then dropped on the receiving spine nodes. This behavior can increase bandwidth utilization in the intersite network.


To avoid wasting ISN bandwidth on this unneeded traffic, when a bridge domain is configured as stretched with "Intersite BUM Traffic Allow" enabled from Cisco NDO, its GIPo address is by default assigned from a separate range of multicast addresses. This is reflected in the user interface by the "Optimize WAN Bandwidth" flag, which is enabled by default for bridge domains created by NDO.
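The effect of sharing a GIPo address, and of moving stretched bridge domains to a separate range, can be illustrated with a short Python sketch. The function name, bridge domain names, and multicast addresses are all made up for this example.

```python
from typing import Dict, List, Set


def bds_leaking_to_isn(bd_gipo: Dict[str, str], stretched_bds: Set[str]) -> List[str]:
    """Return non-stretched BDs whose BUM traffic would also cross the ISN.

    A BD leaks when it shares its GIPo with a BD that floods across sites.
    """
    stretched_gipos = {bd_gipo[bd] for bd in stretched_bds}
    return sorted(
        bd for bd, gipo in bd_gipo.items()
        if gipo in stretched_gipos and bd not in stretched_bds
    )


# Shared GIPo: BD-App traffic is sent across sites and dropped on the remote spines.
shared = {"BD-Web": "225.0.0.16", "BD-App": "225.0.0.16", "BD-DB": "225.0.0.32"}
leaking_shared = bds_leaking_to_isn(shared, {"BD-Web"})

# Stretched BD gets a GIPo from a separate range: no other BD shares it.
optimized = {"BD-Web": "225.1.0.16", "BD-App": "225.0.0.16", "BD-DB": "225.0.0.32"}
leaking_optimized = bds_leaking_to_isn(optimized, {"BD-Web"})
```

In the first case `leaking_shared` is `["BD-App"]`. In the second case `leaking_optimized` is empty, which is the outcome the separate GIPo range is meant to achieve.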

The example below verifies that the policy configured by NDO is reflected on ACI.

Figure: Verifying on ACI the configuration pushed by NDO

Unicast Traffic between sites:

For endpoints on the same subnet to communicate, they need to exchange ARP information. In ACI, ARP handling depends on the ARP Flooding setting in the bridge domain.

There are two different scenarios to consider:

ARP flooding is enabled in the bridge domain: When ARP flooding is enabled, the Intersite BUM Traffic Allow in the same bridge domain needs to be enabled as well because the ARP request is handled as normal broadcast traffic and is flooded.
ARP flooding is disabled in the bridge domain: When ARP flooding is disabled, the ARP request is handled as a routed unicast packet.

ARP Request from EP1 in Site 1 to EP2 in Site 2

1. EP1 generates an ARP request for the EP2 IP address.
2. Since ARP Flooding is disabled, the local leaf node inspects the ARP payload and checks the target IP address, which is EP2's. Assuming that EP2's IP information is initially unknown on the local leaf, the ARP request is encapsulated and sent toward the Proxy A anycast TEP address defined on all the local spine nodes to perform a lookup in the COOP database.
3. One of the local spine nodes receives the ARP request from the local leaf node.
4. The capability of forwarding ARP requests across sites in "unicast mode" depends mainly on whether the COOP database knows the IP address of the remote endpoint (information that is received via the MP-BGP EVPN control plane with the remote spine nodes). If the entry is present, the spine forwards the VXLAN-encapsulated request toward Site 2.
5. The VXLAN frame is received by one of the remote spine nodes, which translates the original VNID and class ID values to locally significant ones, re-encapsulates the ARP request, and sends it toward the local leaf node to which EP2 is connected.
6. The leaf node receives the frame, de-encapsulates it, and learns the class ID and site location information for remote endpoint EP1.
7. The frame is then forwarded to the local interface to which EP2 is connected, assuming ARP flooding is disabled on the bridge domain in Site 2 as well.
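The leaf and spine decisions in this flow can be sketched as two lookups. This is an illustrative Python model only: the function names, table layouts, and addresses are hypothetical, and the case where the spine does not know the target IP is deliberately left unmodeled (the function returns `None`).

```python
from typing import Dict, Optional, Tuple


def leaf_handle_arp_request(target_ip: str, arp_flooding: bool,
                            leaf_table: Dict[str, str],
                            proxy_tep: str) -> Tuple[str, Optional[str]]:
    """Return (mode, outer destination) chosen by the ingress leaf."""
    if arp_flooding:
        # Handled as normal broadcast traffic and flooded in the bridge domain.
        return ("flood", None)
    if target_ip in leaf_table:
        # Target already known on the leaf: send straight to where it was learned.
        return ("unicast", leaf_table[target_ip])
    # Target unknown on the leaf: send to the spine proxy anycast TEP.
    return ("unicast", proxy_tep)


def spine_handle_arp_request(target_ip: str,
                             coop: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Return the TEP the spine forwards to, or None if the target is unknown."""
    entry = coop.get(target_ip)
    if entry is None:
        return None  # unknown in COOP; what happens next is not modeled here
    _kind, where = entry  # local leaf TEP, or the remote site's O-UTEP
    return where


coop_site1 = {"10.1.1.10": ("local", "10.0.1.101"), "10.1.1.20": ("remote", "172.16.2.1")}
mode, dest = leaf_handle_arp_request("10.1.1.20", False, {}, proxy_tep="10.0.1.1")
next_hop = spine_handle_arp_request("10.1.1.20", coop_site1)
```

In this example the leaf sends the request in unicast mode to the proxy TEP `10.0.1.1`, and the spine resolves EP2 to the remote O-UTEP `172.16.2.1` learned through MP-BGP EVPN.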

ARP Reply from EP2 in Site 2 to EP1 in Site 1

8. EP2 sends an ARP reply to EP1.

9. The local leaf node encapsulates the traffic toward the remote O-UTEP A address.

10. The spine nodes also rewrite the source IP address of the VXLAN-encapsulated packet with the local O-UTEP B address, which identifies Site 2.

11. The VXLAN frame is received by the spine node, which translates the original VNID and class ID values of Site 2 to locally significant ones (Site 1) and sends it toward the local leaf node to which EP1 is connected.

12. The leaf node receives the frame, de-encapsulates it, and learns the class ID and site location information for remote endpoint EP2.

13. The frame is then sent to the local interface on the leaf node and reaches EP1.
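The header rewrites on the two spine hops of the reply can be sketched as below. This is a hedged illustration in Python: `VxlanHeader`, both function names, the translation tables, and every address, VNID, and class ID value are invented for the example.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class VxlanHeader:
    src_ip: str
    dst_ip: str
    vnid: int
    class_id: int


def egress_spine_rewrite(hdr: VxlanHeader, local_o_utep: str) -> VxlanHeader:
    """Site 2 spine: replace the outer source IP with the site's own O-UTEP."""
    return replace(hdr, src_ip=local_o_utep)


def ingress_spine_translate(hdr: VxlanHeader, vnid_map: Dict[int, int],
                            class_map: Dict[int, int], dest_leaf_tep: str) -> VxlanHeader:
    """Site 1 spine: translate VNID and class ID, then send toward EP1's leaf."""
    return replace(
        hdr,
        dst_ip=dest_leaf_tep,
        vnid=vnid_map[hdr.vnid],
        class_id=class_map[hdr.class_id],
    )


# Reply as built by the Site 2 leaf, destined to the remote O-UTEP A.
from_leaf = VxlanHeader(src_ip="10.0.2.201", dst_ip="172.16.1.1", vnid=15002, class_id=32771)
on_isn = egress_spine_rewrite(from_leaf, local_o_utep="172.16.2.1")
in_site1 = ingress_spine_translate(on_isn, {15002: 16001}, {32771: 49153}, "10.0.1.101")
```

After both steps, `in_site1` carries Site 1's VNID and class ID, is addressed to EP1's leaf, and still has O-UTEP B as its source, which is what lets the receiving leaf learn the site location of EP2.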

Let's recap what we covered in this topic:

How EP information is exchanged between different sites in different scenarios (EP with an IP address and EP without an IP address).
How the overlay control plane works.
How the overlay data plane works.
How different types of data-plane traffic are exchanged between sites.
How BUM traffic is exchanged between sites.
How unicast traffic is exchanged between sites.

See you in the next topic.

Thanks a lot.

Resources:

Cisco Application Centric Infrastructure - Cisco ACI Multi-Site Architecture White Paper - Cisco
