End-to-End EVPN (EEE) Architecture Tutorial
End-to-End EVPN (formerly EVPN-Based ISP) is a network architecture advocated by PS Network that allows telecom operators and Internet service providers to simplify service provisioning and reduce equipment acquisition costs. The simplification is possible because the network is virtualized. The backbone and metro networks can be viewed as a single L3 switch (overlay network), regardless of the underlay network topology.
The overlay network is built using the EVPN family of the BGP protocol from the backbone to the metro networks (end-to-end EVPN). Cloud Titans use this architecture to provide services to hundreds of thousands of subscribers over a simpler underlay network called spine-leaf.
For the underlay, the EEE architecture advocates using data center routers and switches, employing EVPN in the control plane and VxLAN for traffic forwarding in the data plane. In this architecture, any L2 or L3 service is provisioned with:
· Allocation of a VLAN with local significance.
· Allocation of a Virtual Network Identifier (VNI).
· Definition of two BGP communities for route import and export (MAC or IPv4 and IPv6 addresses).
· Allocation of a physical interface, subinterface, or VLAN interface.
This simplicity allows the creation of automation systems or provisioning scripts in Ansible, Nornir, or any other tool the operator is comfortable with.
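As an illustration, the four service parameters can drive a very small provisioning script. The sketch below is hypothetical (the function name and the template, which mimics Arista EOS syntax with ASN 65535, are assumptions, not a validated configuration generator):

```python
# Hypothetical sketch: render an L2 EVPN service config from the four
# provisioning parameters (VLAN, VNI, community, interface). The template
# mimics Arista EOS syntax but is illustrative only.

def render_l2_service(vlan: int, vni: int, community: str, interface: str) -> str:
    """Build a config snippet for one L2 EVPN service."""
    return "\n".join([
        f"vlan {vlan}",
        f"interface {interface}",
        f"   switchport access vlan {vlan}",
        "interface Vxlan1",
        f"   vxlan vlan {vlan} vni {vni}",
        "router bgp 65535",
        f"   vlan {vlan}",
        "      rd auto",
        f"      route-target both {community}",
        "      redistribute learned",
    ])

config = render_l2_service(vlan=100, vni=200100,
                           community="65535:200100",
                           interface="Ethernet3")
print(config)
```

In a real deployment the same parameters would feed an Ansible or Nornir template instead of a hand-rolled function.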
Equipment acquisition costs can be reduced because the equipment does not need to support full IPv4 and IPv6 routing tables (full routing) or MPLS, as traffic is forwarded via VxLAN (IP + UDP). However, MPLS is likely to be kept on backbone equipment for traffic engineering (MPLS-TE), since long-distance links are scarce resources, not easily expanded, and expensive, as they rely on optical networks (DWDM or SDH) to cover long distances.
In addition to reducing acquisition costs by relying on more straightforward equipment, the EEE Architecture uses data center routers and switches, which broadens the pool of suppliers. In the EEE Architecture, the term router describes equipment with large buffers; traditional routers can be replaced by deep-buffer data center switches (4, 8, or 16 GB). Since the backbone is a traffic aggregation point and uses network interfaces with heterogeneous speeds, large buffers are needed to absorb this variation. In short, a router in the EEE Architecture is a device with large buffers interconnected to other equipment by various links. Switches are used in metropolitan networks, whose topology is ring-based. Because switches reside in a simpler topology, their buffers can be smaller, in the range of megabytes, helping to reduce equipment costs.
It is worth noting that MPLS can be dispensed with in the backbone if MPLS-TE is not necessary.
The EVPN Family of BGP
In telecom operators, Layer 2 (L2VPN) and Layer 3 (L3VPN) private network services are provisioned using MPLS for traffic forwarding. L2VPN services can be point-to-point (VPWS) or multipoint (VPLS), with the control plane being LDP, BGP, or BGP+LDP. For L3VPN, the control plane uses the IPv4 VPN and IPv6 VPN families of BGP. The EVPN family of BGP replaces all others, addressing difficulties in implementing redundancy and containing broadcast issues in VPLS. In the article Evolution Of EVPN in RFCs, Joe Neville consolidated the RFCs related to EVPN and highlighted the key points of each recommendation.
Therefore, choosing an architecture that uses EVPN as the control plane is aligned with modernizing the provisioning of services in telecom operator and access provider networks.
EEE Architecture Tutorial
This tutorial will implement a telecom operator network, offering IPv4 transit, multipoint L2VPN, and L3VPN services.
Underlay topology for the tutorial:
· Backbone Routers: Use MPLS with LDP, IS-IS as the IGP for announcing loopback interfaces, and BGP with EVPN.
· Metro Networks: Do not use MPLS forwarding. IS-IS adjacencies are established over unnumbered interfaces.
· Backbone IS-IS Configuration: Operates at level 2 and in area 49.0002.
· Metro IS-IS Configuration: Operates at level 1 and in area 49.0001.
This arrangement allows a default route to be advertised to the metro network via the attached bit set by the Level 1-2 routers BB5 and BB6. The default route is sufficient for establishing end-to-end EVPN services since, unlike MPLS, a host route (/32) to the service's next hop is unnecessary.
All equipment used is Arista's cEOS 4.30.3M, a version of EOS that runs as a Docker container.
cEOS can be downloaded directly from the Arista website after creating a user account. The lab environment can be PNETLAB, Containerlab, or EVE-NG Pro.
BB3 and BB4 are route reflectors for the EVPN family of backbone routers, and BB5 and BB6 are the reflectors for the switches of the directly connected metro rings.
The list below consolidates the resources used in the network, which, in an actual implementation, would undergo a few changes:
· IS-IS L2 in the backbone.
· IS-IS L1 in the metro network with the attached-bit set on the concentration routers (metro headend).
· Unnumbered interfaces in the metro networks.
· BFD across the entire network, including the unnumbered interfaces of the metro network.
· BGP EVPN with two tiers of reflectors: BB3 and BB4 for the backbone and BB5 and BB6 for the directly connected metro networks.
· MPLS between the BB1, BB2, BB3, and BB4 routers for implementing traffic engineering with MPLS-TE, if necessary.
Underlay Configuration
The initial configuration can be downloaded from GitHub:
Underlay Validation
The strategy for allocating loopback interfaces uses the 198.18.255.0/24 block, where the equipment's numeric identifier populates the last octet. For example, BB1 has the loopback address 198.18.255.1.
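This allocation rule can be expressed in a few lines of Python (a hypothetical helper for illustration, not part of the lab configs):

```python
import ipaddress

# Sketch of the loopback allocation rule: loopbacks come from
# 198.18.255.0/24 and the last octet is the device's numeric ID.
BLOCK = ipaddress.ip_network("198.18.255.0/24")

def loopback_for(device_id: int) -> str:
    """Return the loopback address for a device numbered 1..254."""
    if not 1 <= device_id <= 254:
        raise ValueError("device ID must fit in the last octet")
    return str(BLOCK.network_address + device_id)

# BB1 -> 198.18.255.1, M12 -> 198.18.255.12
print(loopback_for(1), loopback_for(12))
```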
BB1
BB2
BB3
BB4
BB5
BB6
M7
M8
M9
M10
M11
M12
Provisioning of Services
The significant advantage of the EEE Architecture lies in simplifying the topology for provisioning L2 and L3 services, as the complexity of the underlay network disappears with the logical creation of a single L3 switch:
Implementation of Multipoint L2VPN Service
In this scenario, a data center is connected to router BB1, and two sites connected to the metro networks consume L2 content:
For the provisioning of the above service, the following parameters will be used:
· VLAN: 100
· BGP Community: 65535:200100
· VNI: 200100
· Physical interfaces as indicated in the topology.
The Virtual Network Identifier (VNI) is a unique marker identifying an L2 segment within a VxLAN network. When it is stated that the EEE Architecture creates a large L3 switch, it means that a large VxLAN switch is created, where the control plane maps a VLAN with local significance to a VNI with global significance. It is possible to create 16,777,215 unique VNI segments.
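The figure of 16,777,215 follows directly from the 24-bit width of the VNI field, as this quick check shows:

```python
# The VNI field in the VXLAN header is 24 bits wide, which yields the
# 16,777,215 usable segment IDs mentioned above.
VNI_BITS = 24
max_vni = (1 << VNI_BITS) - 1
print(max_vni)  # 16777215

# Compare with the 12-bit VLAN ID space of classic 802.1Q:
max_vlan = (1 << 12) - 1
print(max_vlan)  # 4095
```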
The figure below represents the logical topology created by EVPN:
The DC13, SW14, and SW15 switches are connected to interfaces configured as dot1q-tunnel in VLAN 100. Every learned MAC address is advertised via EVPN on VNI 200100. The routers and switches involved in this topology, known as a MAC-VRF, build MAC tables used by the data plane to forward traffic, encapsulated in VxLAN packets, to those destinations.
An OSPF session will be configured between DC13 and SW14/SW15 on VLAN 20 (10.0.20.0/24) to validate the service. Each switch's loopback address will be allocated from block 10.255.255.0/24.
Configuration
The L2VPN service configuration can be downloaded from GitHub:
Validation
BB1
Control Plane
The output above demonstrates the following:
· The MAC address 5000.0105.5d27 (from SW14) was announced by M7 (Originator-ID: 198.18.255.7) and was learned by BB1 from RRs BB3 and BB4. The BGP Cluster List (C-LST) attribute shows that the original advertisement was made to BB5 (198.18.255.5), the RR for M7's metro network.
· The MAC address 5000.01c0.9ab3 is in the BGP EVPN table because it was locally learned on BB1 and was advertised via EVPN to the rest of the network.
· The MAC address 5000.01e2.63a5 (from SW15) was announced by M12 (Originator-ID: 198.18.255.12) and learned by BB1 from RRs BB3 and BB4. The BGP Cluster List (C-LST) attribute shows that the original advertisement was made to BB6 (198.18.255.6), the RR for M12's metro network.
Data Plane
The MAC table for VLAN 100 reflects the information from the BGP EVPN table as follows:
· The MAC addresses 5000.0105.5d27 (SW14) and 5000.01e2.63a5 (SW15) are associated with port Vx1 (Vxlan interface 1) and VLAN 100, which was explicitly mapped to VNI 200100.
· The MAC address 5000.01c0.9ab3 was learned locally on VLAN 100 via the Ethernet3 interface.
M7
M12
DC13
Replication of Multicast
OSPF uses multicast to establish and maintain its neighbors. In a traditional LAN, a multicast packet received by a switch is replicated to all ports belonging to that VLAN. This behavior also applies to broadcast and unknown unicast packets. These packets are collectively referred to as BUM (broadcast, unknown unicast, and multicast) and are treated similarly in EVPN. The router or switch that receives a BUM packet replicates it to all participants of the VNI using either Ingress Replication (IR) or Multicast Replication. IR is the default method on Arista and other manufacturers' equipment, and it is the simplest. Multicast replication is more complex because multicast routing must be configured in the underlay network.
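Ingress Replication can be sketched as follows (names are hypothetical; in practice the per-VNI flood list is learned from the EVPN control plane):

```python
# Illustrative sketch of Ingress Replication: a BUM frame received on a
# VNI is copied once to every remote VTEP in that VNI's flood list and
# sent as a unicast VXLAN packet to each one.

def ingress_replicate(frame: bytes, vni: int, flood_lists: dict) -> list:
    """Return (remote_vtep, vni, frame) tuples, one per flood-list member."""
    return [(vtep, vni, frame) for vtep in flood_lists.get(vni, [])]

# Flood list for VNI 200100 as seen from BB1 in the tutorial topology.
flood = {200100: ["198.18.255.7", "198.18.255.12"]}
frame = b"\xff" * 6 + b"payload"  # broadcast destination MAC + payload
copies = ingress_replicate(frame, 200100, flood)
print(len(copies))  # one unicast copy per remote VTEP
```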
Devices discover other devices participating in a VNI through EVPN type 3 routes called Inclusive Multicast Ethernet Tags (IMET). On BB1, the type 3 routes for VNI 200100 are as follows:
The output above indicates that:
· VNI 200100 is configured on three devices: 198.18.255.1 (BB1), 198.18.255.7 (M7), and 198.18.255.12 (M12).
· The replication method is Ingress Replication (PMSI Tunnel: Ingress Replication).
· The VNI used for replication is 200100 (MPLS Label: 200100). Initially, EVPN supported only MPLS forwarding; it was later extended to support VxLAN while keeping the field name unchanged.
· The forwarding method is VxLAN (TunnelEncap: tunnelTypeVxlan), not MPLS. This can be observed in the packet capture excerpt displayed below:
In that frame, the following information is present:
· The source address of the UDP packet is 198.18.255.7 (M7).
· The packet's destination is 198.18.255.1 (BB1).
· There is a VxLAN-encapsulated packet within the IP packet, with VNI 200100.
· Following VxLAN, there is an 802.1Q Ethernet header with VLAN 20.
· After the VLAN, there is an OSPF packet with source 10.0.20.14 (SW14).
· The OSPF destination is multicast address 224.0.0.5.
All this information confirms that a multicast packet is being replicated via unicast from M7 to BB1.
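For reference, the 8-byte VxLAN header seen in the capture can be built and parsed with a short sketch (following the RFC 7348 layout; the helper names are hypothetical):

```python
import struct

# Sketch of the 8-byte VXLAN header (RFC 7348): one flags byte with the
# I-bit set, 3 reserved bytes, a 24-bit VNI, and a final reserved byte.

def build_vxlan_header(vni: int) -> bytes:
    flags = 0x08  # I-bit: VNI field is valid
    return struct.pack("!B3s3sB", flags, b"\x00" * 3,
                       vni.to_bytes(3, "big"), 0)

def parse_vni(header: bytes) -> int:
    """Extract the 24-bit VNI from bytes 4..6 of the header."""
    return int.from_bytes(header[4:7], "big")

hdr = build_vxlan_header(200100)
print(len(hdr), parse_vni(hdr))  # 8 200100
```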
When BB1 receives this packet, it is identified as VxLAN due to the destination port 4789, and the following process occurs:
· The outer IP and UDP headers are removed.
· The VxLAN header is parsed, and the VNI determines which interfaces and VLAN the packet will be forwarded to.
· If the destination MAC address is unicast and an entry exists in the equipment's MAC table, the packet is forwarded to the specific interface. If no entry exists, it is replicated to all interfaces in that VLAN.
· If the packet is multicast or broadcast, it is forwarded to all interfaces associated with the VLAN mapped to the VNI.
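The forwarding decision after decapsulation can be sketched like this (the tables and port names are illustrative, loosely matching the lab's VLAN 100 example):

```python
# Illustrative sketch of the post-decapsulation forwarding decision:
# known unicast goes to one port; BUM and unknown unicast are flooded
# to every port in the VLAN mapped to the VNI.

def forward(dest_mac: str, vlan: int, mac_table: dict, vlan_ports: dict) -> list:
    """Return the list of egress ports for a decapsulated frame."""
    group_bit = int(dest_mac.split(":")[0], 16) & 1  # set for multicast/broadcast
    if not group_bit and (vlan, dest_mac) in mac_table:
        return [mac_table[(vlan, dest_mac)]]      # known unicast
    return vlan_ports.get(vlan, [])               # BUM or unknown unicast: flood

mac_table = {(100, "50:00:01:c0:9a:b3"): "Ethernet3"}
vlan_ports = {100: ["Ethernet3", "Ethernet4"]}
print(forward("50:00:01:c0:9a:b3", 100, mac_table, vlan_ports))  # known unicast
print(forward("ff:ff:ff:ff:ff:ff", 100, mac_table, vlan_ports))  # flood
```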
Implementation of Point-to-Point L2VPN Service
EVPN Point-to-Point L2VPN (VPWS) is supported only with MPLS forwarding, so it is not used in the EEE Architecture, whose data plane is implemented over VxLAN. A point-to-point service is instead provisioned as a regular EVPN instance with only two endpoints. Many service providers avoid VPWS anyway, because network equipment does not create MAC entries on interfaces connected to that service, which complicates troubleshooting; such providers typically implement point-to-point scenarios using VPLS. Therefore, the lack of VPWS support in the EEE Architecture does not result in any loss of network functionality.
Implementation of L3VPN Service
L3VPN based on EVPN is configured similarly to traditional L3VPN over MPLS:
· Creation of a VRF (Virtual Routing and Forwarding).
· Definition of BGP communities for importing and exporting routes.
· Configuration of L3 interfaces on PE (Provider Edge) and CE (Customer Edge) devices, which can be physical or VLAN interfaces.
· Definition and configuration of routing protocols between PE and CE. All protocols are supported, but BGP is typically preferred due to its route filtering capabilities.
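As with the L2 case, these building blocks can be rendered from a handful of parameters. The template below mimics Arista EOS syntax but is an assumption, not a validated configuration (the VRF name CUST-A is a hypothetical example; VNI 30001 matches the L3VPN VNI seen later in the capture):

```python
# Hypothetical sketch: render the L3VPN building blocks (VRF, route
# targets, L3 interface, L3 VNI) in EOS-like syntax for illustration.

def render_l3vpn(vrf: str, vni: int, rt: str, interface: str, address: str) -> str:
    """Build a config snippet for one EVPN L3VPN service."""
    return "\n".join([
        f"vrf instance {vrf}",
        f"ip routing vrf {vrf}",
        f"interface {interface}",
        f"   vrf {vrf}",
        f"   ip address {address}",
        "interface Vxlan1",
        f"   vxlan vrf {vrf} vni {vni}",
        "router bgp 65535",
        f"   vrf {vrf}",
        "      rd auto",
        f"      route-target import evpn {rt}",
        f"      route-target export evpn {rt}",
    ])

cfg = render_l3vpn("CUST-A", 30001, "65535:30001",
                   "Ethernet4", "198.18.1.0/31")
print(cfg)
```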
The logical topology for the L3VPN that will be demonstrated is as follows:
To implement the above topology, the following information will be used:
· CE16 will advertise route 10.0.16.0/24.
· CE17 will advertise route 10.0.17.0/24.
· CE18 will advertise route 10.0.18.0/24.
· The routing protocol used will be BGP (Border Gateway Protocol).
· The CE devices' ASN (Autonomous System Number) will be 65001.
· The ASN for the PE devices will be 65535, and the AS override feature will be configured.
Configuration
The L3VPN service configuration can be downloaded from GitHub:
Validation
BB2
The output above shows that:
· Routes 10.0.16.0/24 and 198.18.1.0/31 were advertised by BB2 itself.
· Routes 10.0.17.0/24 and 198.18.1.2/31 were advertised by M8.
· Routes 10.0.18.0/24 and 198.18.1.4/31 were advertised by M11.
M8
M11
L3VPN Traffic Forwarding with VxLAN
The figure below is an excerpt from a packet capture executed on BB2. In it, CE16 (198.18.1.1) is pinging the address 10.0.18.1 of CE18:
The packet structure is the same as observed in the L2VPN capture:
· There is an outer IP packet with source 198.18.255.2 (BB2) and destination 198.18.255.11 (M11).
· After the UDP header, there is a VxLAN header with VNI 30001, the VNI allocated to the L3VPN VRF.
· Following VxLAN, there is another Ethernet header whose source MAC address is the router-mac allocated by BB2 and whose destination MAC address is the router-mac allocated by M11.
· Finally, there is the ICMP packet of the ping executed from CE16 to CE18.
The router-mac for the route 10.0.18.0/24 can be viewed in the BGP table of BB2:
Implementation of IP Transit Service
The IP transit service in the EEE Architecture follows an approach similar to that of Internet Exchange Points (IXPs):
· There is a shared LAN among all participants.
· Route exchange occurs through route servers.
The advantages for transit operators and Internet Service Providers (ISPs) adopting the EEE Architecture for IP transit service include:
· No need to create a route reflection hierarchy for the service.
· Linux servers or virtualized routers can be used as route servers.
· Traffic engineering is simplified since BGP policy changes are centralized at the route servers. For instance, flow collectors (Netflow, IPFIX, etc.) can identify a transit operator's heaviest-consuming clients during uplink congestion. Policies can then be adjusted to reroute their traffic from congested links to paths with available bandwidth.
· In the underlay, traffic flows optimally, as all IP transit traffic is encapsulated in VxLAN with the destination address being the router or switch gateway connected to the destination network.
The topology for the IP transit lab will be as follows:
The following information will be used for the configuration of the scenario:
· T21 is an IP transit provider with ASN 21 and will announce route 21.21.0.0/16.
· T22 is an IP transit provider with ASN 22 and will announce route 22.22.0.0/16.
· C23 is an IP transit customer with ASN 23 and will announce route 23.23.0.0/16.
· C24 is an IP transit customer with ASN 24 and will announce route 24.24.0.0/16.
· VLAN 50 has been allocated locally on routers BB2, BB3, BB4, and BB5, and switches M9 and M10 to implement the shared LAN between transit providers, route servers, and transit customers.
· VNI 200050 has been allocated for the service.
· Traffic engineering will be implemented so that C23 consumes content via T21 and C24 consumes via T22.
· Community 65535:21 will be used to mark routes that should be announced only to T21.
· Community 65535:22 will be used to mark routes that should be announced only to T22.
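The community-based announcement policy can be sketched as a simple export filter on the route server (the data structures and peer names are illustrative):

```python
# Illustrative sketch of the route-server export policy: a route tagged
# 65535:21 is announced upstream only to T21, and one tagged 65535:22
# only to T22; untagged routes go to every other peer.

POLICY = {"65535:21": {"T21"}, "65535:22": {"T22"}}
ALL_PEERS = {"T21", "T22", "C23", "C24"}
TRANSITS = {"T21", "T22"}

def export_targets(communities: list, origin_peer: str) -> set:
    """Peers that should receive a route, honoring the TE communities."""
    targets = ALL_PEERS - {origin_peer}
    allowed = {peer for c in communities if c in POLICY for peer in POLICY[c]}
    if allowed:
        # drop the transit(s) not named by the community
        targets -= TRANSITS - allowed
    return targets

print(sorted(export_targets(["65535:21"], "C23")))  # C23's route never reaches T22
```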
Configuration
The IP Transit service configuration can be downloaded from GitHub:
Validation
RR19
The output below demonstrates that RR19 has learned routes from all its neighbors on VLAN50:
C23
In C23, the routing table shows that the next hop for each route learned via BGP has remained the same. Therefore, traffic destined for those routes will follow the optimized path in the underlay.
T21
The routing table of T21 shows the following:
· The route for the block announced by C23 (23.23.0.0/16) has a next-hop address on the shared LAN.
· The route for the block announced by C24 (24.24.0.0/16) has a next-hop address on the link with T22. This confirms that the traffic engineering policy is working, as traffic destined for C23 should flow through T21, and traffic destined for C24 should flow through T22.
Conclusion
These proof-of-concept tests show that the EEE Architecture can significantly reduce operational (OpEx) and acquisition (CapEx) costs for telecom operators and Internet service providers.
OpEx reduction occurs because the topology is simplified, enabling the implementation of provisioning tools where each service only requires four pieces of information to be configured:
· VLAN with local significance.
· BGP communities for route import and export.
· Virtual Network Identifier (VNI).
· Interface where the service will be activated.
Additionally, troubleshooting becomes simpler as operators only need to check for the presence of MAC addresses in each service's VLANs and interfaces.
CapEx reduction is achievable due to several factors:
· Eliminating the need for routers or switches with full routing support in the backbone.
· Ability to use switches without MPLS support in metro networks.
· An increased variety of equipment suppliers, because data center routers and switches can be used in the backbone and metro networks. With more players competing, prices should be lower.
· Utilization of Linux servers, VMs, router containers, and even hardware incapable of supporting full routing as route reflectors, since learned routes do not need to be installed in the reflectors' FIB.