End-to-End EVPN Architecture (EEE) Automation
Telecom operators and Internet service providers are compelled to optimize their acquisition and operational costs: the main product they sell, Internet access, loses average value every year, while infrastructure grows more expensive as traffic volume increases exponentially.
The End-to-End EVPN (EEE) Architecture reduces acquisition costs (CapEx) by advocating simpler data center equipment in place of traditional telecom equipment in the backbone and metropolitan networks. It reduces operational expenses (OpEx) by simplifying the network into an underlay network that maps directly to the physical topology and an overlay network that is independent of the underlay.
The overlay creates a virtualized network that can be viewed as a single L3 switch, regardless of the number of devices and the type of interconnections in the underlay network.
Every service provisioned on the overlay network requires only the allocation of a VLAN (which does not need to be unique), two BGP communities, and a VxLAN VNI. The service can be L2VPN, L3VPN, IPv4/IPv6 transit, broadband client concentration (PPPoE or IPoE), or a value-added service. The article Value-Added Services in the End-to-End EVPN (EEE) Architecture discussed how firewalls and load balancers can be provisioned as value-added services in a simplified manner on the overlay network.
This new article will provision L2VPN, L3VPN, and IPv4 transit services using Ansible for equipment configuration and Robot Framework for service validation. The EEE Architecture's simplification allows companies to use free tools to start their provisioning and service validation automation journey.
Another valuable concept presented in this article is the automated creation of test environments. The lab uses the following software:
- Virtualization: PNETLAB.
- Arista cEOS, the EOS image that runs as a Docker container.
- Ansible.
- Robot Framework.
With this infrastructure, it is possible, for example, to bring up an environment in minutes to validate a new version of Arista's EOS operating system or any other manufacturer providing virtualized versions of their products.
The topology used is as follows:
The BB1 to BB6 devices are backbone routers in the underlay network, while the M7 to M12 devices are metro switches in the same network. Although MPLS is enabled on the backbone interfaces, it is not used for service traffic forwarding, which is done end-to-end via VxLAN. The EEE Architecture suggests that MPLS be used solely for implementing traffic engineering in the backbone and is entirely dispensable in metro networks, allowing more spartan data center switches to be used in this role.
The IS-IS routing protocol is used to learn the Loopback0 interface addresses of all network elements. The BB3 and BB4 routers are first-tier (backbone) route reflectors, and the BB5 and BB6 routers are second-tier route reflectors for the directly connected metro switch rings. The backbone is in the IS-IS L2 area 49.0002, and the metro networks are in the L1 area 49.0001. The BB5 and BB6 routers are L1/L2 routers configured in area 49.0001. Thus, only a default route is injected into the metro networks, keeping the routing table extremely small where devices with less CPU and memory are positioned.
EVPN is the only address family configured in the underlay network's BGP. Ansible playbooks, Robot files, and the topology exported from PNETLAB can be downloaded from GitHub: https://github.com/avargasn/eee_automation.
The cEOS version is 4.30.3M:
Underlay Network Provisioning
To provision the underlay network, the Ansible playbook 01.initial.playbook.yaml, found in the ansible directory, is used. This playbook generates the configuration for all backbone and metro network nodes, including IS-IS and BGP EVPN, based on the templates in the template/initial directory:
The configuration files are saved in the initial directory in the format [device name].conf.
To execute the playbook:
ansible-playbook 01.initial.playbook.yaml
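The generation step can be pictured with a minimal Python sketch: render one template per device and write the result as [device name].conf. The device names come from the lab topology, but the template contents and IS-IS NET values below are hypothetical stand-ins for the real Jinja2 templates.

```python
from string import Template
import pathlib
import tempfile

# Hypothetical stand-in for the Jinja2 templates in the templates directory
TEMPLATE = Template("hostname $name\nrouter isis CORE\n   net $net\n")

# Device names come from the lab topology; the NET values are made up here
devices = {
    "BB1": "49.0002.0000.0000.0001.00",
    "M7": "49.0001.0000.0000.0007.00",
}

outdir = pathlib.Path(tempfile.mkdtemp())
for name, net in devices.items():
    # One rendered configuration file per device: <device name>.conf
    (outdir / f"{name}.conf").write_text(TEMPLATE.substitute(name=name, net=net))

print(sorted(p.name for p in outdir.iterdir()))  # ['BB1.conf', 'M7.conf']
```

The real playbook does the same thing with Jinja2 templates and the Ansible inventory as the variable source.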
After generating the files, the configurations must be applied with the 02.deploy.playbook.yaml playbook. This playbook connects to the devices through the management interface Mgmt/eth0, shown in the topology as connected to the MGMT cloud. In the MGMT cloud, a jump server (the PNETLAB host) intermediates the SSH connection between the station running Ansible and the EOS containers. These containers have a minimal initial configuration that sets up the management interface in the 10.0.137.0/24 network and defines the password for the admin user as admin. SSH is enabled by default on EOS.
The execution of the playbook is identical to the one that generated the configurations:
ansible-playbook 02.deploy.playbook.yaml
The output of the playbook is as follows:
The magenta warnings can be safely ignored. To validate that the IS-IS adjacencies have formed and the BGP sessions have been established, use the 01.initial.robot file located in the robot folder. The content of the file is as follows:
Here's how it works:
To run the test, change to the?robot?directory:
robot 01.initial.robot
The CLI output of the Robot Framework is as follows:
It also generates a report (log.html) with the execution log:
The log shows that in 25 seconds, the status of IS-IS adjacencies and BGP sessions was validated across 12 devices.
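The kind of check the Robot test performs can be sketched in plain Python: count the sessions a device reports as established and compare the count with the expected number of neighbors. The command output below is a shortened, hypothetical example of an EOS BGP summary table.

```python
# Hypothetical, abbreviated "show bgp summary"-style output from one device
sample = """\
Neighbor   V AS    MsgRcvd MsgSent InQ OutQ Up/Down  State
1.1.1.3    4 65000     120     118   0    0 2d01h    Estab
1.1.1.4    4 65000     119     117   0    0 2d01h    Estab
"""

def established_sessions(output: str) -> int:
    # Count lines whose State column reports an established session
    return sum(1 for line in output.splitlines() if line.rstrip().endswith("Estab"))

print(established_sessions(sample))  # 2
```

The Robot test repeats this kind of comparison across all 12 devices, for both IS-IS adjacencies and BGP sessions.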
Provisioning of L2VPN Service
The L2VPN service comprises three sites: DC13, SW14, and SW15.
The switch in the center is the logical representation of the entire underlay network, which is the simplification provided by the EEE Architecture.
The template used by Ansible to generate the L2VPN service configuration is templates/l2vpn/pe.j2:
Ansible renders everything enclosed in double curly braces {{ }}. In the above template, some variables are replaced directly, and others are formatted first, as exemplified in the following code:
Here, Ansible pads the variable l2vpn_service_vlan with leading zeros until it is five digits long. VLAN 100 is used in the lab to provision the L2VPN service, so the code above generates the number 200100.
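The padding logic can be reproduced in plain Python (the function name is hypothetical): the VLAN ID is zero-padded to five digits and prefixed with 2, which is how VLAN 100 becomes VNI 200100.

```python
def l2vpn_vni(vlan: int) -> int:
    # Zero-pad the VLAN ID to five digits and prefix it with "2",
    # mirroring the Jinja2 formatting described above (100 -> 200100)
    return int(f"2{vlan:05d}")

print(l2vpn_vni(100))   # 200100
print(l2vpn_vni(4094))  # 204094
```

Because the VLAN ID is embedded whole in the VNI, the mapping stays unique per service even when the same VLAN number is reused at different sites.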
The two lines below are taken from the template:
They will generate the following Arista EOS command lines:
All variables used in the templates are taken from the Ansible inventory file inventory.yaml. Below is an excerpt from the l2vpn group used to provision the L2VPN service:
The variable service_role is used in the playbook 03.l2vpn.playbook.yaml to apply the correct template to the CE devices (DC13, SW14, and SW15) and the PE devices (BB1, M7, and M12):
On the CE devices, a simple configuration runs OSPF on VLAN 20 so that this traffic is received by the PE devices, encapsulated in VNI 200100, and forwarded to the other PE devices with VLAN 20 preserved. The Wireshark capture below shows this protocol stack:
This packet has the following layers:
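From the description above, the captured frame's protocol stack can be summarized programmatically, outermost layer first. The outer addressing details are omitted; UDP destination port 4789 is the IANA-assigned VxLAN port.

```python
# Protocol stack of the captured L2VPN frame, as described in the text
stack = [
    "Ethernet (underlay hop)",
    "IPv4 (VTEP to VTEP)",
    "UDP (destination port 4789)",
    "VxLAN (VNI 200100)",
    "Ethernet (customer frame)",
    "802.1Q (VLAN 20)",
    "IPv4",
    "OSPF",
]

for depth, layer in enumerate(stack):
    print("  " * depth + layer)
```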
To provision the L2VPN service, execute the playbook:
To validate the service, the Robot file 02.l2vpn.robot is used:
The code above performs the following actions:
The output below results from running the command robot 02.l2vpn.robot:
The web report highlights the results of the counting and comparison:
Provisioning of L3VPN Service
The topology of the L3VPN service is as follows:
Once again, the overlay network is presented as a single L3 switch, now running BGP to learn and advertise routes to the CE devices. Ansible generates the configurations from templates located in the templates/l3vpn directory:
Robot is used to validate that all BGP sessions with the CE devices are established and that the VRF routing table contains the expected number of BGP routes. The Ansible playbook is 04.l3vpn.playbook.yaml, and the Robot file is 03.l3vpn.robot. The output from Ansible:
The output from Robot is as follows:
Provisioning of IPv4 Transit Service
The IPv4 or IPv6 transit service uses the same topology found at Internet Exchange Points (IXPs): all participants connect to a shared network, and route exchange is facilitated by route servers. In the lab, the topology for IPv4 transit is as follows:
Whichever next hop a route on the shared segment points to, traffic always follows the shortest path through the underlay network. Logically, however, the shared subnet behaves like a single L2 switch.
The significant advantage of this design is that hierarchical route reflectors are not required, and traffic engineering is performed solely by altering policies on the route reflectors. The network's simple topology simplifies the creation of automation tools for routing policy changes and traffic engineering implementation.
The playbook 05.transit.playbook.yaml generates configurations based on templates from the templates/transit directory. Robot uses the file 04.transit.robot to validate that the number of MAC addresses learned on the shared subnet matches the number of participants (RR19, RR20, PE21, PE22, C23, and C24).
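A simplified version of that MAC-count check in Python (the table below is a hypothetical excerpt of a "show mac address-table"-style output, not a real capture from the lab):

```python
# Hypothetical excerpt of a MAC address table on the shared segment
sample = """\
Vlan  Mac Address     Type     Ports
50    001c.7300.0013  DYNAMIC  Et1
50    001c.7300.0014  DYNAMIC  Vx1
50    001c.7300.0015  DYNAMIC  Vx1
60    001c.7300.0016  DYNAMIC  Et2
"""

def macs_in_vlan(output: str, vlan: int) -> int:
    # Count table rows whose first column matches the requested VLAN
    return sum(
        1
        for line in output.splitlines()
        if line.split() and line.split()[0] == str(vlan)
    )

print(macs_in_vlan(sample, 50))  # 3
```

In the lab, the Robot test compares this count against the six participants connected to the shared subnet.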
PE21 and PE22 are edge routers connected to transit providers, while C23 and C24 are transit customers of the provider using the EEE Architecture.
It is also possible to connect transit providers directly to the shared segment. A future article will explore a more elaborate topology for the IP transit service, with elements such as Internet Exchange Points (IXPs), Content Delivery Networks (CDNs), and automated traffic engineering. Meanwhile, the present article already demonstrates traffic engineering implemented with Ansible.
Below is the output of the playbook that generates and applies configurations to all elements, including routers and transit customers:
Finally, Robot is used to validate if the number of MAC addresses in VLAN 50 corresponds to the number of routers connected to it:
Traffic Engineering Automation
In this lab, the following traffic engineering will be implemented:
The BGP table resulting from the initial configuration of the IPv4 transit service is as follows:
The critical information for traffic engineering is that route 25.25.0.0/16, announced by T25, points to PE21, the shortest path according to the BGP algorithm (AS PATH). Similarly, route 26.26.0.0/16, announced by T26, has its gateway at PE22, the shortest path according to BGP (AS PATH).
However, an interesting detail is that each network has four entries, two of them with longer AS paths. This occurs because, to enable symmetric traffic engineering in the EEE Architecture, the route reflectors RR19 and RR20 were configured to ignore the AS path in the BGP best-path selection algorithm and to advertise additional paths to their clients, as proposed in RFC 7911 (Advertisement of Multiple Paths in BGP).
The configuration block for the Arista cEOS containers used in the lab is as follows:
The BGP table of RR19 shows the effect of these configurations:
In this output, routes 25.25.0.0/16 and 26.26.0.0/16 have two entries marked as ECMP, indicating that each was learned from a different edge router. The following output confirms that all of them were sent to C23:
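The effect can be sketched with a toy best-path model (all values hypothetical): when AS-path length is ignored and the remaining tie-breakers are equal, both paths survive as an ECMP set; with the default comparison, only the path with the shorter AS path would win.

```python
# Toy model of the selection step discussed above (values hypothetical)
paths = [
    {"prefix": "25.25.0.0/16", "nexthop": "PE21", "as_path_len": 2},
    {"prefix": "25.25.0.0/16", "nexthop": "PE22", "as_path_len": 3},
]

def select(paths, ignore_as_path):
    if ignore_as_path:
        # AS-path length skipped; remaining criteria assumed equal -> ECMP set
        return list(paths)
    # Default behavior: only the shortest AS path survives
    best = min(p["as_path_len"] for p in paths)
    return [p for p in paths if p["as_path_len"] == best]

print(len(select(paths, ignore_as_path=True)))   # 2
print(len(select(paths, ignore_as_path=False)))  # 1
```

This is why each prefix shows two ECMP entries on RR19, and why both can be forwarded to clients via additional paths.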
In the EEE Architecture, route reflectors can be physical routers, virtual machines, or containers, provided they offer ample processing power and RAM. This is necessary because they maintain large BGP tables in the RIB and implement the changes in routing policies.
In implementing the IP transit service, the route reflectors were configured to apply inbound route maps that tag routes received from each client with unique values. These BGP communities will be used in outbound route maps to determine what will be advertised or filtered to each client, thus implementing traffic engineering. The BGP community identification dictionary is as follows:
This is the output of the command 'show ip bgp 26.26.0.0' executed on RR19:
Therefore, a policy that advertises only the route with community 65535:22 to the client will use PE22 as the gateway to reach the network 26.26.0.0/16.
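That outbound policy can be sketched as a filter over community-tagged routes. The 65535:22 community comes from the article; the 65535:21 tag for PE21 is assumed analogous, and the data structure is hypothetical.

```python
# Additional paths for 26.26.0.0/16 as held by the route reflector,
# each tagged on input with the community identifying its edge router
routes = [
    {"prefix": "26.26.0.0/16", "nexthop": "PE21", "communities": {"65535:21"}},
    {"prefix": "26.26.0.0/16", "nexthop": "PE22", "communities": {"65535:22"}},
]

def advertise(routes, keep):
    # Outbound route map: advertise only routes carrying the chosen community
    return [r for r in routes if keep in r["communities"]]

selected = advertise(routes, "65535:22")
print([r["nexthop"] for r in selected])  # ['PE22']
```

Changing the community kept by the outbound route map is all it takes to steer a client toward a different edge router, which is the whole traffic engineering mechanism.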
The policies to be implemented are described as variables in the Ansible inventory:
These policies mean the following:
The implementation of traffic engineering is executed with playbook 06.te.playbook.yaml:
The BGP table on C23 demonstrates that the policy has been successfully implemented, as the path to access networks T25 and T26 goes through PE22:
The traceroute to address 25.25.0.1 confirms the path:
Finally, at T25, it is possible to confirm that traffic symmetry has been implemented correctly:
The output above shows that, for T25 to reach the network of C23 (23.23.0.0/16), the path goes through T26 and PE22. To reach the network of C24 (24.24.0.0/16), the path is directly through PE21.
The configuration on RR19 for C23 and PE21 is the implementation of the traffic engineering policy:
Closing words
The End-to-End EVPN Architecture (EEE) simplifies the network and allows for implementing automation tools using open and freely available solutions.
Ansible and the Robot Framework used in this article have extensive documentation available on the Internet and can be executed on any operating system that supports Python installation.
Any provider or operator can implement an automated service provisioning system using its CRM or ERP as a starting point. Such systems can be programmed so that, once a contract is signed, configuration generation, deployment, and service validation run automatically, exactly as demonstrated in this article.