End-to-End EVPN Architecture (EEE) Automation

Telecom operators and Internet service providers are compelled to optimize their acquisition and operational costs since the main product they sell, Internet access, sees its average value decreasing yearly. On the other hand, infrastructure is becoming more expensive as traffic volume grows exponentially.

The End-to-End EVPN (EEE) Architecture reduces acquisition costs (CapEx) by advocating simpler data center equipment instead of traditional telecom equipment in the backbone and metropolitan networks. It reduces operational expenses (OpEx) by simplifying the network into an underlay that maps directly to the physical topology and an overlay that is independent of the underlay.

The overlay creates a virtualized network that can be viewed as a single L3 switch, regardless of the number of devices and the type of interconnections in the underlay network.

Every service provisioned on the overlay network depends only on allocating a VLAN that does not need to be unique, two BGP communities, and a VxLAN VNI. The service can be L2VPN, L3VPN, IPv4/IPv6 transit, broadband client concentration (PPPoE or IPoE), or value-added services. The article "Value-Added Services in the End-to-End EVPN (EEE) Architecture" discussed how firewalls and load balancers can be provided as value-added services in a simplified manner on the overlay network.

This new article will provision L2VPN, L3VPN, and IPv4 transit services using Ansible for equipment configuration and Robot Framework for service validation. The EEE Architecture's simplification allows companies to use free tools to start their provisioning and service validation automation journey.

Another valuable concept presented in this article is the automated creation of test environments. The lab uses the following software:

  • Virtualization: PNETLAB.
  • Arista cEOS, the EOS image that runs as a Docker container.
  • Ansible.
  • Robot Framework.

With this infrastructure, it is possible, for example, to bring up an environment in minutes to validate a new version of Arista's EOS operating system or any other manufacturer providing virtualized versions of their products.

The topology used is as follows:

The BB1 to BB6 devices are backbone routers in the underlay network, while the M7 to M12 devices are metro switches in the same network. Although MPLS is enabled on the backbone interfaces, it is not used for service traffic forwarding, which is done end-to-end via VxLAN. The EEE Architecture suggests that MPLS be used solely for implementing traffic engineering in the backbone and is entirely dispensable in metro networks, allowing more spartan data center switches to be used in this role.

The IS-IS routing protocol is used to learn the Loopback0 interface addresses of all network elements. The BB3 and BB4 routers are first-tier (backbone) route reflectors, and the BB5 and BB6 routers are second-tier route reflectors for the directly connected metro switch rings. The backbone is in IS-IS L2 area 49.0002, and the metro networks are in L1 area 49.0001. The BB5 and BB6 routers are L1/L2 routers configured in area 49.0001. Thus, only a default route is injected into the metro networks, which keeps the routing table extremely small where devices with less CPU and memory are positioned.

EVPN is the only address family configured in the underlay network's BGP. The Ansible playbooks, the Robot Framework test files, and the topology exported from PNETLAB can be downloaded from GitHub: https://github.com/avargasn/eee_automation.

The cEOS version is 4.30.3M:

Underlay Network Provisioning

To provision the underlay network, the Ansible playbook 01.initial.playbook.yaml is used, which can be found in the ansible directory. This playbook generates the configuration for all backbone and metro network nodes, including IS-IS and BGP EVPN, based on the templates in the template/initial directory:

The configuration files are saved in the initial directory with the format [device name].conf.

To execute the playbook:

ansible-playbook 01.initial.playbook.yaml

After generating the files, it is necessary to apply the configurations using the 02.deploy.playbook.yaml playbook. This playbook connects to the devices via the management interface Mgmt/eth0, indicated in the topology as connected to the MGMT cloud. In the MGMT cloud, a jump server (PNETLAB itself) intermediates the SSH connection between the station running Ansible and the EOS containers. These containers have a minimal initial configuration that sets up the management interface in the 10.0.137.0/24 network and defines the password for the admin user as admin. SSH is enabled by default on EOS.

The execution of the playbook is identical to the one that generated the configurations:

ansible-playbook 02.deploy.playbook.yaml

The output of the playbook is as follows:

The magenta warnings can be safely ignored. To validate whether the IS-IS adjacencies have formed and the BGP sessions have been established, use the 01.initial.robot file located in the robot folder. The content of the file is as follows:


Here's how it works:

  • It uses the @{HOSTS} list, which contains all the management addresses of the underlay network devices, to connect via SSH.
  • After connecting to each device through the jump server at 192.168.86.3, the outputs of the commands show isis neighbors and show isis interface brief are collected.
  • The first output counts the number of established adjacencies (the word "UP"), and the second output counts the number of interfaces configured as point-to-point in IS-IS (the phrase "point-to-point").
  • If the number of neighbors matches the number of interfaces configured in IS-IS, there are no issues.
  • The output of the command show bgp neighbors is collected to validate the BGP sessions.
  • The number of configured neighbors is determined by counting the occurrences of the phrase "BGP neighbor is" in the output of show bgp neighbors.
  • The number of established sessions is counted using the pattern "TCP state is ESTABLISHED".
  • If the two numbers are equal, all sessions have been established.
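The counting logic described above can also be sketched in plain Python. This is only an illustration of the comparison, not the content of 01.initial.robot: the function names are hypothetical, and the functions expect the raw text of the corresponding show commands as input.

```python
def isis_adjacencies_ok(neighbors_output: str, interfaces_output: str) -> bool:
    """Compare IS-IS adjacencies in the UP state with point-to-point interfaces."""
    up_count = neighbors_output.count("UP")
    p2p_count = interfaces_output.count("point-to-point")
    return up_count == p2p_count


def bgp_sessions_ok(bgp_neighbors_output: str) -> bool:
    """Compare configured BGP neighbors with established TCP sessions."""
    configured = bgp_neighbors_output.count("BGP neighbor is")
    established = bgp_neighbors_output.count("TCP state is ESTABLISHED")
    return configured == established
```

Because these are plain substring counts, the check is only as reliable as the uniqueness of the matched words in the command output.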

To run the test, change to the robot directory and execute:

robot 01.initial.robot

The CLI output of the Robot Framework is as follows:

It also generates a report (log.html) with the execution log:

The log shows that in 25 seconds, the status of IS-IS adjacencies and BGP sessions was validated across 12 devices.

Provisioning of L2VPN Service

The L2VPN service comprises 3 sites: DC13, SW14, and SW15.

The switch in the center represents the logical representation of the entire underlay network, which is the simplification provided by the EEE Architecture.

The template used by Ansible to generate the L2VPN service configuration is the file templates/l2vpn/pe.j2.

Ansible processes everything enclosed in double curly braces {{ }}. In the above template, some variables are replaced, and others are formatted. This is exemplified in the following code:

In it, Ansible formats the variable l2vpn_service_vlan by padding it with leading zeros until its length reaches five digits. VLAN 100 is used in the lab to provision the L2VPN service, so the padded value is 00100 and the code above generates the number 200100.
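The same derivation can be reproduced in Python as a quick check. This is a sketch, not the template's code: the function name is hypothetical, and the leading "2" is an assumption inferred from the resulting value 200100.

```python
def service_vni(vlan: int, prefix: str = "2") -> int:
    """Build a VNI from a service VLAN padded to five digits.

    The default prefix "2" is an assumption inferred from the lab value 200100.
    """
    if not 1 <= vlan <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    # Zero-pad the VLAN to five digits, then prepend the prefix.
    return int(f"{prefix}{vlan:05d}")


print(service_vni(100))  # prints 200100
```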

The two lines below are taken from the template:

They will generate the following Arista EOS command lines:

All variables used in the templates are taken from the Ansible inventory file inventory.yaml. Below is an excerpt from the l2vpn group used to provision the L2VPN service:

The variable service_role is used in the playbook 03.l2vpn.playbook.yaml to apply the template to the CE devices DC13, SW14, and SW15, as well as the PE devices BB1, M7, and M12:

In the CE devices, a simple configuration is set up with OSPF running on VLAN 20 so that this traffic is received by the PE devices, encapsulated in VNI 200100, and forwarded to other PE devices while maintaining VLAN 20. The Wireshark capture below shows this protocol stack:

This packet has the following layers:

  • The outer IP packet originates from BB1's loopback0 (198.18.255.1) and is destined for M7's loopback0 (198.18.255.7).
  • Above that is a UDP header with a destination port 4789, used by VxLAN.
  • The VxLAN header indicates that the VNI used is 200100.
  • Following VxLAN, an OSPF packet originates from DC13 (10.0.20.13) and is destined for the OSPF multicast address 224.0.0.5. Because this is a multicast destination, the packet is replicated to all VTEPs (VxLAN Tunnel End Points) of that VNI (M7 and M12).
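The VxLAN layer in this stack is the 8-byte header defined in RFC 7348: one flags byte with the "VNI valid" bit (0x08) set, 24 reserved bits, the 24-bit VNI, and 8 more reserved bits. The sketch below packs and parses that header for the lab's VNI; the function names are hypothetical and nothing here comes from the article's repository.

```python
import struct

VXLAN_UDP_PORT = 4789        # destination UDP port used by VxLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VxLAN header carrying the given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits set to zero.
    # Word 2: VNI in the top 24 bits, low 8 bits reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VxLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


header = build_vxlan_header(200100)
print(header.hex())       # prints 08000000030da400
print(parse_vni(header))  # prints 200100
```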

To provision the L2VPN service, execute the playbook:

To validate the service, the Robot file 02.l2vpn.robot is used:

The code above performs the following actions:

  • Connects to the router BB1 and switches M7 and M12 via SSH.
  • Executes the command show mac address-table dynamic vlan 100 to capture the MAC address table of each device.
  • Uses the Get Count function to count occurrences of the word "DYNAMIC" in the previous command's output.
  • The counted number is compared with 3, the number of CEs in that L2VPN. If the numbers are equal, the test is successful.

The output below results from running the command robot 02.l2vpn.robot:

The web report highlights the results of the counting and comparison:

Provisioning of L3VPN Service

The topology of the L3VPN service is as follows:

Once again, the overlay network is presented as a single L3 switch, now running BGP to learn and advertise routes to the CE devices. Ansible is used to generate configurations from templates located in the directory templates/l3vpn:


Robot is used to validate whether all BGP sessions are established with the CE devices and whether the VRF routing table contains the expected number of BGP routes. The Ansible playbook is 04.l3vpn.playbook.yaml, and the Robot file is 03.l3vpn.robot. The output from Ansible:

The output from Robot is as follows:

Provisioning of IPv4 Transit Service

The IPv4 or IPv6 transit service uses the same topology found in Internet Exchange Points (IXPs) where there is a shared network, all participants connect to the shared network, and route exchange is facilitated using route servers. In the lab, the topology for IPv4 transit is as follows:


When the next hop of a route in the shared segment is changed, the path always follows the shortest path of the underlay network. However, logically, the shared subnet resembles an L2 switch.

The significant advantage of this design is that hierarchical route reflectors are not required, and traffic engineering is performed solely by altering policies on the route reflectors. The network's simple topology simplifies the creation of automation tools for routing policy changes and traffic engineering implementation.

The playbook 05.transit.playbook.yaml generates configurations based on templates from the templates/transit directory. Robot uses the file 04.transit.robot to validate whether the number of MAC addresses learned in the shared subnet matches the number of participants (RR19, RR20, PE21, PE22, C23, and C24).

PE21 and PE22 are edge routers connected to transit providers, while C23 and C24 are transit customers of the provider using the EEE Architecture.

It's also possible to directly connect transit providers to the shared segment. A future article will explore a more elaborate topology for IP transit service with elements such as Internet Exchange Points (IXPs), Content Delivery Networks (CDNs), and automated traffic engineering using Ansible. This article nevertheless already covers a basic implementation of traffic engineering with Ansible.

Below is the output of the playbook that generates and applies configurations to all elements, including routers and transit customers:

Finally, Robot is used to validate if the number of MAC addresses in VLAN 50 corresponds to the number of routers connected to it:

Traffic Engineering Automation

In this lab, the following traffic engineering will be implemented:

  • C23 should use PE22 for upload traffic. Similarly, download traffic should be routed through PE22, maintaining flow symmetry.
  • C24 should use PE21 for upload and route download traffic through PE21, maintaining flow symmetry.

The BGP table resulting from the initial configuration of the IPv4 transit service is as follows:

The critical information for traffic engineering is that route 25.25.0.0/16, announced by T25, points to PE21, the shortest path according to the BGP algorithm (AS PATH). Similarly, route 26.26.0.0/16, announced by T26, has its gateway at PE22, the shortest path according to BGP (AS PATH).

Notably, each network has four entries, two of which have longer AS PATHs. This occurs because, to enable symmetric traffic engineering in the EEE Architecture, the route reflectors RR19 and RR20 were configured to ignore the AS PATH in the BGP best path selection algorithm and to advertise additional paths to their clients, as proposed in RFC 7911, "Advertisement of Multiple Paths in BGP".

The configuration block for the Arista cEOS containers used in the lab is as follows:

The BGP table of RR19 shows the effect of these configurations:

In this output, routes 25.25.0.0/16 and 26.26.0.0/16 have two entries marked as ECMP, indicating that each was learned from a different edge router. The following output confirms that all of them were sent to C23:

In the EEE Architecture, route reflectors can be physical routers, virtual machines, or containers, provided they have ample processing power and RAM. This is necessary because they maintain large BGP tables in the RIB and implement changes in routing policies.

In implementing the IP transit service, the route reflectors were configured to apply inbound route maps that tag routes received from each client with unique values. These BGP communities will be used in outbound route maps to determine what will be advertised or filtered to each client, thus implementing traffic engineering. The BGP community identification dictionary is as follows:

This is the output of the command 'show ip bgp 26.26.0.0' executed on RR19:


Therefore, a policy that advertises only the route with community 65535:22 to the client will use PE22 as the gateway to reach the network 26.26.0.0/16.

The policies to be implemented are described as variables in the Ansible inventory:

These policies mean the following:

  • PE21 receives all routes except those marked with the C23 community (65535:23).
  • PE22 receives all routes except those marked with the C24 community (65535:24).
  • C23 receives all routes except those marked with the P21 community (65535:21); thus, its upstream is PE22.
  • C24 receives all routes except those marked with the P22 community (65535:22); thus, its upstream is PE21.
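The effect of these four policies can be simulated with a few lines of Python. This is a simplified model, not the route reflector configuration: the names are hypothetical, each route carries only the community attached on input, and a route is never advertised back to the client it was learned from.

```python
from typing import NamedTuple

# Community attached by the route reflectors to routes received from each client.
COMMUNITY_OF = {
    "PE21": "65535:21",
    "PE22": "65535:22",
    "C23": "65535:23",
    "C24": "65535:24",
}

# Community filtered out of the updates sent to each client (the four policies).
DENIED_TO = {
    "PE21": COMMUNITY_OF["C23"],
    "PE22": COMMUNITY_OF["C24"],
    "C23": COMMUNITY_OF["PE21"],
    "C24": COMMUNITY_OF["PE22"],
}


class Route(NamedTuple):
    prefix: str
    learned_from: str
    community: str


def tag_inbound(prefix: str, client: str) -> Route:
    """Model the inbound route map: tag the route with the sender's community."""
    return Route(prefix, client, COMMUNITY_OF[client])


def advertise_to(client: str, rib: list[Route]) -> list[Route]:
    """Model the outbound route map: drop routes carrying the denied community."""
    return [
        route for route in rib
        if route.community != DENIED_TO[client] and route.learned_from != client
    ]


RIB = [
    tag_inbound("25.25.0.0/16", "PE21"), tag_inbound("25.25.0.0/16", "PE22"),
    tag_inbound("26.26.0.0/16", "PE21"), tag_inbound("26.26.0.0/16", "PE22"),
    tag_inbound("23.23.0.0/16", "C23"), tag_inbound("24.24.0.0/16", "C24"),
]

# Upstream direction: gateways that C23 can use to reach 25.25.0.0/16.
print({r.learned_from for r in advertise_to("C23", RIB) if r.prefix == "25.25.0.0/16"})
# Downstream direction: customer prefixes that PE21 announces to its provider.
print(sorted(r.prefix for r in advertise_to("PE21", RIB) if r.learned_from.startswith("C")))
```

In this model C23 only sees PE22 as a gateway, and PE21 never learns 23.23.0.0/16, which is what keeps C23's traffic symmetric through PE22.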

The implementation of traffic engineering is executed with playbook 06.te.playbook.yaml:

The BGP table on C23 demonstrates that the policy has been successfully implemented, as the path to access networks T25 and T26 goes through PE22:

The traceroute to address 25.25.0.1 confirms the path:

Finally, at T25, it is possible to confirm that traffic symmetry has been implemented correctly:

The output above shows that for T25 to reach the network of C23 (23.23.0.0/16), the path goes through T26 and PE22. To reach the network of C24 (24.24.0.0/16), the path is directly through PE21.

The configuration on RR19 for C23 and PE21 implements the traffic engineering policy:

Closing words

The End-to-End EVPN Architecture (EEE) simplifies the network and allows for implementing automation tools using open and freely available solutions.

Ansible and the Robot Framework used in this article have extensive documentation available on the Internet and can be executed on any operating system that supports Python installation.

Any provider or operator can implement an automated service provisioning system using their CRM or ERP as a starting point. Such systems can be programmed so that once a contract is signed, the following actions can be taken:

  • Generate the Ansible inventory with the backbone nodes and metro networks involved in providing the service.
  • Send the inventory to a git server like Gitea.
  • A continuous integration and continuous delivery (CI/CD) tool like Jenkins monitors the git repository and executes Ansible and the Robot Framework to deploy and validate the new service.
  • Jenkins publishes the Robot Framework's results as a web page.
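The deploy-and-validate step of such a pipeline could be a small script called by the CI/CD job. The sketch below is an assumption about how that step might look, not part of the article's repository: the function names are hypothetical, and it relies only on the ansible and robot directories and the commands already used in this article.

```python
import subprocess
from pathlib import Path


def build_steps(playbook: str, robot_suite: str) -> list[tuple[str, list[str]]]:
    """Return (working directory, command) pairs: deploy first, then validate."""
    return [
        ("ansible", ["ansible-playbook", playbook]),
        ("robot", ["robot", robot_suite]),
    ]


def run_steps(steps: list[tuple[str, list[str]]], repo_root: Path) -> bool:
    """Run each step in order and stop at the first non-zero exit code."""
    for workdir, command in steps:
        result = subprocess.run(command, cwd=repo_root / workdir)
        if result.returncode != 0:
            return False
    return True


if __name__ == "__main__":
    steps = build_steps("03.l2vpn.playbook.yaml", "02.l2vpn.robot")
    succeeded = run_steps(steps, Path("."))
    raise SystemExit(0 if succeeded else 1)
```

Stopping at the first failure means validation never runs against a network that was only partially configured, and the script's exit code lets the CI/CD tool mark the job as failed.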
