VPP at Coloclue, part 2

Pim van Pelt

Distinguished Software Engineer at Google

Published: September 4, 2023

About two and a half years ago, in February of 2021, I created a loadtesting environment at [Coloclue] to prove that a provider of L2 connectivity between two datacenters in Amsterdam was not incurring jitter or loss on its services – I wrote up my findings in [an article], which demonstrated that the service provider indeed provides a perfect service. One month later, in March 2021, I briefly ran [VPP] on one of the routers at Coloclue, but due to lack of time and a few technical hurdles along the way, I had to roll back [ref].

The Problem

Over the years, Coloclue AS8283 has continued to suffer from packet loss in its network. A simple traceroute, in this case from IPng AS8298, shows very high variance and packet loss when entering the network (at hop 5, in a router called eunetworks-2.router.nl.coloclue.net):

                                       My traceroute  [v0.94]                
squanchy.ipng.ch (194.1.163.90) -> 185.52.227.1                           2023-02-24T09:03:36+0100
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                          Packets               Pings
 Host                                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. chbtl0.ipng.ch                                       0.0% 49904    1.3   0.9   0.7   1.7   0.2
 2. chrma0.ipng.ch                                       0.0% 49904    1.7   1.2   1.2   2.1   0.9
 3. defra0.ipng.ch                                       0.0% 49904    6.3   6.2   6.0  19.2   1.3
 4. nlams0.ipng.ch                                       0.0% 49904   12.7  12.6  12.4  19.8   1.8
 5. bond0-105.eunetworks-2.router.nl.coloclue.net        6.5% 49903   98.8  12.3  12.0 272.8  23.0
 6. 185.52.227.1                                         6.6% 49903   15.3  12.5  12.3 308.7  20.4

The last two hops show packet loss of around 6.5%. Some paths are better and some are worse, but when more than one router is in the path, it’s difficult to pinpoint where or what is responsible. Honestly, any source will reveal packet loss and high variance when traversing one or more Coloclue routers, to a greater or lesser degree:

The screenshots above are smokeping graphs from (top) a machine at AS8283 Coloclue (in Amsterdam, the Netherlands), and from (bottom) a machine at AS8298 IPng (in Brüttisellen, Switzerland). Both show ~4.8-5.0% packet loss and high variance in end-to-end latency. No bueno!

Isolating a Device Under Test

Because Coloclue has several routers, I want to ensure that traffic traverses only the one router under test. I decide to use an allocated but currently unused IPv4 prefix and announce that only from one of the four routers, so that all traffic to and from that /24 goes over that router. Coloclue uses a piece of software called Kees, a set of Python and Jinja2 scripts to generate a Bird1.6 configuration for each router. This is great because that allows me to add a small feature to get what I need: beacons.

Setting up the beacon

A beacon is a prefix that is sent to (some, or all) peers on the internet to attract traffic in a particular way. I added a function called is_coloclue_beacon() which reads the input YAML file and uses a construction similar to the existing feature for “supernets”. It determines if a given prefix must be announced to peers and upstreams. Any IPv4 and IPv6 prefixes from the beacons list will be then matched in is_coloclue_beacon() and announced. For the curious, [this commit] holds the logic and tests to ensure this is safe.
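
To make the matching idea concrete, here is a minimal Python sketch of how a beacon list could be checked. It is not the Kees implementation (that lives in the commit linked above): the function names matches_beacon() and beacon_networks(), the ROUTER_VARS dictionary standing in for the parsed YAML, and the choice of exact-match semantics are all assumptions made for illustration.

```python
import ipaddress

# Hypothetical stand-in for the parsed per-router YAML file. The real Kees
# code reads this from the vars file and may structure it differently.
ROUTER_VARS = {
    "coloclue": {
        "beacons": [
            {"prefix": "185.52.227.0", "length": 24,
             "comment": "VPP test prefix (pim, rogier)"},
        ]
    }
}


def beacon_networks(router_vars):
    """Return the configured beacons as ip_network objects (IPv4 or IPv6)."""
    beacons = router_vars.get("coloclue", {}).get("beacons", [])
    return [ipaddress.ip_network(f"{b['prefix']}/{b['length']}") for b in beacons]


def matches_beacon(prefix, router_vars=ROUTER_VARS):
    """True only if prefix is exactly one of the configured beacons.

    Exact matching is an assumption of this sketch: a more specific or a
    covering prefix is deliberately not treated as a beacon.
    """
    candidate = ipaddress.ip_network(prefix)
    return candidate in beacon_networks(router_vars)


if __name__ == "__main__":
    for p in ("185.52.227.0/24", "185.52.227.0/25", "2a02:898::/32"):
        print(p, matches_beacon(p))
```

Keeping the match exact means that a typo in the prefix length cannot accidentally widen what is announced from this one router.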

Based on a per-router config (eg. vars/eunetworks-2.router.nl.coloclue.net.yml) I can now add the following YAML stanza:

coloclue:
  beacons:
    - prefix: "185.52.227.0"
      length: 24
      comment: "VPP test prefix (pim, rogier)"

And further, from this router, I can forward all traffic destined to this /24 to a machine running in EUNetworks (my Dell R630 called hvn0.nlams2.ipng.ch), using a simple static route:

statics:
  ...
  - route: "185.52.227.0/24"
    via: "94.142.240.71"
    comment: "VPP test prefix (pim, rogier)"

After running Kees, I can now see traffic for that /24 show up on my machine. The last step is to ensure that return traffic from the beacon always traverses back over eunetworks-2. Coloclue uses VRRP, and sometimes another router might hold the virtual gateway address. With a little trick on my machine, I can force the return path by means of policy-based routing:

pim@hvn0-nlams2:~$ sudo ip ro add default via 94.142.240.254
pim@hvn0-nlams2:~$ sudo ip ro add prohibit 185.52.227.0/24
pim@hvn0-nlams2:~$ sudo ip addr add 185.52.227.1/32 dev lo
pim@hvn0-nlams2:~$ sudo ip rule add from 185.52.227.0/24 lookup 10
pim@hvn0-nlams2:~$ sudo ip ro add default via 94.142.240.253 table 10

First, I set the default gateway to be the VRRP address that floats between multiple routers. Then, I will set a prohibit route for the covering /24, which means the machine will send an ICMP unreachable (rather than discarding the packets), which can be useful later. Next, I’ll add .1 as an IPv4 address onto loopback, after which the machine will start replying to ICMP packets there with icmp-echo rather than dst-unreach. To make sure routing is always symmetric, I’ll add an ip rule which is a classifier that matches packets based on their source address, and then diverts these to an alternate routing table, which has only one entry: send via .253 (which is eunetworks-2).
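
The effect of the ip rule can be modelled in a few lines of Python. This is a deliberately simplified sketch: it only models the choice of default gateway based on the source address, and ignores rule priorities, the local table and any more specific routes. The names RULES, DEFAULT_GATEWAY, select_table() and egress_gateway() are made up for this example.

```python
import ipaddress

# Mirrors the commands above: "ip rule add from 185.52.227.0/24 lookup 10",
# a default route via .253 in table 10, and a default via .254 in main.
RULES = [(ipaddress.ip_network("185.52.227.0/24"), "10")]
DEFAULT_GATEWAY = {"10": "94.142.240.253", "main": "94.142.240.254"}


def select_table(source):
    """Pick the routing table for a packet based on its source address."""
    src = ipaddress.ip_address(source)
    for network, table in RULES:
        if src in network:
            return table
    return "main"


def egress_gateway(source):
    """Return the default gateway used by a packet with this source address."""
    return DEFAULT_GATEWAY[select_table(source)]


if __name__ == "__main__":
    print(egress_gateway("94.142.240.71"))  # the machine's main address
    print(egress_gateway("185.52.227.1"))   # the beacon address
```

Replies sourced from the beacon address select table 10 and leave via .253, while everything else follows the VRRP address at .254.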

Let me show this in action:

pim@hvn0-nlams2:~$ dig +short -x 94.142.240.254
eunetworks-gateway-100.router.nl.coloclue.net.
pim@hvn0-nlams2:~$ dig +short -x 94.142.240.253
bond0-100.eunetworks-2.router.nl.coloclue.net.
pim@hvn0-nlams2:~$ dig +short -x 94.142.240.252
bond0-100.eunetworks-3.router.nl.coloclue.net.

pim@hvn0-nlams2:~$ ip -4 nei | grep '94.142.240.25[234]'
94.142.240.252 dev coloclue lladdr 64:9d:99:b1:31:db REACHABLE
94.142.240.253 dev coloclue lladdr 64:9d:99:b1:31:af REACHABLE
94.142.240.254 dev coloclue lladdr 64:9d:99:b1:31:db REACHABLE

In the output above, I can see that eunetworks-2 (94.142.240.253) has MAC address 64:9d:99:b1:31:af, and that eunetworks-3 (94.142.240.252) has MAC address 64:9d:99:b1:31:db. My default gateway, handled by VRRP, is at .254 and it’s using the second MAC address, so I know that eunetworks-3 is primary, and will handle my egress traffic.
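
The same deduction can be scripted. The sketch below parses neighbor-table text in the format shown above and reports which router currently answers for the virtual address. It assumes the virtual address resolves to the owning router's own interface MAC, as it does in this output; with a dedicated virtual MAC the comparison would not work. The helper names and the ROUTERS mapping are hypothetical.

```python
NEIGH_OUTPUT = """\
94.142.240.252 dev coloclue lladdr 64:9d:99:b1:31:db REACHABLE
94.142.240.253 dev coloclue lladdr 64:9d:99:b1:31:af REACHABLE
94.142.240.254 dev coloclue lladdr 64:9d:99:b1:31:db REACHABLE
"""

# Router names and addresses as resolved with dig above.
ROUTERS = {
    "eunetworks-2": "94.142.240.253",
    "eunetworks-3": "94.142.240.252",
}


def parse_neighbors(text):
    """Map IP address to MAC address from 'ip nei' style output."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if "lladdr" in fields:
            table[fields[0]] = fields[fields.index("lladdr") + 1]
    return table


def vrrp_primary(text, virtual_ip, routers):
    """Return the router whose MAC matches the virtual IP's MAC, or None."""
    macs = parse_neighbors(text)
    owner = macs.get(virtual_ip)
    if owner is None:
        return None
    for name, ip in routers.items():
        if macs.get(ip) == owner:
            return name
    return None


if __name__ == "__main__":
    print(vrrp_primary(NEIGH_OUTPUT, "94.142.240.254", ROUTERS))
```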

Verifying symmetric routing of the beacon

A quick demonstration shows the symmetric routing case. Using tcpdump, I can see that my “usual” egress traffic is sent to the MAC address of the VRRP primary (which I showed to be eunetworks-3 above), while traffic coming from 185.52.227.0/24 ought to be sent to the MAC address of eunetworks-2 due to the ip rule and alternate routing table 10:

pim@hvn0-nlams2:~$ sudo tcpdump -eni coloclue host 194.1.163.93 and icmp
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on coloclue, link-type EN10MB (Ethernet), snapshot length 262144 bytes
10:02:17.193844 64:9d:99:b1:31:af > 6e:fa:52:d0:c1:ff, ethertype IPv4 (0x0800), length 98:
    194.1.163.93 > 94.142.240.71: ICMP echo request, id 16287, seq 1, length 64
10:02:17.193882 6e:fa:52:d0:c1:ff > 64:9d:99:b1:31:db, ethertype IPv4 (0x0800), length 98:
    94.142.240.71 > 194.1.163.93: ICMP echo reply, id 16287, seq 1, length 64

10:02:19.276657 64:9d:99:b1:31:af > 6e:fa:52:d0:c1:ff, ethertype IPv4 (0x0800), length 98:
    194.1.163.93 > 185.52.227.1: ICMP echo request, id 6646, seq 1, length 64
10:02:19.276694 6e:fa:52:d0:c1:ff > 64:9d:99:b1:31:af, ethertype IPv4 (0x0800), length 98:
    185.52.227.1 > 194.1.163.93: ICMP echo reply, id 6646, seq 1, length 64

It takes a keen eye to spot the difference here: the reply to the first packet (which is going to the main IPv4 address 94.142.240.71) is returned via MAC address 64:9d:99:b1:31:db (the VRRP default gateway), but the reply to the second one (going to the beacon 185.52.227.1) is returned via MAC address 64:9d:99:b1:31:af.

I’ve now ensured that traffic to and from 185.52.227.1 will always traverse through the DUT (eunetworks-2 with MAC 64:9d:99:b1:31:af). Very elegant :-)

Installing VPP

I’ve written about this before, and the general spiel is just following my previous article (I’m often very glad to read back my own articles, as they serve as pretty good documentation for my forgetful chipmunk-sized brain!). Here, I’ll only recap what’s already written in [vpp-7]:

Build VPP with Linux Control Plane
Bring eunetworks-2 into maintenance mode, so we can safely tinker with it
Start services like ssh, snmp, keepalived and bird in a new dataplane namespace
Start VPP and give the LCP interface names the same as their original
Slowly introduce the router: OSPF, OSPFv3, iBGP, members-bgp, eBGP, in that order
Re-enable keepalived and let the machine forward traffic
Stare at the latency graphs

1. BUILD: For the first step, the build is straightforward, and yields a VPP instance based on vpp-ext-deps_23.06-1 at version 23.06-rc0~71-g182d2b466, which contains my [LCPng] plugin. I then copy the packages to the router. The router has an E-2286G CPU @ 4.00GHz with 6 cores and 6 hyperthreads. There’s a really handy tool called likwid-topology that can show how the L1, L2 and L3 caches line up with respect to CPU cores. Here I learn that CPUs (0+6) and (1+7) share L1 and L2 cache, so I can conclude that 0-5 are CPU cores which share a hyperthread with 6-11 respectively.

I also see that L3 cache is shared across all of the cores+hyperthreads, which is normal. I decide to give CPUs 0,1 and their hyperthreads 6,7 to Linux for general purpose scheduling, and I want to reserve the remaining CPUs and their hyperthreads for VPP. So the kernel is rebooted with isolcpus=2-5,8-11.
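
The isolcpus value follows mechanically from the sibling layout. As a small sketch, assuming that core N always pairs with hyperthread N + 6 as likwid-topology reported for this machine (other CPUs may enumerate siblings differently), the list can be derived like this. The function names are invented for the example.

```python
def sibling_pairs(cores):
    """Pair each core with its hyperthread, assuming sibling = core + cores."""
    return [(c, c + cores) for c in range(cores)]


def to_ranges(cpus):
    """Collapse CPU ids into a kernel-style list such as '2-5,8-11'."""
    cpus = sorted(cpus)
    if not cpus:
        return ""
    ranges = []
    start = prev = cpus[0]
    for cpu in cpus[1:]:
        if cpu != prev + 1:
            ranges.append((start, prev))
            start = cpu
        prev = cpu
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def isolcpus(cores, linux_cores):
    """CPUs to isolate: every core not kept for Linux, plus its sibling."""
    isolated = [
        cpu
        for pair in sibling_pairs(cores)
        if pair[0] not in linux_cores
        for cpu in pair
    ]
    return to_ranges(isolated)


if __name__ == "__main__":
    # 6 cores, with cores 0 and 1 (and siblings 6 and 7) left to Linux.
    print("isolcpus=" + isolcpus(6, {0, 1}))
```

Isolating a core together with its sibling keeps the Linux scheduler from placing unrelated work on a hyperthread that shares L1 and L2 cache with a VPP worker.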

2. DRAIN: In the meantime, Rogier prepares the drain, which is a two-step process. First he marks all the BGP sessions as graceful_shutdown: True, and waits for the traffic to die down. Then, he marks the machine as maintenance_mode: True, which will make Kees set OSPF cost to 65535 and avoid attracting or sending traffic through this machine. After he submits these, we are free to tinker with the router, as it will not affect any Coloclue members. Rogier also ensures we keep a hand on this little machine in Amsterdam, by preparing an IPMI serial-over-lan connection and KVM.

3. PREPARE: Starting sshd and snmpd in the dataplane namespace is the most important part. This way, we will be able to scrape the machine using SNMP just as if it were a native Linux router, and of course we will want to be able to log in to the router. I start with these two services. The only small note is that, because I want to run two copies (one in the default namespace and an additional one in the dataplane namespace), I’ll want to tweak the startup flags (pid file, config file, etc) a little bit:

## in snmpd-dataplane.service
ExecStart=/sbin/ip netns exec dataplane /usr/sbin/snmpd -LOw -u Debian-snmp \
  -g vpp -I -smux,mteTrigger,mteTriggerConf -f -p /run/snmpd-dataplane.pid \
  -C -c /etc/snmp/snmpd-dataplane.conf

## in ssh-dataplane.service
ExecStart=/usr/sbin/ip netns exec dataplane /usr/sbin/sshd \
  -oPidFile=/run/sshd-dataplane.pid -D $SSHD_OPTS

4. LAUNCH: Now what’s left for us to do is switch from our SSH session to an IPMI serial-over-lan session so that we can safely transition to the VPP world. Rogier and I log in and share a tmux session, after which I bring down all ethernet links, remove VLAN sub-interfaces and the LACP BondEthernet, leaving only the main physical interfaces. I then set link down on them, and restart VPP – which will take all DPDK eligible interfaces that are link admin-down, and then let the magic happen:

root@eunetworks-2:~# vppctl show int
              Name               Idx    State  MTU (L3/IP4/IP6/MPLS)  Counter   Count
GigabitEthernet5/0/0              5     down         9000/0/0/0
GigabitEthernet6/0/0              6     down         9000/0/0/0
TenGigabitEthernet1/0/0           1     down         9000/0/0/0
TenGigabitEthernet1/0/1           2     down         9000/0/0/0
TenGigabitEthernet1/0/2           3     down         9000/0/0/0
TenGigabitEthernet1/0/3           4     down         9000/0/0/0

Dope! One way to trick the rest of the machine into thinking it hasn’t changed is to recreate these interfaces in the dataplane network namespace using their original interface names (eg. enp1s0f3 for AMS-IX, and bond0 for the LACP signaled BondEthernet that we’ll create). Rogier prepared an excellent vppcfg config file:

loopbacks:
  loop0:
    description: 'eunetworks-2.router.nl.coloclue.net'
    lcp: 'loop0'
    mtu: 9216
    addresses: [ 94.142.247.3/32, 2a02:898:0:300::3/128 ]
        
bondethernets:
  BondEthernet0:
    description: 'Core: MLAG member switches'
    interfaces: [ TenGigabitEthernet1/0/0, TenGigabitEthernet1/0/1 ]
    mode: 'lacp'
    load-balance: 'l34'
    mac: '64:9d:99:b1:31:af'
        
interfaces:
  GigabitEthernet5/0/0:
    description: "igb 0000:05:00.0 eno1 # FiberRing"
    lcp: 'eno1'
    mtu: 9216
    sub-interfaces:
      205:
        description: "Peering: Arelion"
        lcp: 'eno1.205'
        addresses: [ 62.115.144.33/31, 2001:2000:3080:ebc::2/126 ]
        mtu: 1500
      992:
        description: "Transit: FiberRing"
        lcp: 'eno1.992'
        addresses: [ 87.255.32.130/30, 2a00:ec8::102/126 ]
        mtu: 1500
        
  GigabitEthernet6/0/0:
    description: "igb 0000:06:00.0 eno2 # Free"
    lcp: 'eno2'
    mtu: 9216
    state: down
        
  TenGigabitEthernet1/0/0:
    description: "i40e 0000:01:00.0 enp1s0f0 (bond-member)"
    mtu: 9216
        
  TenGigabitEthernet1/0/1:
    description: "i40e 0000:01:00.1 enp1s0f1 (bond-member)"
    mtu: 9216

  TenGigabitEthernet1/0/2:
    description: 'Core: link between eunetworks-2 and eunetworks-3'
    lcp: 'enp1s0f2'
    addresses: [ 94.142.247.246/31, 2a02:898:0:301::/127 ]
    mtu: 9214

  TenGigabitEthernet1/0/3:
    description: "i40e 0000:01:00.3 enp1s0f3  # AMS-IX"
    lcp: 'enp1s0f3'
    mtu: 9216
    sub-interfaces:
      501:
        description: "Peering: AMS-IX"
        lcp: 'enp1s0f3.501'
        addresses: [ 80.249.211.161/21, 2001:7f8:1::a500:8283:1/64 ]
        mtu: 1500
      511:
        description: "Peering: NBIP-NaWas via AMS-IX"
        lcp: 'enp1s0f3.511'
        addresses: [ 194.62.128.38/24, 2001:67c:608::f200:8283:1/64 ]
        mtu: 1500

  BondEthernet0:
    lcp: 'bond0'
    mtu: 9216
    sub-interfaces:
      100:
        description: "Cust: Members"
        lcp: 'bond0.100'
        mtu: 1500
        addresses: [ 94.142.240.253/24, 2a02:898:0:20::e2/64 ]
      101:
        description: "Core: Powerbars"
        lcp: 'bond0.101'
        mtu: 1500
        addresses: [ 172.28.3.253/24 ]
      105:
        description: "Cust: Members (no strict uRPF filtering)"
        lcp: 'bond0.105'
        mtu: 1500
        addresses: [ 185.52.225.14/28, 2a02:898:0:21::e2/64 ]
      130:
        description: "Core: Link between eunetworks-2 and dcg-1"
        lcp: 'bond0.130'
        mtu: 1500
        addresses: [ 94.142.247.242/31, 2a02:898:0:301::14/127 ]
      2502:
        description: "Transit: Fusix Networks"
        lcp: 'bond0.2502'
        mtu: 1500
        addresses: [ 37.139.140.27/31, 2a00:a7c0:e20b:104::2/126 ]

We take this configuration and pre-generate a suitable VPP config, which exposes two little bugs in vppcfg:

Rogier had used capital letters in his IPv6 addresses (ie. 2001:2000:3080:0EBC::2), while the dataplane reports lower case (ie. 2001:2000:3080:ebc::2), which consistently yields a diff that’s not there. I make a note to fix that.
When I create the initial --novpp config, there’s a bug in vppcfg where I incorrectly reference a dataplane object which I haven’t initialized (because with --novpp the tool will not contact the dataplane at all). That one was easy to fix, which I did in [this commit].
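
The first bug is a string comparison problem: the two spellings denote the same address. A minimal sketch of one way to avoid such a phantom diff is to compare parsed addresses rather than text, using Python’s standard ipaddress module. This is not necessarily how vppcfg addresses it, and same_interface_address() is a name made up for this example.

```python
import ipaddress


def same_interface_address(configured, reported):
    """Compare two address/prefixlen strings by value rather than by text."""
    return ipaddress.ip_interface(configured) == ipaddress.ip_interface(reported)


if __name__ == "__main__":
    configured = "2001:2000:3080:0EBC::2/126"  # as written in the YAML
    reported = "2001:2000:3080:ebc::2/126"     # as reported by the dataplane
    print(configured == reported)                        # textual comparison
    print(same_interface_address(configured, reported))  # parsed comparison
    print(ipaddress.ip_interface(configured))            # canonical form
```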

After that small detour, I can now proceed to configure the dataplane by offering the resulting VPP commands, like so:

root@eunetworks-2:~# vppcfg plan --novpp -c /etc/vpp/vppcfg.yaml \
                                  -o /etc/vpp/config/vppcfg.vpp
[INFO    ] root.main: Loading configfile /etc/vpp/vppcfg.yaml
[INFO    ] vppcfg.config.valid_config: Configuration validated successfully
[INFO    ] root.main: Configuration is valid
[INFO    ] vppcfg.reconciler.write: Wrote 84 lines to /etc/vpp/config/vppcfg.vpp
[INFO    ] root.main: Planning succeeded

root@eunetworks-2:~# vppctl exec /etc/vpp/config/vppcfg.vpp

5. UNDRAIN: The VPP dataplane comes to life, only to immediately hang. Whoops! What follows is a 90 minute foray into the innards of VPP (and Bird) which I haven’t yet fully understood, but will definitely want to learn more about (future article, anyone?) – but the TL;DR of our investigation is that if an IPv6 address is added to a loopback device, and an OSPFv3 (IPv6) stub area is created on it, as is common for IPv4 and IPv6 loopback addresses in OSPF, then the dataplane immediately hangs on the controlplane, but does continue to forward traffic.

However, we also find a workaround, which is to put the IPv6 loopback address on a physical interface instead of a loopback interface. Then, we observe a perfectly functioning dataplane, which has a working BondEthernet with LACP signalling:

root@eunetworks-2:~# vppctl show bond details
BondEthernet0
  mode: lacp
  load balance: l34
  number of active members: 2
    TenGigabitEthernet1/0/1
    TenGigabitEthernet1/0/0
  number of members: 2
    TenGigabitEthernet1/0/0
    TenGigabitEthernet1/0/1
  device instance: 0
  interface id: 0
  sw_if_index: 8
  hw_if_index: 8

root@eunetworks-2:~# vppctl show lacp
                                                        actor state                      partner state                   
interface name            sw_if_index  bond interface   exp/def/dis/col/syn/agg/tim/act  exp/def/dis/col/syn/agg/tim/act
TenGigabitEthernet1/0/0   1            BondEthernet0      0   0   1   1   1   1   1   1    0   0   1   1   1   1   0   1
  LAG ID: [(ffff,64-9d-99-b1-31-af,0008,00ff,0001), (8000,02-1c-73-0f-8b-bc,0015,8000,8015)]
  RX-state: CURRENT, TX-state: TRANSMIT, MUX-state: COLLECTING_DISTRIBUTING, PTX-state: PERIODIC_TX
TenGigabitEthernet1/0/1   2            BondEthernet0      0   0   1   1   1   1   1   1    0   0   1   1   1   1   0   1
  LAG ID: [(ffff,64-9d-99-b1-31-af,0008,00ff,0002), (8000,02-1c-73-0f-8b-bc,0015,8000,0015)]
  RX-state: CURRENT, TX-state: TRANSMIT, MUX-state: COLLECTING_DISTRIBUTING, PTX-state: PERIODIC_TX

6. WRAP UP: After a bit of standard-issue ping / ping6, show err and show log, things are looking good. Rogier and I are now ready to slowly introduce the router: we first turn on OSPF and OSPFv3, and see adjacencies and BFD come up. We make a note that enp1s0f2 (which is now a LIP in the dataplane) does not have BFD while it does have OSPF. The explanation is that enp1s0f2 is directly connected to its peer via a cross-connect cable, so if it fails, OSPF can use link state to reconverge quickly. bond0, on the other hand, is connected to a switch, so its ethernet link may still be up when something further along the transport path fails, which makes BFD the better choice there. Smart thinking, Coloclue!

root@eunetworks-2:~# birdc6 show ospf nei ospf1     
BIRD 1.6.8 ready.         
ospf1:
Router ID       Pri          State      DTime   Interface  Router IP   
94.142.247.1      1     Full/PtP        00:33   bond0.130  fe80::669d:99ff:feb1:394b              
94.142.247.6      1     Full/PtP        00:31   enp1s0f2   fe80::669d:99ff:feb1:31d8              

root@eunetworks-2:~# birdc show bfd ses
BIRD 1.6.8 ready.
bfd1:
IP address                Interface  State      Since       Interval  Timeout
94.142.247.243            bond0.130  Up         2023-02-24 15:56:29    0.100    0.500

We are then ready to undrain iBGP and eBGP to members, transit and peering sessions. Rogier swiftly takes care of business, and the router finds its spot in the DFZ just a few minutes later:

root@eunetworks-2:~# birdc show route count
BIRD 1.6.8 ready.
6239493 of 6239493 routes for 907650 networks

root@eunetworks-2:~# birdc6 show route count
BIRD 1.6.8 ready.
1152345 of 1152345 routes for 169987 networks

root@eunetworks-2:~# vppctl show ip fib sum
ipv4-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none
locks:[adjacency:1, default-route:1, lcp-rt:1, ]
    Prefix length         Count     
                   0               1
                   4               2
                   8              16
                   9              13
                  10              38
                  11             103
                  12             299
                  13             577
                  14            1214
                  15            2093
                  16           13477
                  17            8250
                  18           13824
                  19           24990
                  20           43089
                  21           51191
                  22          109106    
                  23           97073    
                  24          542106
                  27               3
                  28              13
                  29              32
                  30              36    
                  31              41    
                  32             788

root@eunetworks-2:~# vppctl show ip6 fib sum
ipv6-VRF:0, fib_index:0, flow hash:[src dst sport dport proto flowlabel ] epoch:0 flags:none
locks:[adjacency:1, default-route:1, lcp-rt:1, ]
    Prefix length         Count     
         128               863      
         127                4       
         126                4           
         125                1           
         120                2       
         64                22       
         60                17       
         52                 2       
         49                 2           
         48               80069         
         47               3535      
         46               3411      
         45               1726      
         44               14909     
         43               1041      
         42               2529      
         41                932      
         40               14126     
         39               1459      
         38               1654      
         37                988      
         36               6640      
         35               1374      
         34               3419      
         33               3707      
         32               22819     
         31                294      
         30                589      
         29               4373      
         28                196      
         27                20       
         26                15       
         25                 8       
         24                30       
         23                 7       
         22                 7          
         21                 3          
         20                15       
         19                 1       
         10                 1       
          0                 1

One thing that I really appreciate is how … normal … this machine looks, with no interfaces in the default namespace, but after switching to the dataplane network namespace using nsenter, there they are, and they look (unsurprisingly, because we configured them that way) identical to what was running before, except now all governed by VPP instead of the Linux kernel:

root@eunetworks-2:~# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP>

root@eunetworks-2:~# nsenter --net=/var/run/netns/dataplane
root@eunetworks-2:~# ip -br l
lo               UNKNOWN        00:00:00:00:00:00 <LOOPBACK,UP,LOWER_UP> 
eno1             UP             ac:1f:6b:e0:b1:0c <BROADCAST,MULTICAST,UP,LOWER_UP> 
eno2             DOWN           ac:1f:6b:e0:b1:0d <BROADCAST,MULTICAST> 
enp1s0f2         UP             64:9d:99:b1:31:ad <BROADCAST,MULTICAST,UP,LOWER_UP> 
enp1s0f3         UP             64:9d:99:b1:31:ac <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0            UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 
loop0            UP             de:ad:00:00:00:00 <BROADCAST,MULTICAST,UP,LOWER_UP> 
eno1.205@eno1    UP             ac:1f:6b:e0:b1:0c <BROADCAST,MULTICAST,UP,LOWER_UP> 
eno1.992@eno1    UP             ac:1f:6b:e0:b1:0c <BROADCAST,MULTICAST,UP,LOWER_UP> 
enp1s0f3.501@enp1s0f3 UP        64:9d:99:b1:31:ac <BROADCAST,MULTICAST,UP,LOWER_UP> 
enp1s0f3.511@enp1s0f3 UP        64:9d:99:b1:31:ac <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0.100@bond0  UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0.101@bond0  UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0.105@bond0  UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0.130@bond0  UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 
bond0.2502@bond0 UP             64:9d:99:b1:31:af <BROADCAST,MULTICAST,UP,LOWER_UP> 

root@eunetworks-2:~# ip -br a
lo               UNKNOWN        127.0.0.1/8 ::1/128 
eno1             UP             fe80::ae1f:6bff:fee0:b10c/64 
eno2             DOWN           
enp1s0f2         UP             94.142.247.246/31 2a02:898:0:300::3/128 2a02:898:0:301::/127 fe80::669d:99ff:feb1:31ad/64 
enp1s0f3         UP             fe80::669d:99ff:feb1:31ac/64 
bond0            UP             fe80::669d:99ff:feb1:31af/64 
loop0            UP             94.142.247.3/32 fe80::dcad:ff:fe00:0/64 
eno1.205@eno1    UP             62.115.144.33/31 2001:2000:3080:ebc::2/126 fe80::ae1f:6bff:fee0:b10c/64 
eno1.992@eno1    UP             87.255.32.130/30 2a00:ec8::102/126 fe80::ae1f:6bff:fee0:b10c/64 
enp1s0f3.501@enp1s0f3 UP        80.249.211.161/21 2001:7f8:1::a500:8283:1/64 fe80::669d:99ff:feb1:31ac/64 
enp1s0f3.511@enp1s0f3 UP        194.62.128.38/24 2001:67c:608::f200:8283:1/64 fe80::669d:99ff:feb1:31ac/64 
bond0.100@bond0  UP             94.142.240.253/24 2a02:898:0:20::e2/64 fe80::669d:99ff:feb1:31af/64 
bond0.101@bond0  UP             172.28.3.253/24 fe80::669d:99ff:feb1:31af/64 
bond0.105@bond0  UP             185.52.225.14/28 2a02:898:0:21::e2/64 fe80::669d:99ff:feb1:31af/64 
bond0.130@bond0  UP             94.142.247.242/31 2a02:898:0:301::14/127 fe80::669d:99ff:feb1:31af/64 
bond0.2502@bond0 UP             37.139.140.27/31 2a00:a7c0:e20b:104::2/126 fe80::669d:99ff:feb1:31af/64

Of course, VPP handles all the traffic through the machine, and the only traffic that Linux will see is that which is destined to the controlplane (eg, to one of the IPv4 or IPv6 addresses or multicast/broadcast groups that they are participating in), so things like tcpdump or SNMP won’t really work.

However, thanks to my [vpp-snmp-agent], which feeds data as an AgentX subagent to an snmpd that in turn runs in the dataplane namespace, SNMP scrapes work as they did before, albeit with a few different interface names.

Earlier, I had failed over keepalived and stopped the service. This way, the peer router on eunetworks-3 would pick up all outbound traffic to the virtual IPv4 and IPv6 addresses for our users’ default gateway. Because we’re mainly interested in non-intrusively measuring the BGP beacon (which is forced to always go through this machine), and we know some of our members use BGP and prefer this router because it’s connected to AMS-IX, we decide to leave keepalived turned off for now.

But traffic is flowing, and in fact with a little bit more throughput, possibly because traffic flows faster when there’s not 5% packet loss on certain egress paths? I don’t know, but OK, moving along!

Results

Clearly VPP is a winner in this scenario. If you recall the traceroute from before the operation, the latency was good up until nlams0.ipng.ch, after which loss occurred and variance was very high. Rogier and I let the VPP instance run overnight, and started this traceroute after our maintenance was concluded:

                                       My traceroute  [v0.94]
squanchy.ipng.ch (194.1.163.90) -> 185.52.227.1                           2023-02-25T09:48:46+0100
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                          Packets               Pings
 Host                                                   Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. chbtl0.ipng.ch                                       0.0% 51796    0.6   0.2   0.1   1.7   0.2
 2. chrma0.ipng.ch                                       0.0% 51796    1.6   1.0   0.9   5.5   1.2
 3. defra0.ipng.ch                                       0.0% 51796    7.0   6.5   6.4  27.7   1.9
 4. nlams0.ipng.ch                                       0.0% 51796   12.7  12.6  12.5  43.8   3.9
 5. bond0-105.eunetworks-2.router.nl.coloclue.net        0.0% 51796   13.3  13.0  12.8 138.9  11.1
 6. 185.52.227.1                                         0.0% 51796   13.6  12.7  12.3  46.6   8.3

This mtr shows clear network weather with absolutely no packets dropped from Brüttisellen (near Zurich, Switzerland) all the way to the BGP beacon running in EUNetworks in Amsterdam. Considering I’ve been running VPP for a few years now, including writing the code necessary to plumb the dataplane interfaces through to Linux so that a higher order control plane (such as Bird, or FRR) can manipulate them, I am reasonably bullish, but I do hope to convert others.

This computer now forwards packets like a boss, its packet loss is →

Looking at the local situation, a test from a hypervisor running at IPng Networks in Equinix AM3, via FrysIX, through VPP and into the dataplane of the Coloclue router eunetworks-2, shows quite reasonable throughput as well:

root@eunetworks-2:~# traceroute hvn0.nlams3.ipng.ch
traceroute to 46.20.243.179 (46.20.243.179), 30 hops max, 60 byte packets
 1  enp1s0f3.eunetworks-3.router.nl.coloclue.net (94.142.247.247)  0.087 ms  0.078 ms  0.071 ms
 2  frys-ix.ip-max.net (185.1.203.135)  1.288 ms  1.432 ms  1.479 ms
 3  hvn0.nlams3.ipng.ch (46.20.243.179)  0.524 ms  0.534 ms  0.531 ms

root@eunetworks-2:~# iperf3 -c 46.20.243.179 -P 10
Connecting to host 46.20.243.179, port 5201
...
[SUM]   0.00-10.00  sec  6.70 GBytes  5.76 Gbits/sec    192             sender
[SUM]   0.00-10.03  sec  6.58 GBytes  5.64 Gbits/sec                  receiver

root@eunetworks-2:~# iperf3 -c 46.20.243.179 -P 10 -R
Connecting to host 46.20.243.179, port 5201
Reverse mode, remote host 46.20.243.179 is sending      
...
[SUM]   0.00-10.03  sec  6.07 GBytes  5.20 Gbits/sec  54623             sender
[SUM]   0.00-10.00  sec  6.03 GBytes  5.18 Gbits/sec                  receiver

And the smokepings look just plain gorgeous:

The screenshots above are smokeping graphs from (left) a machine at AS8283 Coloclue (in Amsterdam, the Netherlands), and from (right) a machine at AS8298 IPng (in Brüttisellen, Switzerland). Both show no packet loss and clearly improved end-to-end latency. Super!

What’s next

The performance of the one router we upgraded definitely improved, no question about that. But there are a couple of things that I think we still need to do, so Rogier and I rolled back the change to the previous situation with kernel-based routing.

We didn’t migrate keepalived, although IPng runs this in our DDLN [colocation] site, so I’m pretty confident that it will work.
Kees and Ansible at Coloclue will need a few careful changes to facilitate ongoing automation: dataplane and controlplane firewalls, sysctls (uRPF et al), fastnetmon, and so on will all need a meaningful overhaul.

As an important side note, VPP is not well enough understood at Coloclue - rolling this out further risks making me a single point of failure in the networking committee, and I’m not comfortable taking that responsibility. I recommend that Coloclue network committee members gain experience with VPP, DPDK, vppcfg and the other ecosystem tools, and that at least the bird6 OSPF issue and possible IPv6 NS/RA issue are understood, before making the jump to the VPP world.
