VPP Linux CP - Part2

About this series

Ever since I first saw VPP, the Vector Packet Processor, I have been deeply impressed with its performance and versatility. For those of us who have used Cisco IOS-XR devices, like the classic ASR (Aggregation Services Router), VPP will look and feel quite familiar, as many of the approaches are shared between the two. One thing notably missing is the higher-level control plane: there is no OSPF or IS-IS, BGP, LDP and the like. This series of posts details my work on a VPP plugin called the Linux Control Plane, or LCP for short, which creates Linux network devices that mirror their VPP dataplane counterparts. IPv4 and IPv6 traffic, and associated protocols like ARP and IPv6 Neighbor Discovery, can now be handled by Linux, while the heavy lifting of packet forwarding is done by the VPP dataplane. Said another way: this plugin allows Linux to use VPP as a software ASIC for fast forwarding, filtering, NAT, and so on, while keeping control of the interface state (links, addresses and routes) itself. Once the plugin is complete, running software like FRR or Bird on top of VPP and achieving >100Mpps and >100Gbps forwarding rates will be well within reach!

In this second post, let’s make the plugin a bit more useful by having it copy state changes made to interfaces in VPP forward into their Linux CP counterparts.

My test setup

I’m using the same setup as in the previous post. The goal of this post is to show what code needed to be written, and which changes needed to be made to the plugin, in order to propagate changes on VPP interfaces to their Linux TAP devices.

Starting point

The linux-cp plugin that ships with VPP 21.06, even with my changes, is still only able to create LIP (Linux Interface Pair) devices. It’s not very user-friendly to have to apply every state change meticulously on both sides, but it can be done:

vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
vppctl set interface state TenGigabitEthernet3/0/0 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0
vppctl set interface ip address TenGigabitEthernet3/0/0 10.0.1.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0 2001:db8:0:1::1/64
ip link set e0 up 
ip link set e0 mtu 9000
ip addr add 10.0.1.1/30 dev e0
ip addr add 2001:db8:0:1::1/64 dev e0
        

In this snippet, we can see that after creating the LIP, thus conjuring up the unconfigured e0 interface in Linux, I changed the VPP interface in three ways:

  1. I set the state of the VPP interface to ‘up’
  2. I set the MTU of the VPP interface to 9000
  3. I added an IPv4 and an IPv6 address to the interface

Because state does not (yet) propagate, I have to make the same changes on the Linux side with the subsequent ip commands.

Configuration

I can imagine that some operators want more control and would rather make the Linux and VPP changes themselves. This is why I’ll start off by adding a variable called lcp_sync, along with a startup configuration keyword and a CLI setter. This allows me to turn the whole sync behavior on and off, for example in startup.conf:

linux-cp {
  default netns dataplane
  lcp-sync
}
        

And in the CLI:

DBGvpp# show lcp
lcp default netns dataplane
lcp lcp-sync on

DBGvpp# lcp lcp-sync off
DBGvpp# show lcp
lcp default netns dataplane
lcp lcp-sync off
        

The prep work for the rest of the interface syncer starts with this [commit]. For the rest of this blog post, the behavior will be in the ‘on’ position.
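To make the on/off switch concrete, here is a minimal C sketch of how such a flag might gate the syncer. It assumes that sync stays off unless the lcp-sync keyword is present or it is switched on from the CLI. The helper names (lcp_parse_config_token(), lcp_set_sync(), lcp_on_admin_change()) and the lcp_mirrored counter are hypothetical; a real VPP plugin would use VPP's own config and CLI parsing rather than strcmp().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the plugin's lcp_sync variable.
 * Assumption: sync stays off unless explicitly enabled. */
static bool lcp_sync = false;

/* Number of changes actually mirrored into Linux (for illustration). */
static unsigned lcp_mirrored = 0;

/* Handle one token from the linux-cp { ... } startup stanza. */
static bool lcp_parse_config_token(const char *tok)
{
    assert(tok != NULL);
    if (strcmp(tok, "lcp-sync") == 0) {
        lcp_sync = true;
        return true;
    }
    return false; /* not ours; other keywords are handled elsewhere */
}

/* Backend for a "lcp lcp-sync on|off" CLI command. */
static bool lcp_set_sync(const char *arg)
{
    assert(arg != NULL);
    if (strcmp(arg, "on") == 0)
        lcp_sync = true;
    else if (strcmp(arg, "off") == 0)
        lcp_sync = false;
    else
        return false; /* unknown argument: leave the setting alone */
    return true;
}

/* Every VPP -> Linux callback starts with the same guard. */
static void lcp_on_admin_change(unsigned sw_if_index, bool up)
{
    if (!lcp_sync)
        return; /* operator manages the Linux side manually */
    lcp_mirrored++;
    printf("mirror sw_if_index %u admin %s\n", sw_if_index, up ? "up" : "down");
}
```

Putting the check at the top of each callback, rather than unregistering the callbacks, means flipping the setting at runtime takes effect immediately for the next change.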

Change interface: state

Right away, I find a dissonance between VPP and Linux: when Linux sets a parent interface down, all of its children go to state M-DOWN; when Linux sets a parent interface up, all of its children automatically go to state UP and LOWER_UP. To illustrate:

ip link set enp66s0f1 down
ip link add link enp66s0f1 name foo type vlan id 1234
ip link set foo down
## Both interfaces are down, which makes sense because I set them both down
ip link | grep enp66s0f1
9: enp66s0f1: <BROADCAST,MULTICAST> mtu 9000 qdisc mq state DOWN mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,M-DOWN> mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    
ip link set enp66s0f1 up
ip link | grep enp66s0f1
## Both interfaces are up, which doesn't make sense because I only changed one of them!
9: enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP mode DEFAULT group default qlen 1000
61: foo@enp66s0f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        

VPP does not work this way. In VPP, the admin state of each interface is individually controllable, so it’s possible to bring up the parent while leaving the sub-interface in whatever state it was in. I did notice that you can’t bring up a sub-interface if its parent is down, which I found counterintuitive, but that’s neither here nor there.

All of this is to say that we have to be careful when copying state forward. As this [commit] shows, issuing set int state ... up on an interface won’t touch its sub-interfaces in VPP, but the subsequent netlink message that brings up the LIP for that interface will also update its children in Linux, thus desynchronising Linux and VPP: Linux will have the interface and all its sub-interfaces up unconditionally, while VPP will have the interface up and its sub-interfaces in whatever state they were before.

To address this, a second [commit] was needed. I’m not too sure I want to keep this behavior, but for now it results in an intuitive end-state, which is that all interface states are exactly the same between Linux and VPP.

DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# lcp create TenGigabitEthernet3/0/0 host-if e0
DBGvpp# lcp create TenGigabitEthernet3/0/0.10 host-if e0.10
DBGvpp# set int state TenGigabitEthernet3/0/0 up
## Correct: parent is up, sub-int is not
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0.10 up
## Correct: both interfaces up
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP mode DEFAULT group default qlen 1000

DBGvpp# set int state TenGigabitEthernet3/0/0 down 
DBGvpp# set int state TenGigabitEthernet3/0/0.10 down
DBGvpp# set int state TenGigabitEthernet3/0/0 up     
## Correct: only the parent is up
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST> mtu 9000 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000
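The end-state above can be modelled with a small C sketch: after mirroring a parent's admin state into Linux, walk its children and push each one's VPP admin state back into its LIP, undoing what the kernel did on its own. This is a toy model, not the code from the [commit]. The names itf_t, linux_set_link() and lcp_sync_admin_state() are hypothetical; linux_set_link() imitates the kernel behavior shown earlier (a parent changing admin state drags its VLAN children along), and the model additionally assumes Linux refuses to bring a VLAN up on a down parent.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ITF 8

/* Toy model of a VPP interface and its LIP (hypothetical type). */
typedef struct {
    int parent;    /* index of the parent interface, or -1 for a phy */
    bool vpp_up;   /* admin state as configured in VPP */
    bool linux_up; /* admin state of the Linux LIP */
} itf_t;

static itf_t itfs[MAX_ITF];
static int n_itfs;

static int itf_add(int parent)
{
    assert(n_itfs < MAX_ITF);
    itfs[n_itfs] = (itf_t){ .parent = parent, .vpp_up = false, .linux_up = false };
    return n_itfs++;
}

/* Imitates the kernel behavior shown earlier: changing a parent's admin
 * state drags its VLAN children along. This model also refuses to bring
 * a VLAN up on a down parent. Returns false if the change is refused. */
static bool linux_set_link(int idx, bool up)
{
    int p = itfs[idx].parent;

    if (up && p >= 0 && !itfs[p].linux_up)
        return false;
    itfs[idx].linux_up = up;
    for (int i = 0; i < n_itfs; i++)
        if (itfs[i].parent == idx)
            itfs[i].linux_up = up;
    return true;
}

/* Hypothetical admin up/down hook: copy the VPP state into the LIP, then
 * put every child LIP back to the state VPP has for it. */
static void lcp_sync_admin_state(int idx, bool up)
{
    itfs[idx].vpp_up = up;
    if (!linux_set_link(idx, up))
        return;
    for (int i = 0; i < n_itfs; i++)
        if (itfs[i].parent == idx && itfs[i].linux_up != itfs[i].vpp_up)
            linux_set_link(i, itfs[i].vpp_up); /* may be refused if parent is down */
}
```

Replaying the three steps from the CLI session above against this model gives the same end-states: parent up with the sub-int down, then both up, then only the parent up.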
        

Change interface: MTU

Finally, a straightforward [commit], or so I thought. When the MTU changes in VPP (with set interface mtu packet N <int>), there is a callback that can be registered, which copies this into the LIP. I did notice a specific corner case: in VPP, a sub-interface can have a larger MTU than its parent. In Linux this cannot happen, so the following remains problematic:

DBGvpp# create sub TenGigabitEthernet3/0/0 10
DBGvpp# set int mtu packet 1500 TenGigabitEthernet3/0/0  
DBGvpp# set int mtu packet 9000 TenGigabitEthernet3/0/0.10
## Incorrect: sub-int has larger MTU than parent, valid in VPP, not in Linux
694: e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UNKNOWN mode DEFAULT group default qlen 1000
695: e0.10@e0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
        

I think the best way to make this work is to clamp the sub-int MTU to at most that of its parent, and to revert any request to set the VPP sub-int MTU higher than that, perhaps logging an error explaining why. This means two things:

  1. Any change in VPP that sets a child's MTU larger than its parent's must be reverted.
  2. Any change in VPP of a parent's MTU should clamp all of its children to at most that value.

I addressed the issue in this [commit].
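The two rules can be sketched in C as follows. This is a hypothetical model, not the plugin's code: rather than letting VPP apply an oversized child MTU and then reverting it from the MTU-change callback, it checks the request up front and keeps the old value, which has the same visible result. The names mtu_itf_t, mtu_itf_add() and lcp_set_mtu() are illustrative only, and in a real plugin each clamped child would also need its LIP updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_ITF 8

/* Hypothetical model of an interface and its packet MTU in bytes. */
typedef struct {
    int parent;   /* index of the parent, or -1 for a phy */
    unsigned mtu;
} mtu_itf_t;

static mtu_itf_t mitfs[MAX_ITF];
static int n_mitfs;

static int mtu_itf_add(int parent, unsigned mtu)
{
    assert(n_mitfs < MAX_ITF);
    mitfs[n_mitfs] = (mtu_itf_t){ .parent = parent, .mtu = mtu };
    return n_mitfs++;
}

/* Apply an MTU change under the two rules above.
 * Returns false, keeping the old MTU, if the request is rejected. */
static bool lcp_set_mtu(int idx, unsigned mtu)
{
    int p = mitfs[idx].parent;

    /* Rule 1: a child may not exceed its parent's MTU. */
    if (p >= 0 && mtu > mitfs[p].mtu) {
        fprintf(stderr, "MTU %u refused on %d: parent MTU is %u\n",
                mtu, idx, mitfs[p].mtu);
        return false;
    }
    mitfs[idx].mtu = mtu;

    /* Rule 2: clamp every child to at most the parent's new MTU. */
    for (int i = 0; i < n_mitfs; i++)
        if (mitfs[i].parent == idx && mitfs[i].mtu > mtu)
            mitfs[i].mtu = mtu;
    return true;
}
```

With this model, the problematic sequence above is refused outright, and lowering a parent from 9000 to 1500 drags a 9000-byte child down with it, while raising the parent leaves its children untouched.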

Change interface: IP Addresses

There are three scenarios in which IP addresses will need to be copied from VPP into the companion Linux devices:

  1. set interface ip address adds an IPv4 or IPv6 address. This is handled by lcp_itf_ip[46]_add_del_interface_addr() which is a callback installed in lcp_itf_pair_init() at plugin initialization time.
  2. set interface ip address del removes addresses. This is also handled by lcp_itf_ip[46]_add_del_interface_addr(), but curiously there is no upstream vnet_netlink_del_ip[46]_addr(), so I had to write them inline here. I will try to get them upstreamed, as they appear to be obvious companions in vnet/device/netlink.h.
  3. This one is easy to overlook: upon LIP creation, there may already be L3 addresses present on the VPP interface. If so, set them on the LIP with lcp_itf_set_interface_addr().

This means that with this [commit], whenever a new LIP is created, the IPv4 and IPv6 addresses on the VPP interface are fully copied over by the third change, while at runtime, addresses can be added and removed by the first and second change.
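Since there is no upstream vnet_netlink_del_ip[46]_addr(), it may help to see what such a helper has to put on the wire. The sketch below builds, but does not send, an rtnetlink RTM_DELADDR request for one IPv4 or IPv6 address, using only the standard Linux uapi headers. The names nl_addr_req_t, nl_add_attr() and build_del_addr() are hypothetical and not the ones from the [commit]. Filling both IFA_LOCAL and IFA_ADDRESS with the same address is an assumption modelled on what iproute2 sends for a non-peer address; transmitting the request over an AF_NETLINK/NETLINK_ROUTE socket and reading the kernel's acknowledgement are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_addr.h>

/* One request: netlink header, ifaddrmsg, then room for attributes. */
typedef struct {
    struct nlmsghdr n;
    struct ifaddrmsg ifa;
    char attrs[64];
} nl_addr_req_t;

/* Append a single rtattr to the request; false if it would overflow. */
static bool nl_add_attr(nl_addr_req_t *req, unsigned short type,
                        const void *data, size_t len)
{
    size_t off = NLMSG_ALIGN(req->n.nlmsg_len);
    size_t rta_len = RTA_LENGTH(len);

    if (off + RTA_ALIGN(rta_len) > sizeof(*req))
        return false;

    struct rtattr *rta = (struct rtattr *)((char *)req + off);
    rta->rta_type = type;
    rta->rta_len = (unsigned short)rta_len;
    memcpy(RTA_DATA(rta), data, len);
    req->n.nlmsg_len = (unsigned int)(off + RTA_ALIGN(rta_len));
    return true;
}

/* Hypothetical helper: fill in an RTM_DELADDR request that removes
 * addr/plen (AF_INET or AF_INET6) from the Linux device with this ifindex. */
static bool build_del_addr(nl_addr_req_t *req, int family, int ifindex,
                           const void *addr, unsigned char plen)
{
    size_t addr_len = family == AF_INET ? sizeof(struct in_addr)
                                        : sizeof(struct in6_addr);

    assert(family == AF_INET || family == AF_INET6);
    memset(req, 0, sizeof(*req));
    req->n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
    req->n.nlmsg_type = RTM_DELADDR;
    req->n.nlmsg_flags = NLM_F_REQUEST | NLM_F_ACK;
    req->ifa.ifa_family = (unsigned char)family;
    req->ifa.ifa_prefixlen = plen;
    req->ifa.ifa_index = (unsigned int)ifindex;

    /* Same address as local and "address", as for a non-peer address. */
    return nl_add_attr(req, IFA_LOCAL, addr, addr_len) &&
           nl_add_attr(req, IFA_ADDRESS, addr, addr_len);
}
```

For example, removing 10.0.1.1/30 from a LIP would mean calling build_del_addr() with AF_INET, the LIP's ifindex, the parsed address and a prefix length of 30, then sending req.n.nlmsg_len bytes of the request to the kernel.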

Further work

I noticed that Bird periodically scans the Linux interface list and (re)learns information from it. I suspect such a feature might be useful in the VPP plugin as well: I can imagine a periodic process that walks over the LIP interface list and compares what it finds in Linux with what is configured in VPP. What’s not entirely clear to me is which direction should ‘trump’: should the Linux state be forced into VPP, or should the VPP state be forced into Linux? I don’t yet have a good feel for the answer, so I’ll punt on that for now.
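Such a periodic walk could look roughly like the C sketch below. Because the direction question is still open, the sketch takes the authoritative side as a parameter instead of picking one. It compares only admin state and MTU, leaves out addresses and the periodic trigger itself, and all names (lcp_pair_t, lcp_reconcile(), LCP_VPP_WINS, LCP_LINUX_WINS) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which side is treated as the source of truth (hypothetical policy). */
typedef enum { LCP_VPP_WINS, LCP_LINUX_WINS } lcp_authority_t;

/* The subset of interface state this sketch compares. */
typedef struct {
    bool up;
    unsigned mtu;
} lcp_state_t;

/* One LIP: what VPP has configured and what Linux currently reports. */
typedef struct {
    lcp_state_t vpp;
    lcp_state_t linux_side; /* not "linux", which GCC predefines as a macro in GNU C modes */
} lcp_pair_t;

/* One reconciliation pass over all LIPs; returns how many were out of sync. */
static size_t lcp_reconcile(lcp_pair_t *pairs, size_t n, lcp_authority_t who)
{
    size_t fixed = 0;

    assert(pairs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        lcp_state_t *src = who == LCP_VPP_WINS ? &pairs[i].vpp : &pairs[i].linux_side;
        lcp_state_t *dst = who == LCP_VPP_WINS ? &pairs[i].linux_side : &pairs[i].vpp;

        if (src->up != dst->up || src->mtu != dst->mtu) {
            *dst = *src; /* a real plugin would issue netlink or VPP calls here */
            fixed++;
        }
    }
    return fixed;
}
```

Returning the number of corrected pairs gives a cheap signal for logging how often the two sides drift apart, which could itself help decide which direction should win.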

Results

After applying the configuration to VPP (see the Appendix), here are the results:

pim@hippo:~/src/lcpng$ ip ro
default via 194.1.163.65 dev enp6s0 proto static 
10.0.1.0/30 dev e0 proto kernel scope link src 10.0.1.1 
10.0.2.0/30 dev e0.1234 proto kernel scope link src 10.0.2.1 
10.0.3.0/30 dev e0.1235 proto kernel scope link src 10.0.3.1 
10.0.4.0/30 dev e0.1236 proto kernel scope link src 10.0.4.1 
10.0.5.0/30 dev e0.1237 proto kernel scope link src 10.0.5.1 
194.1.163.64/27 dev enp6s0 proto kernel scope link src 194.1.163.88 

pim@hippo:~/src/lcpng$ fping 10.0.1.2 10.0.2.2 10.0.3.2 10.0.4.2 10.0.5.2 
10.0.1.2 is alive
10.0.2.2 is alive
10.0.3.2 is alive
10.0.4.2 is alive
10.0.5.2 is alive

pim@hippo:~/src/lcpng$ fping6 2001:db8:0:1::2 2001:db8:0:2::2 \
  2001:db8:0:3::2 2001:db8:0:4::2 2001:db8:0:5::2
2001:db8:0:1::2 is alive
2001:db8:0:2::2 is alive
2001:db8:0:3::2 is alive
2001:db8:0:4::2 is alive
2001:db8:0:5::2 is alive

        

In case you were wondering whether my previous post ended in the same huzzah moment: it did.

The difference is that now the VPP configuration is much shorter! Comparing the Appendix of this post with that of my first post: after all of this work, I no longer have to manually copy the configuration (link states, MTU changes, IP addresses) from VPP into Linux. Instead, the plugin does all of this work for me, and I can configure both sides entirely with vppctl commands!

Bonus screencast!

Humor me as I take the code out for a 5-minute spin :-)

Credits

I’d like to make clear that the Linux CP plugin is the result of a great collaboration between several people, and that my work stands on their shoulders. I’ve had a little bit of help along the way from Neale Ranns, Matthew Smith and Jon Loeliger, and I’d like to thank them for their work!

Appendix

Ubuntu config

# Untagged interface
ip addr add 10.0.1.2/30 dev enp66s0f0
ip addr add 2001:db8:0:1::2/64 dev enp66s0f0
ip link set enp66s0f0 up mtu 9000

# Single 802.1q tag 1234
ip link add link enp66s0f0 name enp66s0f0.q type vlan id 1234
ip link set enp66s0f0.q up mtu 9000
ip addr add 10.0.2.2/30 dev enp66s0f0.q
ip addr add 2001:db8:0:2::2/64 dev enp66s0f0.q

# Double 802.1q tag 1234 inner-tag 1000
ip link add link enp66s0f0.q name enp66s0f0.qinq type vlan id 1000
ip link set enp66s0f0.qinq up mtu 9000
ip addr add 10.0.3.2/30 dev enp66s0f0.qinq
ip addr add 2001:db8:0:3::2/64 dev enp66s0f0.qinq

# Single 802.1ad tag 2345
ip link add link enp66s0f0 name enp66s0f0.ad type vlan id 2345 proto 802.1ad
ip link set enp66s0f0.ad up mtu 9000
ip addr add 10.0.4.2/30 dev enp66s0f0.ad
ip addr add 2001:db8:0:4::2/64 dev enp66s0f0.ad

# Double 802.1ad tag 2345 inner-tag 1000
ip link add link enp66s0f0.ad name enp66s0f0.qinad type vlan id 1000 proto 802.1q
ip link set enp66s0f0.qinad up mtu 9000
ip addr add 10.0.5.2/30 dev enp66s0f0.qinad
ip addr add 2001:db8:0:5::2/64 dev enp66s0f0.qinad
        

VPP config

## Look mom, no `ip` commands!! :-)
vppctl set interface state TenGigabitEthernet3/0/0 up
vppctl lcp create TenGigabitEthernet3/0/0 host-if e0
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0
vppctl set interface ip address TenGigabitEthernet3/0/0 10.0.1.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0 2001:db8:0:1::1/64

vppctl create sub TenGigabitEthernet3/0/0 1234
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1234
vppctl lcp create TenGigabitEthernet3/0/0.1234 host-if e0.1234
vppctl set interface state TenGigabitEthernet3/0/0.1234 up
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 10.0.2.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1234 2001:db8:0:2::1/64

vppctl create sub TenGigabitEthernet3/0/0 1235 dot1q 1234 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1235 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1235
vppctl lcp create TenGigabitEthernet3/0/0.1235 host-if e0.1235
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 10.0.3.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1235 2001:db8:0:3::1/64

vppctl create sub TenGigabitEthernet3/0/0 1236 dot1ad 2345 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1236 up
vppctl lcp create TenGigabitEthernet3/0/0.1236 host-if e0.1236
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1236
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 10.0.4.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1236 2001:db8:0:4::1/64

vppctl create sub TenGigabitEthernet3/0/0 1237 dot1ad 2345 inner-dot1q 1000 exact-match
vppctl set interface state TenGigabitEthernet3/0/0.1237 up
vppctl set interface mtu packet 9000 TenGigabitEthernet3/0/0.1237
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 10.0.5.1/30
vppctl set interface ip address TenGigabitEthernet3/0/0.1237 2001:db8:0:5::1/64
vppctl lcp create TenGigabitEthernet3/0/0.1237 host-if e0.1237
        

Final note

You may have noticed that the [commit] links all point to empty anchors on this page. They’re not anchors, but git commits in my private working copy. I want to wait until my previous work is reviewed and submitted before piling on more changes. Feel free to contact vpp-dev@ for more information in the meantime :-)

More articles by Pim van Pelt

  • FreeIX Remote - Part 2

    FreeIX Remote - Part 2

    Introduction A few months ago, I wrote about [an idea] to help boost the value of small Internet Exchange Points…

    8 条评论
  • FreeIX Remote - Part 1

    FreeIX Remote - Part 1

    Introduction Tier1 and aspiring Tier2 providers interconnect only in large metropolitan areas, due to commercial…

    10 条评论
  • VPP with sFlow - Part 2

    VPP with sFlow - Part 2

    Introduction Last month, I picked up a project together with Neil McKee of [inMon], the care takers of [sFlow]: an…

    6 条评论
  • VPP with sFlow - Part 1

    VPP with sFlow - Part 1

    Introduction In January of 2023, an uncomfortably long time ago at this point, an acquaintance of mine called Ciprian…

    10 条评论
  • Case Study: From Jekyll to Hugo

    Case Study: From Jekyll to Hugo

    Introduction In the before-days, I had a very modest personal website running on [ipng.nl] and [ipng.

    12 条评论
  • Case Study: NAT64 in AS8298

    Case Study: NAT64 in AS8298

    Introduction IPng’s network is built up in two main layers, (1) an MPLS transport layer, which is disconnected from the…

    32 条评论
  • VPP on FreeBSD (part 2)

    VPP on FreeBSD (part 2)

    About this series Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its…

    6 条评论
  • VPP on FreeBSD (part 1)

    VPP on FreeBSD (part 1)

    About this series Ever since I first saw VPP - the Vector Packet Processor - I have been deeply impressed with its…

    8 条评论
  • Case Study: Selfhosted e-mail

    Case Study: Selfhosted e-mail

    Intro I have seen companies achieve great successes in the space of consumer internet and entertainment industry. I’ve…

    18 条评论
  • VPP and OSPFv3: without IPv4 addresses!

    VPP and OSPFv3: without IPv4 addresses!

    Introduction When I first built IPng Networks AS8298, I decided to use OSPF as an IPv4 and IPv6 internal gateway…

    23 条评论

社区洞察

其他会员也浏览了