Container Network Interface (CNI) - A Summary



This is a topic that has been turning over in the back of my mind for a while. Sometime last year, I discovered that container networking tends to add more overhead than I had expected, and since then I have been meaning to dig into it. The video below is, to date, the best description of the process that I have found. It does require some familiarity with networking and Linux namespaces, but overall it is a surprisingly limpid elucidation of a deeply convoluted topic. This article is, frankly, most of what I took away from the hour-long video.

Software Switches in Linux

Creating a software switch in Linux is simple:

  • Create a couple of interfaces
  • Create a Linux bridge
  • Attach the interfaces to the bridge

These can be achieved with the `ip link` and `brctl` commands, as sketched below. However, the issue is that containers and the bridge reside in different network namespaces, so we need a way to connect an interface inside a container to a Linux bridge and send data across.
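A minimal sketch of those three steps, with illustrative interface names (`brctl` comes from the bridge-utils package):

```bash
# Create a Linux bridge
sudo brctl addbr br0

# Attach existing interfaces to the bridge
sudo brctl addif br0 eth1
sudo brctl addif br0 eth2

# Bring the bridge up
sudo ip link set br0 up
```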

Virtual Interfaces

Virtual interfaces can be created with `sudo ip link add <veth-1> type veth peer name <veth-2>`. Virtual interfaces are always created in pairs, and data transmitted to one peer is immediately received on the other. This is a natural solution to the problem discussed above: we can attach <veth-1> to the bridge while assigning <veth-2> to the container. We can now send data across namespaces, but we don't yet know how to assign an interface to a different namespace.
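A minimal sketch of the bridge-side half, assuming the `br0` bridge from the previous section and illustrative peer names:

```bash
# Create a veth pair; the two peers behave like the ends of a virtual cable
sudo ip link add veth-1 type veth peer name veth-2

# Attach one end to the bridge and bring it up
sudo brctl addif br0 veth-1
sudo ip link set veth-1 up
```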

Changing the Defaults

We first need to get the process ID of the Docker container: `cont_pid=$(sudo docker inspect -f '{{.State.Pid}}' container2)`. Here our container is called container2.

Next, create our network namespace directory: `mkdir -p /var/run/netns`

Finally, Docker's network namespaces are not visible to `ip netns` by default, so we need to fix that: `sudo ln -sfT /proc/${cont_pid}/ns/net /var/run/netns/container2`

Now we can use the `ip netns exec` command to set up the network inside the container. For example: `sudo ip netns exec container2 ip link set veth-2 up`
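Putting the pieces together, a hedged sketch of moving the second veth peer into the container's namespace and configuring it (the addresses are illustrative):

```bash
# Move the second veth peer into the container's network namespace
sudo ip link set veth-2 netns container2

# Bring it up and configure it from inside the namespace
sudo ip netns exec container2 ip link set veth-2 up
sudo ip netns exec container2 ip addr add 172.17.0.10/16 dev veth-2
sudo ip netns exec container2 ip route add default via 172.17.0.1
```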

Connecting to the Internet

The containers, thus connected to a bridge, reach the Internet through NAT (Network Address Translation). For example, a MASQUERADE rule rewrites the source address of traffic that originates from the bridge's subnet and is not destined for another interface on the same bridge, before it leaves via the default route. Similarly, when host ports are mapped to container ports, DNAT rules are set up.
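A hedged sketch of what such rules can look like (the subnet, bridge name, addresses, and ports are illustrative, not Docker's exact rules):

```bash
# Masquerade traffic from the bridge subnet that leaves through any interface other than the bridge itself
sudo iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o br0 -j MASQUERADE

# Map host port 8080 to port 80 of a container at 172.17.0.10 (DNAT for published ports)
sudo iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.10:80
```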

Multi-Host Deployments/Container Clusters

We can connect hosts with an L2 VLAN that spans the docker0 bridge across hosts. But then we need to:

  • Ensure that the docker0 subnets on different hosts are not the same, to prevent IP collisions
  • Maintain L3 routes across hosts (see the sketch below)
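As a hedged illustration of the manual bookkeeping this implies, with purely illustrative subnets and host addresses:

```bash
# On host A (docker0 subnet 172.17.1.0/24), route host B's container subnet via host B's address
sudo ip route add 172.17.2.0/24 via 192.168.1.12

# On host B (docker0 subnet 172.17.2.0/24), add the reverse route towards host A
sudo ip route add 172.17.1.0/24 via 192.168.1.11
```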

This becomes a fairly impossible task as the number of hosts grows! This is where container networking solutions come into the picture. Two of the more popular choices are Flannel and Calico.

All networking solutions mainly solve the above two problems using a variety of backends.

Flannel

Flannel is a relatively simple solution that installs an agent on every host and maintains a central data store. Flannel's recommended backends are VXLAN and host-gw.

VXLAN

The Linux kernel's VXLAN driver is used to connect the hosts over UDP. The central store is used to figure out which host the encapsulated UDP packet should be sent to so that it reaches the intended container.
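To give a sense of the kernel primitive involved, a hedged sketch of creating a VXLAN device by hand (Flannel sets up the equivalent itself; the VNI, underlay device, and port are illustrative):

```bash
# Create a VXLAN interface with VNI 42 that encapsulates traffic over eth0 on the standard UDP port
sudo ip link add vxlan42 type vxlan id 42 dev eth0 dstport 4789
sudo ip link set vxlan42 up

# Flannel's agent then programs the forwarding entries (which host to send each destination to) from its central store
```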

host-gw

This is a simpler option when all hosts are on the same subnet. Flannel can then use direct routes between the hosts to solve the above two problems.

L3 routes are static in Flannel.

Calico

Calico, on the other hand, is quite a bit more sophisticated. It installs the BIRD BGP daemon on each host along with its agent. The hosts become BGP peers, which lets Calico maintain routes dynamically.

Registering network solutions with Docker

Docker exposes a plug-in interface for networking solutions to use. If a networking solution registers through this plug-in, Docker hands over the responsibility for multi-host networking to that solution. Typically, the solution's agent registers with the plug-in interface on each host.
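For example, once a driver has registered, networks can be created against it and containers attached to them (the driver name here is illustrative):

```bash
# Create a network backed by a registered third-party driver
sudo docker network create --driver some-overlay-driver my-net

# Run a container attached to that network
sudo docker run -d --network my-net nginx
```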

Note: Network solutions need not be software only.

Container Network Interface (CNI)

Go run N containers on K hosts

Container orchestrators, like Kubernetes, are management software that know how to convert statements like this into a running cluster. Container orchestrators expose the CNI as the network plug-in standard. This is different from the Docker plug-in interface.
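To make that concrete, a CNI plug-in is configured through JSON files that the runtime reads from a well-known directory. A hedged sketch of a minimal configuration (the network name, plug-in choice, and subnet are illustrative):

```bash
# Write a minimal CNI configuration for the standard "bridge" plug-in with host-local IP management
cat <<'EOF' | sudo tee /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.4.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF
```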

Network Policies

In the core Docker approach, network policies are applied by modifying iptables rules in Linux. Modern CNI solutions are gradually taking over this responsibility as well.
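As a hedged, hand-rolled illustration of the iptables style of policy (real solutions generate far more elaborate chains; addresses and ports are illustrative):

```bash
# Allow only 172.17.0.20 to reach the container at 172.17.0.10 on TCP port 80, and drop all other traffic to it
sudo iptables -A FORWARD -d 172.17.0.10 -p tcp --dport 80 -s 172.17.0.20 -j ACCEPT
sudo iptables -A FORWARD -d 172.17.0.10 -p tcp --dport 80 -j DROP
```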

Future Direction

Two interesting future research directions are high-density and federated container clusters.

  • For high-density clusters, meaning clusters with a high container-to-host ratio, applying network policies across a large number of containers takes its toll on performance. There are attempts underway to use the Linux kernel's eBPF capabilities to perform packet processing inside the kernel, bypassing much of the iptables machinery, which is faster.
  • In the case of federated clusters, CNI solutions probably have a huge role to play in the deployment of multi-cloud or hybrid clusters.
