Container Network Interface (CNI) - A Summary



This is a topic that has been turning over in the back of my mind for a while. Sometime last year, I discovered that container networking tends to add more overhead than I had expected, and since then I have been meaning to dig into it. The video below is, to date, the best description of the process that I have found. It does require some familiarity with networking and Linux namespaces, but overall it is a surprisingly limpid elucidation of a deeply convoluted topic. This article is, frankly, most of what I took away from the hour-long video.

Software Switches in Linux

Creating a software switch in Linux is simple:

  • Create a couple of interfaces
  • Create a Linux bridge
  • Attach the interfaces to the bridge

These can be achieved with the `ip link` and `brctl` commands, as sketched below. However, the issue is that containers and the bridge reside in different network namespaces, so we need a way to connect an interface inside a container to a Linux bridge and send data across.
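A minimal sketch of those three steps, with illustrative interface names (`brctl` comes from the bridge-utils package):

```bash
# Create a Linux bridge
sudo brctl addbr br0

# Attach existing interfaces to the bridge
sudo brctl addif br0 eth1
sudo brctl addif br0 eth2

# Bring the bridge up
sudo ip link set br0 up
```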

Virtual Interfaces

Virtual interfaces can be created with `sudo ip link add <veth-1> type veth peer name <veth-2>`. Virtual interfaces are always created in pairs, and data transmitted to one peer is immediately received on the other. This is a natural solution to the problem discussed above: we can attach <veth-1> to the bridge while assigning <veth-2> to the container. We can now send data across namespaces, but we don't yet know how to assign an interface to a different namespace.
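A minimal sketch of the bridge-side half, assuming the `br0` bridge from the previous section and illustrative peer names:

```bash
# Create a veth pair; the two peers behave like the ends of a virtual cable
sudo ip link add veth-1 type veth peer name veth-2

# Attach one end to the bridge and bring it up
sudo brctl addif br0 veth-1
sudo ip link set veth-1 up
```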

Changing the Defaults

We first need to get the process ID of the Docker container: `cont_pid=$(sudo docker inspect -f '{{.State.Pid}}' container2)`. Here our container is called container2.

Next, create our network namespace directory: `mkdir -p /var/run/netns`

Finally, Docker's network namespaces are not visible to `ip netns` by default, so we need to fix that: `sudo ln -sfT /proc/${cont_pid}/ns/net /var/run/netns/container2`

Now we can use the `ip netns exec` command to set up the network inside the container. For example: `sudo ip netns exec container2 ip link set veth-2 up`
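Putting the pieces together, a hedged sketch of moving the second veth peer into the container's namespace and configuring it (the addresses are illustrative):

```bash
# Move the second veth peer into the container's network namespace
sudo ip link set veth-2 netns container2

# Bring it up and configure it from inside the namespace
sudo ip netns exec container2 ip link set veth-2 up
sudo ip netns exec container2 ip addr add 172.17.0.10/16 dev veth-2
sudo ip netns exec container2 ip route add default via 172.17.0.1
```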

Connecting to the Internet

The containers, thus connected to a bridge, reach the Internet through NAT (Network Address Translation). For example, a MASQUERADE rule rewrites the source address of traffic that originates from the bridge's subnet and is not destined for another interface on the same bridge, before it leaves via the default route. Similarly, when host ports are mapped to container ports, DNAT rules are set up.
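A hedged sketch of what such rules can look like (the subnet, bridge name, addresses, and ports are illustrative, not Docker's exact rules):

```bash
# Masquerade traffic from the bridge subnet that leaves through any interface other than the bridge itself
sudo iptables -t nat -A POSTROUTING -s 172.17.0.0/16 ! -o br0 -j MASQUERADE

# Map host port 8080 to port 80 of a container at 172.17.0.10 (DNAT for published ports)
sudo iptables -t nat -A PREROUTING -p tcp --dport 8080 -j DNAT --to-destination 172.17.0.10:80
```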

Multi-Host Deployments/Container Clusters

We can connect hosts with an L2 VLAN that spans the docker0 bridge across hosts. But then we need to:

  • Ensure that the docker0 subnets on different hosts are not the same, to prevent IP collisions
  • Maintain L3 routes across hosts (see the sketch below)
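As a hedged illustration of the manual bookkeeping this implies, with purely illustrative subnets and host addresses:

```bash
# On host A (docker0 subnet 172.17.1.0/24), route host B's container subnet via host B's address
sudo ip route add 172.17.2.0/24 via 192.168.1.12

# On host B (docker0 subnet 172.17.2.0/24), add the reverse route towards host A
sudo ip route add 172.17.1.0/24 via 192.168.1.11
```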

This becomes a fairly impossible task as the number of hosts grows! This is where container networking solutions come into the picture. Two of the more popular choices are Flannel and Calico.

All networking solutions mainly solve the above two problems using a variety of backends.

Flannel

Flannel is a relatively simple solution that installs an agent on every host and maintains a central data store. Flannel's recommended backends are VXLAN and host-gw.

VXLAN

The Linux kernel's VXLAN driver is used to connect the hosts over UDP. The central store is used to figure out which host the encapsulated UDP packet should be sent to so that it reaches the intended container.
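To give a sense of the kernel primitive involved, a hedged sketch of creating a VXLAN device by hand (Flannel sets up the equivalent itself; the VNI, underlay device, and port are illustrative):

```bash
# Create a VXLAN interface with VNI 42 that encapsulates traffic over eth0 on the standard UDP port
sudo ip link add vxlan42 type vxlan id 42 dev eth0 dstport 4789
sudo ip link set vxlan42 up

# Flannel's agent then programs the forwarding entries (which host to send each destination to) from its central store
```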

host-gw

This is a simpler option when all hosts are on the same subnet. Flannel can then use direct routes between the hosts to solve the above two problems.

L3 routes are static in Flannel.

Calico

Calico, on the other hand, is quite a bit more sophisticated. It installs the BIRD BGP daemon on each host along with its agent. The hosts become BGP peers, which lets Calico maintain routes dynamically.

Registering network solutions with Docker

Docker exposes a plug-in interface for networking solutions to use. If a networking solution registers through this plug-in, Docker hands over the responsibility for multi-host networking to that solution. Typically, the solution's agent registers with the plug-in interface on each host.
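For example, once a driver has registered, networks can be created against it and containers attached to them (the driver name here is illustrative):

```bash
# Create a network backed by a registered third-party driver
sudo docker network create --driver some-overlay-driver my-net

# Run a container attached to that network
sudo docker run -d --network my-net nginx
```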

Note: Network solutions need not be software only.

Container Network Interface (CNI)

Go run N containers on K hosts

Container orchestrators, like Kubernetes, are management software that know how to convert statements like this into a running cluster. Container orchestrators expose the CNI as the network plug-in standard. This is different from the Docker plug-in interface.
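To make that concrete, a CNI plug-in is configured through JSON files that the runtime reads from a well-known directory. A hedged sketch of a minimal configuration (the network name, plug-in choice, and subnet are illustrative):

```bash
# Write a minimal CNI configuration for the standard "bridge" plug-in with host-local IP management
cat <<'EOF' | sudo tee /etc/cni/net.d/10-mynet.conf
{
  "cniVersion": "0.4.0",
  "name": "mynet",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.22.0.0/16"
  }
}
EOF
```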

Network Policies

In the core Docker approach, network policies are applied by modifying iptables rules in Linux. Modern CNI solutions are gradually taking over this responsibility as well.
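As a hedged, hand-rolled illustration of the iptables style of policy (real solutions generate far more elaborate chains; addresses and ports are illustrative):

```bash
# Allow only 172.17.0.20 to reach the container at 172.17.0.10 on TCP port 80, and drop all other traffic to it
sudo iptables -A FORWARD -d 172.17.0.10 -p tcp --dport 80 -s 172.17.0.20 -j ACCEPT
sudo iptables -A FORWARD -d 172.17.0.10 -p tcp --dport 80 -j DROP
```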

Future Direction

Two interesting future research directions are high-density and federated container clusters.

  • For high-density clusters, meaning clusters with a high container-to-host ratio, applying network policies across a large number of containers takes its toll on performance. There are attempts underway to use the Linux kernel's eBPF capabilities to perform packet processing inside the kernel, bypassing much of the iptables machinery, which is faster.
  • In the case of federated clusters, CNI solutions probably have a huge role to play in the deployment of multi-cloud or hybrid clusters.
