Networking for Kubernetes

 Kubernetes (K8s) is an open-source container-orchestration system for automating deployment, scaling, and management of containerized applications. It was originally designed by Google and is now maintained by the Cloud Native Computing Foundation.



Kubernetes is quickly becoming the new standard for deploying and managing containers in the hybrid cloud. Using the same orchestration on-premises and in the public cloud allows a high level of agility and ease of operations: the same API is used across bare metal and public clouds.



The building blocks of Kubernetes:

Node

A node is the smallest unit of computing in Kubernetes: it represents a single machine in the cluster. In most production systems, a node will be either a physical server or a virtual machine, hosted on-premises or in the cloud.

Cluster



A cluster is a group of nodes that pool their resources and are managed by Kubernetes as a single unit. When you deploy applications onto the cluster, Kubernetes intelligently distributes the work across the individual nodes. If nodes are added or removed, the cluster shifts workloads around as necessary. It should not matter to the application, or to the developer, which individual nodes are actually running the code.

Persistent Volume



Since applications running on the cluster are not guaranteed to run on a specific node, data cannot be saved to any arbitrary place in the file system. If an application tries to save data for later usage but is then relocated onto a new node, the data will no longer be where the application expects it to be. For this reason, the traditional local storage associated with each node is treated as a temporary cache to hold applications, but any data saved locally cannot be expected to persist.

To store data permanently, Kubernetes uses Persistent Volumes. While the CPU and RAM resources of all nodes are effectively pooled and managed by the cluster, persistent file storage is not. Instead, local or cloud storage can be attached to the cluster as a Persistent Volume.
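
As a minimal sketch (the names, capacity, and NFS server address below are illustrative assumptions, not taken from this article), a Persistent Volume and a claim that pods can mount might be declared like this:

    # Hypothetical Persistent Volume backed by an NFS share (server/path are placeholders)
    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: pv-data
    spec:
      capacity:
        storage: 10Gi
      accessModes:
        - ReadWriteOnce
      nfs:
        server: 10.0.0.5
        path: /exports/data
    ---
    # A claim that a pod references; Kubernetes binds it to a matching volume
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: data-claim
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi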

Container


Applications running on Kubernetes are packaged as Linux containers. Containers are a widely accepted standard, so there are already many pre-built images that can be deployed on Kubernetes.

Containerization allows the creation of self-contained Linux execution environments. Any application and all its dependencies can be bundled up into a single file. Containers allow powerful CI (continuous integration) and CD (continuous deployment) pipelines to be formed as each container can hold a specific part of an application. Containers are the underlying infrastructure for Microservices.

Microservices are a software development technique, an architectural style that structures an application as a collection of loosely coupled services. The benefit of decomposing an application into different smaller services is that it improves modularity. This makes the application easier to understand, develop, test, and deploy.

Pod



Kubernetes doesn’t run containers directly. Instead, it wraps one or more containers into a higher-level structure called a pod. Any containers in the same pod will share the same Node and local network. Containers can easily communicate with other containers in the same pod as though they were on the same machine while maintaining a degree of isolation from others.

Pods are used as the unit of replication in Kubernetes. If your application becomes too heavy and a single pod instance can’t carry the load, Kubernetes can be configured to deploy new replicas of your pod to the cluster as necessary. Even when not under heavy load, it is standard to have multiple copies of a pod running at any time in a production system to allow load balancing and failure resistance.
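
As an illustrative sketch (the image names and port are assumptions), a Deployment that keeps three replicas of a two-container pod running could look like the following; the sidecar reaches the main container over localhost because both containers share the pod's network namespace:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3                  # Kubernetes keeps three copies of the pod running
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: app
              image: nginx:1.25    # placeholder application image
              ports:
                - containerPort: 80
            - name: log-shipper    # hypothetical sidecar in the same pod
              image: busybox:1.36
              command: ["sh", "-c", "while true; do wget -qO- http://localhost:80 >/dev/null; sleep 60; done"]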


Networking

At its core, Kubernetes networking follows one fundamental design philosophy:

Every Pod has a unique IP.



The pod IP is shared by all the containers inside it, and it is routable from all the other pods. A huge benefit of this IP-per-pod model is that there are no IP or port collisions with the underlying host, so there is no need to worry about which ports the applications use.

With this in place, the only requirement Kubernetes has is that Pod IPs are routable/accessible from all the other pods, regardless of what node they’re on.

In the Kubernetes networking model, in order to reduce complexity and make app porting seamless, a few rules are enforced as fundamental requirements:


 

  • Containers can communicate with all other containers without NAT.
  • Nodes can communicate with all containers without NAT, and vice-versa.
  • The IP that a container sees itself as is the same IP that others see it as (see the sketch below).
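
To illustrate the third rule, a container can read its own pod IP through the downward API, and that address is exactly what other pods use to reach it (a minimal sketch; the image is a placeholder):

    apiVersion: v1
    kind: Pod
    metadata:
      name: ip-demo
    spec:
      containers:
        - name: app
          image: busybox:1.36      # placeholder image
          command: ["sh", "-c", "echo my pod IP is $POD_IP; sleep 3600"]
          env:
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP   # the same IP that other pods and nodes see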

There are many network implementations for Kubernetes. Among them, Flannel and Calico are probably the most popular, used as network plugins through the Container Network Interface (CNI). CNI can be seen as the simplest possible interface between container runtimes and network implementations, with the goal of creating a generic, plugin-based networking solution for containers.

Flannel 

Flannel can run using several encapsulation backends with VXLAN being the recommended one.
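
As a hedged sketch (10.244.0.0/16 is a common default pod network, but your cluster may use a different CIDR), the VXLAN backend is selected in Flannel's net-conf.json, which is usually shipped to every node as a ConfigMap; the same ConfigMap typically also carries the CNI plugin configuration:

    # Excerpt of a kube-flannel ConfigMap (namespace and CIDR are typical, not mandatory)
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: kube-flannel-cfg
      namespace: kube-system
    data:
      cni-conf.json: |
        {
          "name": "cbr0",
          "cniVersion": "0.3.1",
          "plugins": [
            { "type": "flannel", "delegate": { "isDefaultGateway": true } },
            { "type": "portmap", "capabilities": { "portMappings": true } }
          ]
        }
      net-conf.json: |
        {
          "Network": "10.244.0.0/16",
          "Backend": { "Type": "vxlan" }
        }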

L2 connectivity is required between the Kubernetes nodes when using Flannel with VXLAN.

Due to this requirement the size of the fabric might be limited: if a pure L2 network is deployed, the number of racks that can be connected is limited by the number of ports on the spine switches.



To overcome this issue, it is possible to deploy an L3 fabric with VXLAN/EVPN at the leaf level. L2 connectivity is provided to the nodes on top of a BGP-routed fabric that can scale easily. VXLAN packets coming from the nodes are encapsulated into VXLAN tunnels running between the leaf switches.


The Mellanox Spectrum ASIC provides huge value when it comes to VXLAN throughput, latency, and scale. While most switches support up to 128 remote VTEPs (meaning up to 128 racks in a single fabric), the Mellanox Spectrum ASIC supports up to 750 remote VTEPs, allowing up to 750 racks in a single fabric.

Calico

Calico is not really an overlay network; it can be seen as a pure IP networking fabric that leverages BGP to route pod traffic in Kubernetes clusters, on-premises or in the cloud.


In a typical Calico deployment, each node runs the Felix agent and a BGP client (such as BIRD) that advertises routes to the pods hosted on that node, either to the other nodes in a full mesh or to the top-of-rack switches.



Calico offers several AS design options: commonly a single AS shared by all nodes (with a full iBGP mesh or route reflectors), an AS per rack, or an AS per compute server.
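
As a hedged example of the peering setup (the AS numbers and the ToR address are placeholder assumptions), Calico can be told to drop its node-to-node mesh and peer with the fabric using its BGPConfiguration and BGPPeer resources:

    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      nodeToNodeMeshEnabled: false   # disable the full mesh when peering with the physical fabric
      asNumber: 64512                # placeholder private AS used by the cluster nodes
    ---
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: rack1-tor
    spec:
      peerIP: 10.0.1.1               # hypothetical top-of-rack switch address
      asNumber: 64513                # hypothetical AS of the ToR switch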



In a Calico network, each endpoint is a route. Hardware networking platforms are constrained by the number of routes they can learn, usually in the range of tens of thousands to hundreds of thousands of routes. Route aggregation can help, but that usually depends on the capabilities of the scheduler used by the orchestration software (e.g., OpenStack).
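
One place aggregation shows up in Calico (a sketch; the pool CIDR is an assumption) is the blockSize of an IPPool: each node announces aggregated per-node blocks rather than a /32 route per pod, which keeps the number of routes the switches must hold closer to the number of nodes than to the number of pods:

    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: default-ipv4-ippool
    spec:
      cidr: 192.168.0.0/16   # placeholder pod network
      blockSize: 26          # each node advertises /26 blocks instead of per-pod /32s
      natOutgoing: true
      nodeSelector: all()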

When choosing a switch for your Kubernetes deployment, make sure it has a routing table large enough that the network does not limit your Kubernetes compute scale.

The Mellanox Spectrum ASIC provides a fully flexible table size, enabling up to 176,000 IP route entries with Spectrum1 and up to 512,000 with Spectrum2, supporting the largest Kubernetes clusters run by the biggest enterprises worldwide.

Routing stack consistency across the physical network and Kubernetes

There are two common routing stacks used with Calico: BIRD and FRR.


 When working with Cumulus Linux OS on the switch layer, you would probably want to use FRR as the routing stack on your nodes, leveraging BGP unnumbered.

If you are looking for a purely open-source solution, check out the Mellanox LinuxSwitch, which supports both FRR and BIRD as the routing stack.

Network Visibility challenges when working with Kubernetes

Containers are automatically spun up and destroyed as needed on any server in the cluster. Since the containers are located inside a host, they can be invisible to network engineers, who may never know where they are located or when they are created and destroyed.

Operating modern agile data centers is notoriously difficult with limited network visibility and changing traffic patterns.
