What Is Pod Affinity and Anti-Affinity in Kubernetes

Kubernetes is a distributed system that’s designed to scale replicas of your services across multiple physical environments. In many cases this works well out-of-the-box. The Kubernetes scheduler automatically places your Pods (container instances) onto Nodes (worker machines) that have enough resources to support them.

In this article, we’ll focus specifically on the “affinity” and “anti-affinity” concepts that give you granular control of scheduling. Affinities define rules that either must or should be met before a Pod can be allocated to a Node.

How Does Affinity Work?

  • Affinities are used to express Pod scheduling constraints that can match characteristics of candidate Nodes and the Pods that are already running on those Nodes.
  • A Pod that has an “affinity” to a given Node is more likely to be scheduled to it; conversely, an “anti-affinity” makes it less probable it’ll be scheduled.
  • The overall balance of these weights is used to determine the final placement of each Pod.

Types of Affinity Condition

There are currently two different kinds of affinity that you can define:

  • Node Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of those Nodes. Node Affinity can only be used to set positive affinities that attract Pods to the Node.
  • Inter-Pod Affinity – Used to constrain the Nodes that can receive a Pod by matching labels of the existing Pods already running on each of those Nodes. Inter-Pod Affinity can be either an attracting affinity or a repelling anti-affinity.

In the simplest possible example, a Pod that includes a Node Affinity condition of label=value will only be scheduled to Nodes with a label=value label. A Pod with the same condition but defined as an Inter-Pod Affinity will be scheduled to a Node that already hosts a Pod with a label=value label.
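
Affinity rules always match against labels, so the Nodes (or Pods) you target need to carry them. As a minimal sketch, assuming a hypothetical worker named worker-1, the label=value pair used above would simply live in the Node's metadata; in practice you would usually attach it with kubectl label nodes worker-1 label=value:

apiVersion: v1
kind: Node
metadata:
  name: worker-1      # hypothetical Node name
  labels:
    label: value      # the label the affinity condition above matches against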

Setting Node Affinities

Node Affinity has two distinct sub-types:

  • requiredDuringSchedulingIgnoredDuringExecution – This is the “hard” affinity matcher that requires the Node to meet the constraints you define.
  • preferredDuringSchedulingIgnoredDuringExecution – This is the “soft” matcher that expresses a preference, which is ignored when it can’t be fulfilled.

The IgnoredDuringExecution part of these verbose names makes it explicit that affinity is only considered while scheduling Pods. Once a Pod has made it onto a Node, affinity isn’t re-evaluated. Changes to the Node won’t cause a Pod eviction due to changed affinity values. A future Kubernetes release could add support for this behavior via the reserved requiredDuringSchedulingRequiredDuringExecution phrase.

Node affinities are attached to Pods via their spec.affinity.nodeAffinity manifest field:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
            - key: hardware-class
              operator: In
              values:
                - a
                - b
                - c
          - matchExpressions:
            - key: internal
              operator: Exists        

This manifest creates a hard affinity rule that schedules the Pod to a Node meeting either of the following criteria:

  • It has a hardware-class label with a, b, or c as its value.
  • It has an internal label with any value.

You can attach additional conditions by repeating the matchExpressions clause. Supported operators for value comparisons are In, NotIn, Exists, DoesNotExist, Gt (greater than), and Lt (less than).

The matchExpressions clauses grouped under a single nodeSelectorTerms entry are combined with a boolean AND: they all need to match for a Pod to gain affinity to a particular Node. You can use multiple nodeSelectorTerms clauses too; these will be combined as a logical OR operation. You can assemble complex scheduling criteria by utilizing both of these structures, as in the sketch below.
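
As an illustration (the hardware-class and cpu-count labels here are hypothetical), the following hard rule ANDs a NotIn expression with a Gt expression inside a single nodeSelectorTerms entry, so only Nodes satisfying both are eligible. Note that Gt and Lt compare the label value as an integer, supplied as a single string:

apiVersion: v1
kind: Pod
metadata:
  name: operators-demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: hardware-class
                operator: NotIn   # exclude Nodes labeled with the "legacy" class
                values:
                  - legacy
              - key: cpu-count
                operator: Gt      # label value must be an integer greater than 8
                values:
                  - "8"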

“Soft” scheduling preferences are set up in a similar way. Use nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution instead of, or as well as, requiredDuringSchedulingIgnoredDuringExecution to configure these. Define each of your optional constraints as a matchExpressions clause within a preference field:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: hardware-class
            operator: In
            values:
              - a
              - b
              - c        

Preference-based rules have an additional field called weight that accepts an integer from 1 to 100. Each Node that matches a preference has its affinity score incremented by the set amount; the Node that ends up with the highest overall score (combined with the scheduler’s other priorities) is the one most likely to receive the Pod, as the sketch below illustrates.
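
As a minimal sketch of how the weights add up (the hardware-class and internal labels are carried over from the earlier examples), a Node matching both preferences below scores 80 + 20 = 100, a Node matching only the first scores 80, and a Node matching only the second scores 20:

apiVersion: v1
kind: Pod
metadata:
  name: weighted-demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80                  # strong preference for the listed hardware classes
          preference:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
        - weight: 20                  # weaker preference for internally labeled Nodes
          preference:
            matchExpressions:
              - key: internal
                operator: Exists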

Setting Inter-Pod Affinities

Inter-Pod Affinities work very similarly to Node Affinities but do have some important differences. The “hard” and “soft” modes are indicated using the same requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution fields. These should be nested under the spec.affinity.podAffinity or spec.affinity.podAntiAffinity field, depending on whether you want to increase or reduce the Pod’s affinity upon a successful match.

Here’s a simple example that demonstrates both affinity and anti-affinity:

apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: demo-container
    # ...
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: hardware-class
                operator: In
                values:
                  - a
                  - b
                  - c
          topologyKey: topology.kubernetes.io/zone
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app-component
                  operator: In
                  values:
                    - background-worker
            topologyKey: topology.kubernetes.io/zone

The format differs slightly from Node Affinity. Each matchExpressions constraint needs to be nested under a labelSelector. For soft matches, this in turn should be located within a podAffinityTerm. Pod affinities also offer a reduced set of comparison operators: you can use In, NotIn, Exists, and DoesNotExist.

Pod affinities need a topologyKey field. This names a Node label that defines the topology domain (here, the zone) within which the match applies, before the matchExpressions are evaluated. The rules above will schedule the Pod into a zone (identified by the topology.kubernetes.io/zone label) that already hosts a Pod with the hardware-class label set to a, b, or c. Zones that also contain a Pod with the app-component=background-worker label will be given a reduced affinity.
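
To make the matching target concrete, here is a sketch of a hypothetical Pod that might already be running in the cluster. Its hardware-class: a label satisfies the podAffinity rule above; if it instead carried app-component: background-worker, it would trigger the anti-affinity preference and push the new Pod toward a different zone:

apiVersion: v1
kind: Pod
metadata:
  name: existing-pod            # hypothetical Pod already running on a Node in some zone
  labels:
    hardware-class: a           # matches the podAffinity rule above
    app-component: web-frontend # would reduce affinity if set to background-worker
spec:
  containers:
    - name: existing-container
    # ...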

Inter-Pod affinities are a powerful mechanism for controlling colocation of Pods. However, they have a significant impact on performance: Kubernetes warns against using them in clusters with more than a few hundred Nodes. Each new Pod scheduling request needs to check every other Pod on all the other Nodes to assess compatibility.

Other Scheduling Constraints

While we’ve focused on affinities in this article, Kubernetes provides other scheduler constraint mechanisms too. These are typically simpler but less automated approaches that work well for smaller clusters and deployments.

The most basic constraint is the nodeSelector field. It’s defined on Pods as a set of label key-value pairs that must exist on Nodes hosting the Pod:

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      # ...
  nodeSelector:
    hardware-class: a
    internal: "true"

This manifest instructs Kubernetes to only schedule the Pod to Nodes with both the hardware-class: a and internal: "true" labels (label values are always strings, so true must be quoted).

Node selection with the nodeSelector field is a good way to quickly scaffold static configuration based on long-lived attributes of your Nodes. The affinity system is much more flexible when you want to express complex rules and optional preferences, as the equivalent rule below shows.
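
For comparison, here is a minimal sketch of the same constraint expressed through Node Affinity. It behaves like the nodeSelector above, but can be extended with additional operators or relaxed into a soft preference:

apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: demo
      # ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: hardware-class
                operator: In      # same as requiring hardware-class: a
                values:
                  - a
              - key: internal
                operator: In      # same as requiring internal: "true"
                values:
                  - "true"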

Conclusion

Affinities and anti-affinities are used to set up versatile Pod scheduling constraints in Kubernetes. Compared to other options like nodeSelector, affinities are complex but give you more ways to identify compatible Nodes.
