Optimistic Concurrency in Kubernetes

Concurrency is a tough problem to solve. There are many ways to tackle concurrency problems, and yet we have to trade away a lot just to hit that sweet spot between concurrency and serialization. Especially in distributed systems like Kubernetes, this gets even tougher.


A Brief Primer on Kubernetes

Let's imagine you have a Kubernetes cluster up and running. Everything is fine. Suddenly, you need to deploy a pod in the cluster. You fire up your terminal and run kubectl apply, like any DevOps engineer would. The command ensures you end up with a pod running in the cluster. This approach is known as declarative resource management: you specify the desired state to Kubernetes, and it takes care of the actions needed to reach that state.
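Under the hood, kubectl apply is just a client of the Kubernetes API. Here is a minimal sketch of the same declarative flow using client-go's server-side apply; the pod name, image, and "example-manager" field manager are illustrative assumptions, and the clientset is assumed to be configured already.

    package main

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        corev1apply "k8s.io/client-go/applyconfigurations/core/v1"
        "k8s.io/client-go/kubernetes"
    )

    // applyNginxPod declares the desired state of a pod; the API server
    // decides whether that means creating it or patching it.
    func applyNginxPod(ctx context.Context, clientset kubernetes.Interface) error {
        pod := corev1apply.Pod("nginx", "default").
            WithSpec(corev1apply.PodSpec().
                WithContainers(corev1apply.Container().
                    WithName("nginx").
                    WithImage("nginx:1.25")))

        _, err := clientset.CoreV1().Pods("default").
            Apply(ctx, pod, metav1.ApplyOptions{FieldManager: "example-manager"})
        return err
    }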


But how does Kubernetes actually make that happen?


Let's trace how your request is handled by Kubernetes. First, your request to deploy the pod is sent to the outward-facing API server along with your authentication credentials.

After authenticating you, authorizing the request, and running it through admission checks, the API server writes the pod object to the etcd database under a key of this form:

/registry/pods/<namespace>/<pod_name>        

Kubernetes uses this key format to store pod definitions in the etcd database. Once the object has been saved, the controllers watching pods through the API server receive an event about the addition of the new pod.
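That event is delivered through the API server's watch mechanism. Here is a hedged sketch of how a controller might subscribe to pod additions with client-go; real controllers typically use informers, which wrap this raw watch with caching and resyncs.

    package main

    import (
        "context"
        "fmt"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/watch"
        "k8s.io/client-go/kubernetes"
    )

    // watchPodAdditions reacts to newly added pods in any namespace,
    // roughly what a controller's watch loop does.
    func watchPodAdditions(ctx context.Context, clientset kubernetes.Interface) error {
        watcher, err := clientset.CoreV1().Pods(metav1.NamespaceAll).
            Watch(ctx, metav1.ListOptions{})
        if err != nil {
            return err
        }
        defer watcher.Stop()

        for event := range watcher.ResultChan() {
            if event.Type != watch.Added {
                continue
            }
            pod, ok := event.Object.(*corev1.Pod)
            if !ok {
                continue
            }
            fmt.Printf("new pod: %s/%s\n", pod.Namespace, pod.Name)
        }
        return nil
    }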

The scheduler and the kubelet are the components most interested in this event. Based on resource constraints, affinity/anti-affinity rules, node selectors, and custom scheduling preferences, the scheduler assigns a node to run the pod. Once the node has been assigned, the scheduler's job for this request is done.
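The assignment itself is just another API write: the scheduler posts a Binding object that fills in the pod's node. A minimal sketch of that step, with the pod and node names left as caller-supplied assumptions:

    package main

    import (
        "context"

        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // bindPodToNode records a scheduling decision by posting a Binding,
    // which sets spec.nodeName on the pending pod.
    func bindPodToNode(ctx context.Context, clientset kubernetes.Interface,
        namespace, podName, nodeName string) error {
        binding := &corev1.Binding{
            ObjectMeta: metav1.ObjectMeta{Name: podName, Namespace: namespace},
            Target:     corev1.ObjectReference{Kind: "Node", Name: nodeName},
        }
        return clientset.CoreV1().Pods(namespace).
            Bind(ctx, binding, metav1.CreateOptions{})
    }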

The kubelet running on the scheduled node receives this event and starts the container on that node. It's also responsible for synchronizing the pod's status back to the Kubernetes API server and, through it, the etcd database.


Where is the concurrency problem?

Kubernetes uses a special identifier called "resourceVersion" to track the current version of every object, including pod manifests. The concurrency problem arises when two controllers decide to update the same manifest at the same time.

Let's say two controllers, "A" and "B", both receive version 0 of the manifest. Each updates the manifest according to its own business logic and submits it back, expecting the server to accept a write based on version 0 and bump it to 1. Now whose request should succeed? Both are trying to move the manifest from resourceVersion 0 to 1. If the server approved both changes, the second request would silently wipe out the change performed by the first.


Optimistic Concurrency Model


Kubernetes follows the optimistic concurrency model. In this model, there is no upfront locking at all. In the context of the etcd database used by Kubernetes, the optimistic concurrency model works as follows:

  1. Read Phase: When a client wants to read a resource, it fetches a copy of the resource from the database. The database does not acquire any locks during this phase.
  2. Modify Phase: The client makes modifications to the resource based on the fetched copy. When the client wants to update the resource, it sends the modified copy back to the database.
  3. Check and Update Phase: During this phase, the database checks if the resource has been updated by other concurrent operations since the client read it. It compares the version or timestamp of the resource with the one provided by the client. If the versions match, it assumes no conflicts occurred and applies the update. Otherwise, if the versions don't match, it detects a conflict.
  4. Conflict Resolution: In the event of a conflict, the optimistic concurrency control model typically employs a strategy to resolve it. This can involve rolling back the client's changes, retrying the operation, or applying conflict resolution logic specific to the use case.
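In client-go terms, phases 1 through 3 map onto a Get followed by an Update; the API server performs the version check and rejects a stale write with an HTTP 409 Conflict. A minimal sketch of one such cycle, where the "owner" label and the function name are illustrative assumptions:

    package main

    import (
        "context"
        "fmt"

        apierrors "k8s.io/apimachinery/pkg/api/errors"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
    )

    // labelPodOnce performs a single optimistic read-modify-write cycle.
    func labelPodOnce(ctx context.Context, clientset kubernetes.Interface,
        namespace, name string) error {
        // Read phase: fetch a copy; no lock is taken.
        pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
        if err != nil {
            return err
        }

        // Modify phase: change the local copy, which still carries the
        // resourceVersion we read.
        if pod.Labels == nil {
            pod.Labels = map[string]string{}
        }
        pod.Labels["owner"] = "controller-a"

        // Check-and-update phase: the server compares the submitted
        // resourceVersion against the stored one before accepting the write.
        _, err = clientset.CoreV1().Pods(namespace).Update(ctx, pod, metav1.UpdateOptions{})
        if apierrors.IsConflict(err) {
            return fmt.Errorf("stale copy of %s/%s: %w", namespace, name, err)
        }
        return err
    }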


Coming back to our two controllers: the first one succeeds in making its change, but the second one fails, because the resourceVersion its copy was based on is no longer the current one.


What will controller B do?

Controller "B" will simply have to retry it's change by fetching the latest manifest from the API server and again perform it's operations.


These conflicts come up all the time in distributed systems like Kubernetes, which is why every controller has logic to retry failed operations multiple times.
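client-go even ships a helper for exactly this retry pattern. A sketch using retry.RetryOnConflict, where the annotation key is an illustrative assumption:

    package main

    import (
        "context"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/util/retry"
    )

    // annotatePod retries the whole read-modify-write cycle whenever the
    // server reports a resourceVersion conflict.
    func annotatePod(ctx context.Context, clientset kubernetes.Interface,
        namespace, name string) error {
        return retry.RetryOnConflict(retry.DefaultRetry, func() error {
            // Re-fetch on every attempt so we always start from the latest version.
            pod, err := clientset.CoreV1().Pods(namespace).Get(ctx, name, metav1.GetOptions{})
            if err != nil {
                return err
            }
            if pod.Annotations == nil {
                pod.Annotations = map[string]string{}
            }
            pod.Annotations["example.com/processed"] = "true"
            _, err = clientset.CoreV1().Pods(namespace).Update(ctx, pod, metav1.UpdateOptions{})
            return err
        })
    }

Note that retry.DefaultRetry caps the number of attempts, so a controller eventually gives up instead of spinning forever on a contested object.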

