Kubernetes platform - at the fundamental level
"I see a lot of people having problems to understand how the Kubernetes platform works at the fundamental level, e.g. resiliency and behaviour. If you start thinking about Kubernetes as a fully event-driven system, there's answers to so many "Why"'s"
Think of the API server as an immutable (replicated) log (queue) and stream of events. Events are facts that can be causally related ("happened-before") or not related at all (in which case we say they happened "concurrently"). etcd is an implementation detail.
All controllers, e.g. the scheduler, deployment controller, endpoint controller, kubelet, etc., can be understood as consumers and/or producers (a consumer can also be a producer, and vice versa).
Consumers specify the objects (and, optionally, the namespace) they want to receive events for from the API server. This is called a "watch" in Kubernetes. Think of the combination of object+namespace as a dedicated (virtual) event queue the API server handles.
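To make this concrete, here's a minimal sketch of such a watch using client-go. The kubeconfig path and the "default" namespace are just assumptions for illustration; any reachable cluster will do:

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	// Build a client from the local kubeconfig (assumption: ~/.kube/config).
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Subscribe to the "virtual event queue" for Pods in one namespace.
	watcher, err := clientset.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	defer watcher.Stop()

	// Each event is a fact: ADDED, MODIFIED or DELETED, ordered by resourceVersion.
	for event := range watcher.ResultChan() {
		if pod, ok := event.Object.(*corev1.Pod); ok {
			fmt.Printf("%s %s (resourceVersion %s)\n", event.Type, pod.Name, pod.ResourceVersion)
		}
	}
}
```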
Consumers and producers don't know about each other as they're fully decoupled (by the log) and autonomous. This makes the whole system extremely scalable, robust and extensible (flexible).
Thus, by design, it's a fully asynchronous and eventually consistent platform. Information takes time to propagate from producer(s) to consumer(s); a typical example is the HPA acting on metrics that haven't caught up with the actual state yet.
There's NO guarantee that the system will converge to the desired state, even if you got an "OK"/ACK from the control plane. For example: kubectl scale <deploy> --replicas <n>, with total CPU requests exceeding cluster capacity. You'll get an HTTP 200, even though there is no capacity left, and the extra pods will simply stay Pending.
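The same thing expressed in code, as a minimal client-go sketch; the deployment name "web", the namespace, and the replica count are all made up:

```go
package main

import (
	"context"
	"fmt"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// Read the current scale of a (hypothetical) deployment "web".
	scale, err := clientset.AppsV1().Deployments("default").
		GetScale(context.TODO(), "web", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	scale.Spec.Replicas = 1000 // far beyond what the cluster can run

	// This call succeeds (the HTTP 200 "ACK"): it only records the desired
	// state in the log. Whether the pods ever run is decided later and
	// asynchronously by the scheduler; without capacity they stay Pending.
	if _, err := clientset.AppsV1().Deployments("default").
		UpdateScale(context.TODO(), "web", scale, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("scale request accepted; convergence is not guaranteed")
}
```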
Another example of unwanted "racy" conditions between controllers when scaling down services:
Handling Client Requests Properly with Kubernetes
Controllers are stateless. For efficiency and speed, received events are placed in a per-controller in-memory buffer (typically also modelled as a queue). So what happens if a controller crashes, given that it doesn't persist state?
Persistence is not needed! The event-driven design will replay all (appropriate) events when the controller starts, also known as event-sourcing. This is also very useful because events are delivered "at most once", i.e. they can be lost during transmission.
Kubernetes is level-triggered: if an event gets lost during transmission (e.g. due to a network issue), you'll receive the desired state again at the next sync. See the informer sketch after the links below.
Kubernetes: Edge vs Level Triggered Logic
Reconciliation in case of the scheduler
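This list-then-watch-then-resync behaviour is what client-go's informers give you. A minimal sketch: on startup the informer lists (replays) the full current state and then watches; the resync period re-delivers the full state periodically, so a handler that missed an edge still converges on the level. The 30-second period is just an illustration:

```go
package main

import (
	"fmt"
	"path/filepath"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// On start, the informer LISTs all pods (the "replay") and then WATCHes.
	// The resync period re-delivers the full current state at intervals.
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()

	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		// Fires for every existing pod on the initial list, and for new pods.
		AddFunc: func(obj interface{}) {
			pod := obj.(*corev1.Pod)
			fmt.Printf("reconcile %s/%s\n", pod.Namespace, pod.Name)
		},
		// Fires on changes AND on every resync, even if nothing changed:
		// the level, not the edge, drives reconciliation.
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			fmt.Printf("reconcile %s/%s\n", pod.Namespace, pod.Name)
		},
	})

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // run forever
}
```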
Since information can be delivered more than once (e.g. after a failure, a resync, etc.) and controllers don't talk to each other directly, there's a potential for race conditions when state is changed (e.g. writes to the same object from different controllers).
This is resolved with optimistic concurrency control, handled in the application layer (i.e. in each controller's logic). Idempotency and compare-and-set (based on monotonically increasing resource versions) are the patterns used here.
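client-go ships a small helper for exactly this compare-and-set retry loop. Here's a sketch of scaling a (hypothetical) deployment "web" with it; the names and replica count are assumptions:

```go
package main

import (
	"context"
	"path/filepath"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/homedir"
	"k8s.io/client-go/util/retry"
)

func main() {
	kubeconfig := filepath.Join(homedir.HomeDir(), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// RetryOnConflict re-runs the closure whenever the update is rejected
	// because our resourceVersion is stale (another writer got there first).
	err = retry.RetryOnConflict(retry.DefaultRetry, func() error {
		// Re-read the latest version; it carries the current resourceVersion.
		deploy, err := clientset.AppsV1().Deployments("default").
			Get(context.TODO(), "web", metav1.GetOptions{})
		if err != nil {
			return err
		}
		replicas := int32(3)
		deploy.Spec.Replicas = &replicas
		// Compare-and-set: the API server accepts the write only if the
		// resourceVersion still matches; otherwise it returns a Conflict.
		_, err = clientset.AppsV1().Deployments("default").
			Update(context.TODO(), deploy, metav1.UpdateOptions{})
		return err
	})
	if err != nil {
		panic(err)
	}
}
```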
Event-driven design has many more advantages and can address a lot of problems in distributed systems (backpressure, queuing, retries, scale-out, etc.).
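Retries with backoff, for instance, come almost for free with client-go's rate-limited workqueue, which is how controllers buffer the events mentioned earlier. A simplified sketch of that loop (the key "default/web" and the reconcile function are made up):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Rate-limited queue: per-item exponential backoff plus an overall limiter.
	queue := workqueue.NewRateLimitingQueue(workqueue.DefaultControllerRateLimiter())
	queue.Add("default/web") // controllers enqueue the object's key, not the object

	key, _ := queue.Get()
	if err := reconcile(key.(string)); err != nil {
		// Backpressure: re-enqueue with backoff instead of retrying in a hot loop.
		queue.AddRateLimited(key)
	} else {
		queue.Forget(key) // reset the backoff counter on success
	}
	queue.Done(key)
}

func reconcile(key string) error {
	fmt.Println("reconciling", key)
	return nil
}
```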
It's not a new idea. Relational databases work the same way (the transaction log): the database is said to be the (stale) cache of the immutable log :)
If you want to learn more about event-driven architecture, make sure to check out this video: Designing Events-first Microservices
A deep dive into Kubernetes API Server
Closing remark: this is not specific to computer science. Events are the reason for all being; without events, there would be no change! Think about it: it's an applied and battle-tested design.