Issues with Kubernetes Pods After a Node Reboot
Rajaraman Sathyamurthy
Associate Director & Senior Architect, Data Architecture
You may have automated patching and reboots scheduled for your VMs as part of a maintenance or patching window, using BigFix or a similar tool. But if Kubernetes is running on those VMs, you may run into problems: pods or nodes can get stuck or hang after the reboot, and services may not come back up properly.
Sounds familiar?
VMs that are running Kubernetes are not supposed to be rebooted this way.
Is that so? What's the right way?
Well, before rebooting the VM, the pods and workloads running on that node should be moved gracefully to another node. You can achieve this by running the cordon command (which marks the node unschedulable, so that no new pods are placed on it) and then the drain command (which evicts the running pods so that they are rescheduled on other nodes that have enough resources).
Now you can reboot the VM. Once it is back up, do not forget to uncordon the node so that pods can be scheduled on it again. This way you avoid the stuck or hung pods that are caused by an abrupt reboot of the VM.
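The cordon, drain, reboot and uncordon sequence described above can be sketched as a small Python helper that builds the kubectl commands. This is only a minimal sketch under stated assumptions: the node name worker-1, the function names and the 300s drain timeout are hypothetical, and the drain flags shown (--ignore-daemonsets, --delete-emptydir-data, --timeout) are common choices that may not suit every cluster. By default it only prints the commands (dry run) and does not touch the cluster.

```python
import subprocess
from typing import List


def pre_reboot_commands(node: str, drain_timeout: str = "300s") -> List[List[str]]:
    """Build the kubectl commands to run before rebooting the node."""
    return [
        # Mark the node unschedulable so no new pods land on it.
        ["kubectl", "cordon", node],
        # Evict running pods so they are rescheduled elsewhere.
        # The flags below are assumptions; adjust them for your workloads.
        ["kubectl", "drain", node,
         "--ignore-daemonsets",
         "--delete-emptydir-data",
         f"--timeout={drain_timeout}"],
    ]


def post_reboot_commands(node: str) -> List[List[str]]:
    """Build the kubectl command to run once the node is back up."""
    return [["kubectl", "uncordon", node]]


def run_all(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for cmd in commands:
        line = " ".join(cmd)
        print(line)
        lines.append(line)
        if not dry_run:
            # check=True stops the sequence if a command fails.
            subprocess.run(cmd, check=True)
    return lines


if __name__ == "__main__":
    node = "worker-1"  # hypothetical node name
    run_all(pre_reboot_commands(node))
    print("# reboot the VM here and wait for the node to become Ready")
    run_all(post_reboot_commands(node))
```

Passing dry_run=False to run_all would actually execute the commands, which assumes kubectl is installed and configured for the target cluster.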
You can automate this easily using Kured (Kubernetes Reboot Daemon), which handles the cordon, drain, reboot and uncordon sequence for you, combined with some simple shell-script automation. Hope this helps!
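As a rough illustration of how patching automation might hand over to Kured: Kured watches each node for a sentinel file and, when it finds one, drains and reboots that node. The sketch below simply creates such a file at the end of a patch run. The path /var/run/reboot-required is assumed here to be the sentinel Kured is configured to watch (please verify this against your own Kured settings), and the function names are hypothetical. The demo at the bottom uses a temporary directory, so it is safe to run anywhere.

```python
import tempfile
from pathlib import Path

# Assumed sentinel path; confirm it matches your Kured configuration.
DEFAULT_SENTINEL = Path("/var/run/reboot-required")


def request_reboot(sentinel: Path = DEFAULT_SENTINEL) -> None:
    """Create the sentinel file to signal that this node needs a reboot."""
    sentinel.parent.mkdir(parents=True, exist_ok=True)
    sentinel.touch(exist_ok=True)


def reboot_requested(sentinel: Path = DEFAULT_SENTINEL) -> bool:
    """Return True if the sentinel file is present."""
    return sentinel.exists()


if __name__ == "__main__":
    # Demo against a temporary path instead of the real sentinel.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "reboot-required"
        print(reboot_requested(demo))
        request_reboot(demo)
        print(reboot_requested(demo))
```

With this approach the patching tool never reboots the VM itself; it only flags that a reboot is needed and leaves the timing and the draining to Kured.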