Adding Readiness & Liveness to Kubernetes Workloads (Kibana)

In our Kubernetes environment, Traefik load balancers route user traffic to Kibana application servers. We noticed that one of the Kibana servers was hung and unresponsive, yet the Traefik load balancers kept routing user traffic to it, so users received server errors.


So we decided to set up readiness and liveness probes for the Kibana application containers.


So what are readiness and liveness probes in Kubernetes?


Both readiness and liveness probes monitor the health of a container in a pod, but the action taken on failure differs between the two.


If a readiness probe is configured and the pod becomes unresponsive, the pod is removed from the Service's endpoints, so the load balancers stop sending traffic to it. Without the probe, traffic would keep being routed to a pod that is running but cannot serve users. The unresponsive pod itself keeps running as it is, but the user impact is avoided.
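
As a rough illustration of the readiness behaviour described above, a probe on the Kibana container might look like the fragment below. This is a sketch, not our exact configuration: the httpGet check, the /api/status path, port 5601 (Kibana's default) and the timing values are all assumptions chosen for illustration.

```yaml
# Fragment of a container spec (sits under spec.containers[] in the pod template).
# Hypothetical values; adjust path, port and timings to your deployment.
readinessProbe:
  httpGet:
    path: /api/status    # assumed health endpoint
    port: 5601           # Kibana's default port
  periodSeconds: 10      # probe every 10 seconds
  failureThreshold: 3    # after 3 consecutive failures the pod is marked not ready
```

A pod that is not ready is taken out of the Service's endpoints but is not restarted.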


If a liveness probe is configured, the kubelet restarts the unresponsive container so that it can serve traffic again (useful for problems that are resolved by a restart).
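
A liveness probe is declared with the same kind of fields under a different key. The fragment below is a minimal sketch; the endpoint, port and numbers are illustrative assumptions, not the values we run in production.

```yaml
# Fragment of a container spec; all values are hypothetical.
livenessProbe:
  httpGet:
    path: /api/status        # assumed health endpoint
    port: 5601               # Kibana's default port
  initialDelaySeconds: 60    # give the container time to start before the first probe
  periodSeconds: 10          # probe every 10 seconds
  failureThreshold: 6        # restart the container after 6 consecutive failures
```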


Both are driven by periodic probes sent by the kubelet. The configuration includes the following parameters:

failureThreshold, initialDelaySeconds, periodSeconds, successThreshold, timeoutSeconds, and the command to be executed (in our case a shell command that uses "if" and "fi" conditions).


The periodSeconds field specifies that the kubelet should perform the probe every x seconds. The initialDelaySeconds field tells the kubelet to wait x seconds after the container starts before performing the first probe. To perform a probe, the kubelet executes the command (written under exec:) in the target container. If the command exits with status 0, the kubelet considers the container alive and healthy; in our configuration, the command exits with 0 when Kibana responds with HTTP 200. If the command exits with a non-zero status (in our case, on any non-200 response) failureThreshold times in a row, the kubelet kills the container and restarts it.
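
To make the exec mechanism concrete, here is a hedged sketch of a liveness probe whose command checks for an HTTP 200 inside an "if"/"fi" block. It assumes that curl and sh are available in the container image and that Kibana answers on localhost:5601 at /api/status; the URL and every numeric value are illustrative, not our exact settings.

```yaml
# Fragment of a container spec; values and URL are hypothetical.
livenessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Capture only the HTTP status code; curl prints 000 if it cannot connect.
        STATUS=$(curl -s -o /dev/null -w '%{http_code}' --max-time 4 http://localhost:5601/api/status)
        if [ "$STATUS" -eq 200 ]; then
          exit 0   # exit status 0: the kubelet counts the probe as a success
        fi
        exit 1     # any other result counts as a failed probe
  initialDelaySeconds: 60   # wait 60 seconds before the first probe
  periodSeconds: 10         # run the command every 10 seconds
  timeoutSeconds: 5         # a probe that takes longer than this counts as failed
  successThreshold: 1       # must be 1 for liveness probes
  failureThreshold: 3       # restart after 3 consecutive failures
```

Note that the kubelet only looks at the exit status of the command; the 200 check lives entirely inside the script.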


A TCP probe can also be configured; it tries to open a connection on a specific port, such as 8080, to determine liveness. Both readiness and liveness probes can be used on the same container (the first to remove the unresponsive pod from the Service load balancers, the second to restart the unresponsive container).
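
A TCP liveness probe could be sketched as follows. Port 8080 is taken from the example in the text; the timing values are assumptions.

```yaml
# Fragment of a container spec; timing values are hypothetical.
livenessProbe:
  tcpSocket:
    port: 8080           # example port from the text; Kibana itself defaults to 5601
  periodSeconds: 10      # try to connect every 10 seconds
  failureThreshold: 3    # restart after 3 consecutive failed connections
```

A TCP probe only confirms that the port accepts connections, so a process that is hung but still listening can pass it. For that reason an HTTP or exec check is usually the safer choice for a case like ours.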


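If a container usually takes a long time to start, a startup probe can be set up. The liveness and readiness probes are held back until the startup probe succeeds; once it does, the liveness probe takes over and starts monitoring. If the startup probe does not succeed within the allowed time (failureThreshold times periodSeconds), the container is killed and then handled according to the pod's restartPolicy.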
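
A startup probe for a slow-starting container might look like the sketch below. The endpoint, port and the five-minute budget are assumptions for illustration.

```yaml
# Fragment of a container spec; all values are hypothetical.
startupProbe:
  httpGet:
    path: /api/status    # assumed health endpoint
    port: 5601           # Kibana's default port
  periodSeconds: 10      # probe every 10 seconds
  failureThreshold: 30   # allows up to 30 x 10 = 300 seconds for startup
```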


Now, with this configuration enabled, if Kibana goes into a hung state, it is automatically removed from the Service load balancers. If Kibana starts responding again before the liveness threshold is reached, it is added back to the Service load balancers; if not, the container is restarted once the liveness threshold is reached.
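
One way to get this ordering is to give the readiness probe a lower failure threshold than the liveness probe, so that traffic is withdrawn well before a restart is triggered. The fragment below is a sketch with assumed values, not our production settings.

```yaml
# Fragment of a container spec; endpoint, port and timings are hypothetical.
readinessProbe:
  httpGet:
    path: /api/status
    port: 5601
  periodSeconds: 10
  failureThreshold: 3    # about 30 seconds of failures: pod removed from the Service
  successThreshold: 1    # one successful probe: pod added back
livenessProbe:
  httpGet:
    path: /api/status
    port: 5601
  periodSeconds: 10
  failureThreshold: 6    # about 60 seconds of failures: container restarted
```

With these numbers, a Kibana instance that recovers between roughly 30 and 60 seconds after hanging would be returned to service without a restart.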


Team: Rajesh Mehra, Eshwar Hudge and Rajaraman Sathyamurthy
