casinos online real money.Enjoy Free 888+200 Daily Legal Bonus

we’ll explore how to stabilize your Kubernetes microservices with the correct resource settings to avoid issues like resource over-committing and node crashes.

Pre-required knowledge:

You should have?Grafana?dashboard monitoring enabled on your pods.
Understanding the structure of the Kubernetes deployment file.

How did everything start?

It was a day when I was on call when my phone rang with the support team on the other end.

Is there a problem? I asked.

Yes, we see that the services are not available, and we need your help.

In this way, I started my journey to understand why our system suffers from unavailable pods.

Ok, I took a long breath and went into this incident.

So what do I need to do first?

To identify which service was experiencing issues, I checked our Grafana dashboard to see which service was unavailable .

so I shouted out loud in the developer’s room — “is there someone deploying something today?”

There was a silence in the room, indicating that no one had been deploying anything.

So what was happening?

To further investigate the issue, I accessed the dashboard that monitors the CPU and memory usage of the affected service.

Upon checking the CPU and memory usage of the pods, I was surprised to find that there were no spikes or abnormalities.

In an effort to identify the root cause of the issue, I next checked the dashboard that monitors the load on the nodes in the cluster

Upon checking the node load, I noticed a spike that indicated that the node was unable to provide the resources that the pods were requesting and?leading it to crash. This suggested that the issue was related to resource constraints on the node.

To effectively troubleshoot issues with unavailable pods in a Kubernetes cluster, it can be helpful to have a basic understanding of how Kubernetes handles resource allocation.

In Kubernetes, you can define the resources that a pod requires in the deployment file using the resources block. This block consists of two fields: requests and limits.

The requests?field specifies the minimum amount of resources that a pod needs to function properly. When scheduling a pod on a node, Kubernetes will only place the pod on a node if it has sufficient resources available to fulfill the resource requests defined in the deployment file.

The limits?field specifies the maximum amount of resources that a pod is allowed to use. If a pod attempts to exceed its resource limits, it may be terminated or throttled to prevent it from consuming too many resources.

What is the problem?

By defining low resource requests and high resource limits, the pods may have been able to request more resources than were available on the node, leading to resource over-committing and potentially causing the service to become unavailable.

Why did the Node crash?

Imagine that you have a Kubernetes cluster with 1 node, the node has 4000 MI of CPU and 8000MB of memory.

Now, imagine that you have a deployment that consists of 4 pods, each with the following resource limits and requests defined in the deployment file:

CPU requests: 800 MI
CPU limits: 2000 MI
Memory requests: 1500MB
Memory limits: 2100MB

Now, let’s say that all 4 pods are scheduled onto the same node.

This means that the node would have to provide 3200 MI (800 MI* 4 pods) of CPU and 6000MB (1500MB * 4 pods ) of memory to the pods.

However, the node only has a capacity of 4000 MI of CPU and 8000 MB of memory, which means that if one of the pods uses its full 2000 MI CPU limit, the node could become overloaded and unable to provide the resources that the pods need to run effectively, it may result in an outage for the service that the pods provide.

In this case, Kubernetes will attempt to reschedule the pods onto other nodes in the cluster. However, if all of the nodes in the cluster are already at capacity and there are no available resources to accommodate the pods, it may take some time for Kubernetes to create a new node and reschedule the pods onto it.

In some cases, it may take hours for Kubernetes to create a new node and reschedule the pods, which can lead to extended?outages for the service

How does Kubernetes handle over-commitment?

Kubernetes is designed to schedule pods onto nodes based on the resource limits and requests defined in the deployment file. If a pod has a resource request that is higher than the capacity of the node, Kubernetes will not schedule the pod onto the node.

However, Kubernetes does not actively monitor the resource usage of pods and nodes and automatically adjusts resource limits and requests to avoid over-commitment. Instead, it is up to the administrator to set appropriate resource limits and requests for the pods based on the needs of the pods and the capacity of the nodes.

If a pod becomes unavailable due to resource over-commitment, Kubernetes will not automatically remove the pod from the node. Instead, it is up to the administrator to identify the cause of the issue and take steps to resolve it, such as adjusting the resource limits and requests for the pod or adding more nodes to the cluster to increase capacity.

How can we fix it?

To avoid this issue, it is important to?carefully consider the resource needs of your pods and set appropriate limits that do not exceed the capacity of the nodes in your cluster.

Set the same value for both the memory request and limit to avoid overcommitting memory on the node.
Set a CPU request value, but avoid using a CPU limit. This will allow the container to guarantee its CPU usage using the request value, but if there is additional CPU available on the node, it can use more. To read more about it:?https://home.robusta.dev/blog/stop-using-cpu-limits

To address the issue described above, there are two possible solutions:

Reduce the resource limits of the pods to values that are within the capacity of the nodes. For example, you could set the CPU limits to 800 MI and the memory limits to 1500MB. This would bring the total resource limits down to:

CPU limits: 4 * 800 MI = 3200 MI
Memory limits: 4 * 1500MB = 6000MB

These resource limits would be within the capacity of the nodes, as they only require 4000 MI of CPU and 8000MB of memory.

2. Increase the resource requests values of the pods to be equal to the limits values

CPU requests : 4 * 2000 MI = 8000 MI
Memory requests: 4 * 2100MB = 8400MB

From this calculation, we see that this solution will require more nodes for our system — how many nodes?

Why is it important to subtract with 1000MI?

By subtracting 1000MI, you are accounting for the resources needed for the node processes, which will help to ensure that there are sufficient resources available for the pods to function correctly.

Our system will be required to scale up to 3 node

In both of the solutions, we will get a result that pods would be able to run successfully on the nodes without causing them to crash.

There are a few ways you can estimate the appropriate values for resource requests and limits for your pods:

Use the current and historical data from your Grafana dashboard to understand the type and peak resource usage of your pods. This can help you determine the average and maximum resource needs of your pods and set appropriate limits and requests accordingly.
Use the “kubectl top pod” command to view the current resource usage of your pods. This can help identify any potential resource constraints in real time and making adjustments as needed.

Before making any changes to the resource requests and limits for your pods:

1. It is important to consider the potential impact on the number of nodes in your cluster. As I mentioned, increasing resource requests can lead to the need for more nodes in the cluster to accommodate the additional resource demands.

To calculate the number of nodes you will need after making changes to the resource requests and limits, you can use the formula described above.

2. Once you have calculated the number of nodes needed, it is important to check your IP range to ensure that there is sufficient space for new nodes to scale up as needed. You may also need to update your node group to accommodate the additional nodes.

By carefully considering the resource needs of your pods and making appropriate changes to the resource requests and limits, you can help to ensure that your cluster can meet the resource demands of your workloads and avoid issues like resource over-committing and node crashes.

Summary:

Monitoring the resource usage of your pods and nodes is also critical to identifying any potential issues early on and making adjustments as needed. Tools like Grafana and Prometheus can be incredibly helpful in this regard, as they allow you to view real-time and historical data on resource usage and identify any trends or anomalies that may be causing problems.

Overall, I believe that setting appropriate resource limits and requests is an essential aspect of effectively managing and operating a Kubernetes cluster in production. By taking the time to understand the resource needs of your microservices and setting appropriate limits and requests, you can help to ensure that your applications are stable, reliable, and performant.

#kubernetes #pods

Optimize Kubernetes MicroServices

?? Saral Saxena ??????

?11K+ Followers | Linkedin Top Voice || Associate Director || 14+ Years in Java, Microservices, Kafka, Spring Boot, Cloud Technologies (AWS, GCP) | Agile , K8s ,DevOps & CI/CD Expert

Pre-required knowledge:

How did everything start?

So what was happening?

What is the problem?

Why did the Node crash?

领英推荐

How does Kubernetes handle over-commitment?

How can we fix it?

To address the issue described above, there are two possible solutions:

Summary:

更多精彩文章

社区洞察

其他会员也浏览了

Kubernetes 1.29: Navigating Through Deprecations and Removals

Microservices Anti-Patterns

Kubernetes vs Docker Swarm : Comparing Container Orchestration Tools

Docker Architecture - Detailed Explanation

Microservices

From Monolith to Microservices: A Journey Towards Secure, Scalable Applications

CI/CD for Microservices Architectures

Containerized MicroServices Pros and Cons.

Essential Microservices Best Practices: Design, Deploy, and Scale Efficiently

Why Docker is Perfect for Microservices

Pre-required knowledge:

How did everything start?

So what was happening?

What is the problem?

Why did the Node crash?

领英推荐

How does Kubernetes handle over-commitment?

How can we fix it?

To address the issue described above, there are two possible solutions:

Summary:

Memory Optimization Techniques for Spring Boot Applications with Practical Coding Strategies

2024年10月27日

Designing CI/CD Pipeline

2024年9月28日

Calculate CPU for containers in k8s dynamically

2024年9月27日

Downside of the Executor Service with context to thread local

2024年9月22日

How to calculate availability of the system?

2024年9月19日

What is RAG - Retrieval-Augmented Generation

2024年9月16日

Bursty traffic handling

2024年9月16日

SAGA pattern for distributed transactions

2024年9月15日

Why 1==1 is true but 128==128 is false in Java??

2024年9月14日

社区洞察

其他会员也浏览了

Kubernetes 1.29: Navigating Through Deprecations and Removals

Microservices Anti-Patterns

Kubernetes vs Docker Swarm : Comparing Container Orchestration Tools

Docker Architecture - Detailed Explanation

Microservices

From Monolith to Microservices: A Journey Towards Secure, Scalable Applications

CI/CD for Microservices Architectures

Containerized MicroServices Pros and Cons.

Essential Microservices Best Practices: Design, Deploy, and Scale Efficiently

Why Docker is Perfect for Microservices