A Production Issue that's Hard to Find

Whether you run your app on Kubernetes or somewhere else, sometimes random people start complaining: "Your app doesn't load!"

Naturally, these issues are overwhelming, and scary too, because they impact the business.

What's interesting is that your app loads fine on your device. On all of your devices. Your monitoring systems look normal, too. Yet your customer still has the issue.


So, obviously, you reply to your customer support team: "It's a customer internet issue."

But the customer responds: "My other apps are working fine."

Now that's when the actual trouble starts.

In scenarios like these, the cause is most often a problem with the ISP's local DNS server or with DNS caching. (I've seen it a few times in some of the apps I manage for my org!)

  1. DNS outage: The ISP's local DNS server could be down, so your domain name cannot be resolved to an IP address at all.
  2. Outdated cache: The ISP's local DNS server might hold a stale cached copy of your application's IP address.

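If your application's IP address has recently changed, resolvers that cached the old address keep handing it out until the cached entry expires, which is governed by the DNS record's TTL (time to live). Often, you are not even aware that the IP has changed.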
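To see why a changed IP can keep failing for some users while working for you, here is a toy model of a caching resolver. It is a minimal sketch under stated assumptions, not how any particular ISP's resolver is built: `CachingResolver` and `CachedAnswer` are hypothetical names, the `records` dict stands in for your real DNS provider, and time is passed in explicitly so the behavior is deterministic. The hostname and addresses use the reserved `example.com` domain and the `203.0.113.0/24` documentation range.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    ip: str
    expires_at: float  # simulated clock time (seconds) when this entry goes stale


class CachingResolver:
    """Toy model of a caching DNS resolver, such as one run by an ISP.

    `authoritative` maps a hostname to (ip, ttl_seconds) and stands in for
    your real DNS provider (a hypothetical simplification).
    """

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        self.authoritative = authoritative
        self.cache: dict[str, CachedAnswer] = {}

    def resolve(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            # Cache hit: answer from memory, even if the real record has changed.
            return entry.ip
        # Cache miss or expired entry: fetch the current record and cache it for its TTL.
        ip, ttl = self.authoritative[name]
        self.cache[name] = CachedAnswer(ip, now + ttl)
        return ip


# Hypothetical record with a one-hour TTL.
records = {"app.example.com": ("203.0.113.10", 3600)}
isp_resolver = CachingResolver(records)

print(isp_resolver.resolve("app.example.com", now=0))     # 203.0.113.10 (now cached)

# The app moves to a new IP, e.g. because a load balancer or gateway was recreated.
records["app.example.com"] = ("203.0.113.20", 3600)

print(isp_resolver.resolve("app.example.com", now=600))   # 203.0.113.10 (stale!)
# A different resolver that never cached the old answer sees the new IP right away.
print(CachingResolver(records).resolve("app.example.com", now=600))  # 203.0.113.20
print(isp_resolver.resolve("app.example.com", now=3600))  # 203.0.113.20 (TTL expired)
```

In this model the stale answer disappears only once the TTL runs out, and a resolver without the old entry is unaffected, which mirrors why one ISP's users can be broken while everyone else is fine. Real resolvers may also clamp or ignore TTLs, so the delay in practice can differ.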

Both issues are entirely out of your control, and they are hard to diagnose as well.

How do you fix the problem? There are two ways:

  1. Switch to a different ISP. For example, connect through your mobile hotspot.
  2. Change the device's DNS server to a public resolver such as Google Public DNS (8.8.8.8 and 8.8.4.4). Quite often, this alone fixes it.
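Before asking a customer to switch, it can help to confirm the diagnosis by asking two resolvers for the same name and comparing their answers. Command-line tools such as `dig` do this (`dig @8.8.8.8 app.example.com`). The following is a minimal standard-library sketch of the same idea, assuming an IPv4 resolver reachable over UDP port 53; the function names are hypothetical, and it deliberately skips retries, TCP fallback, and source-address checks, so treat it as a diagnostic aid rather than a full DNS client. It encodes an A-record query in the RFC 1035 wire format and returns each address together with its TTL.

```python
import random
import socket
import struct


def build_a_query(name: str, qid: int) -> bytes:
    """Encode a DNS query for the A (IPv4) record of `name` (RFC 1035 wire format)."""
    # Header: ID, flags (RD=1 asks the server to recurse), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(data: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = data[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:
            # A compression pointer (2 bytes) always ends the name.
            return offset + 2
        offset += 1 + length


def parse_a_response(data: bytes, expected_id: int) -> list[tuple[str, int]]:
    """Return (ipv4_address, ttl_seconds) for every A record in the answer section."""
    qid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", data[:12])
    if qid != expected_id:
        raise ValueError("response ID does not match the query ID")
    if flags & 0x0200:
        raise ValueError("response truncated (TC bit set); retry over TCP")
    if flags & 0x000F:
        raise ValueError(f"server returned RCODE {flags & 0x000F}")
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(data, offset) + 4  # skip QTYPE and QCLASS
    answers = []
    for _ in range(ancount):
        offset = _skip_name(data, offset)
        rtype, rclass, ttl, rdlength = struct.unpack("!HHIH", data[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            answers.append((socket.inet_ntoa(data[offset:offset + 4]), ttl))
        offset += rdlength  # skip this record's RDATA (non-A records such as CNAMEs are ignored)
    return answers


def query_a(name: str, server: str, timeout: float = 3.0) -> list[tuple[str, int]]:
    """Ask one specific resolver (e.g. "8.8.8.8") for the A records of `name` over UDP."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_a_query(name, qid), (server, 53))
        data, _ = sock.recvfrom(512)  # classic DNS-over-UDP size limit without EDNS
    return parse_a_response(data, qid)
```

For example, compare `query_a("app.example.com", "8.8.8.8")` with the same call against the customer's ISP resolver address (a placeholder you would have to obtain). If the addresses differ, the ISP resolver is likely serving a stale entry, and the TTL it reports hints at how much longer that may last.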

And that's why, my friend, thinking outside the box matters.

I hope you learned something today. The purpose of learning is growth, not grades. Thank you.

Dinesh Balakrishnan

Tech Lead | DevOps | AIOps | Azure | K8s | Terraform | GenAI

6 months ago

In my case, we use Azure Application Gateway with a static IP address, and we create a DNS record for the application pointing to the Application Gateway IP. I have never seen any DNS issue, since the gateway IP never changed. But this is good to know: if we ever recreate the Application Gateway, we will get a different IP and could end up with this issue. Thanks for sharing, Mutha Nagavamsi!

Niraj Kumar

Cloud Specialist DevOps at Niveus Solutions Pvt. Ltd.

6 months ago

Thanks for sharing such informative content, and for the way you simplified it!

Meenakshi A.

Technologist & Believer in Systems for People and People for Systems

6 months ago

Thanks for the simple walkthrough of the concepts with scenarios.

Hanshal Mehta

Open Source Developer | Contributor @glasskube @buildsafe @cyclops

6 months ago

Yup, I recently came across a podcast with an Engineering Director of JIO Cinema. Their users faced app crashes. Later they found that it was a DNS issue, and they fixed it by changing the DNS routing.

Mayank Ahuja

Follow for Your Daily Dose of Coding, Software Development & System Design Tips | Tech Book Buff | Exploring AI | Everything I write reflects my personal thoughts and has nothing to do with my employer.

6 months ago

Interesting, loved the way you simplified it. But I think other things could cause such issues for the customer as well. The problem could lie with the user's device or some other network configuration. Or, just thinking aloud, a CDN's edge servers could be having a regional outage. But yes, such scenarios are scary.
