35: Why are k8s upgrades so tough?

👋 Platform Weekly here! You know that feeling you get when you flip your pillow to the cold side in the middle of the night? Reading us is better than that.

Let’s get bakin’

What makes K8s upgrades so challenging?

Text by Fawad Khaliq, Founder and CTO at Chkk

Memes by Luca Galante

Our last 100+ conversations with DevOps/SREs can be summed up in four nouns and three emotions: “Kubernetes Cluster Version Upgrades” … “Hard, Pain, Work”.

Why are Kubernetes upgrades so challenging? Why isn’t upgrading Kubernetes as easy as upgrading an iPhone? Here’s what makes it hard and why DevOps/SREs find change management stressful:

1️⃣ Kubernetes isn’t, and shouldn’t be, vertically integrated.

K8s is designed for flexibility, and cloud providers work hard to ensure this flexibility isn’t compromised.

The solution is a cloud-owned k8s control plane (EKS, GKE, AKS, OKE …) with a few managed add-ons (e.g. CoreDNS, CNI …) and some guidance on how to build apps, while leaving DevOps/SRE teams the flexibility to introduce new components, add-ons, and apps.

The cost of this flexibility is that these DevOps/SRE teams must now own the lifecycle of the add-ons and the applications that run on top of the k8s infrastructure.
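To make that ownership concrete, here’s a minimal sketch (assuming the official kubernetes Python client and a working kubeconfig) that inventories the add-on workloads in kube-system and the image tags they’re running, i.e. the kind of list you suddenly become responsible for:

```python
# Minimal sketch: list kube-system deployments and their container images.
# Assumes the official "kubernetes" Python client and a reachable cluster.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
apps = client.AppsV1Api()

for deploy in apps.list_namespaced_deployment("kube-system").items:
    images = [c.image for c in deploy.spec.template.spec.containers]
    print(f"{deploy.metadata.name}: {', '.join(images)}")
```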

2️⃣ You don’t know what’ll break before it breaks.

With so many moving pieces, it’s hard to know if your running k8s components have incompatibilities or latent risks.

Many teams use spreadsheets to track what they are running versus what they should be running, which is both painful and error-prone.

We all know that “Not broken != working-as-it-should”. Latent risks and unsupported versions can keep lurking for weeks or months until they cause impact.
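As an illustration of the check those spreadsheets encode, here’s a toy version-drift sketch; the add-on names and version numbers are made up, not recommendations:

```python
# Toy version-drift check: compare what is running against what should be.
# All names and versions below are illustrative placeholders.
desired = {"coredns": "1.10.1", "aws-node": "1.15.1", "kube-proxy": "1.27.6"}
running = {"coredns": "1.8.7", "aws-node": "1.15.1", "kube-proxy": "1.24.10"}

for addon, want in desired.items():
    have = running.get(addon, "missing")
    status = "OK" if have == want else f"DRIFT (running {have}, want {want})"
    print(f"{addon}: {status}")
```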

What’s needed here is sharing the collective knowledge of DevOps/SRE teams, so that if one team has encountered an upgrade risk, everyone else gets to avoid it without any extra work on their end.



3️⃣ Getting an upgrade right takes a lot of time.

Deloitte’s CIO survey estimates that 80% of DevOps/SRE time is spent on operations and maintenance, and only 20% on innovation.

I’m not surprised: cooking up a “safe” upgrade plan is a huge time sink. You have to read an inordinate amount of text and code (release notes, GitHub issues/PRs, blogs, etc.) to really understand what’s relevant to you and what’s not.

This can take weeks of effort, which is time you could’ve spent on business-critical work like architectural projects and infrastructure scaling/optimization.
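As one small, concrete slice of that planning work, here’s a hedged sketch that flags manifests still using API versions removed in a target Kubernetes release; the removal table is a tiny illustrative subset, and the file path is hypothetical:

```python
# Hedged sketch: flag manifests that use API versions removed in the target
# release. REMOVED_IN is a small illustrative subset, not a complete list.
import yaml  # PyYAML

REMOVED_IN = {
    "1.25": {"policy/v1beta1:PodSecurityPolicy", "batch/v1beta1:CronJob"},
    "1.26": {"autoscaling/v2beta2:HorizontalPodAutoscaler"},
}

def flag_removed(manifest_path, target="1.25"):
    with open(manifest_path) as f:
        for doc in yaml.safe_load_all(f):
            if not doc:
                continue
            key = f"{doc.get('apiVersion')}:{doc.get('kind')}"
            if key in REMOVED_IN.get(target, set()):
                print(f"{manifest_path}: {key} is removed in {target}")

flag_removed("deploy/all.yaml")  # hypothetical path
```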


Fawad is the Founder and CTO at Chkk, a company focused on eliminating operational risks through Collective Learning. Formerly, he was a technical lead for Amazon EKS and an early engineer at PLUMgrid, creators of eBPF. You can follow him on Twitter @fawadkhaliq.

Read the full article here.

Is the era of microservices over?

Lambda and serverless were touted by AWS as the future, but even their own engineers disagree.

Last week, an Amazon Prime Video case study stirred up some controversy when the team revealed they had reduced costs by 90% by moving from microservices back to a monolith: “Microservices and serverless components are tools that do work at high scale, but whether to use them over monolith has to be made on a case-by-case basis.”

It’s surprising to some because AWS frequently frames microservices and serverless architecture as the best way to modernize applications.

But it also isn’t surprising (or, at least, it shouldn’t be) that some architectures work well for some businesses but not for others.

Amazon Prime Video’s old architecture was based on AWS Lambda, which is good if you want to build services quickly. However, it wasn’t cost-effective when running at high scale. Let’s take the orchestration workflow, for example. Alex Xu succinctly explained that “AWS step functions charge users by state transitions and the orchestration performs multiple state transitions every second.”
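To make that arithmetic concrete, here’s a back-of-the-envelope sketch. The $25-per-million-state-transitions figure is an assumption taken from AWS’s published Standard Workflows list price, and the transition rate is illustrative:

```python
# Back-of-the-envelope Step Functions cost estimate.
# Assumptions: ~$25 per million state transitions (Standard Workflows list
# price, check current rates) and a few transitions per second, as described.
transitions_per_second = 5      # illustrative: "multiple per second"
price_per_million = 25.0        # USD, assumed list price

monthly_transitions = transitions_per_second * 60 * 60 * 24 * 30
monthly_cost = monthly_transitions / 1_000_000 * price_per_million
print(f"{monthly_transitions:,} transitions/month ≈ ${monthly_cost:,.0f}")
# 12,960,000 transitions/month ≈ $324, per workflow, before any other charges
```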


Furthermore, in the old architecture, intermediate data was stored in Amazon S3 before it was downloaded. High-volume downloads became very expensive.

A monolithic architecture is supposed to address these cost issues. From Alex Xu, again: “There are still 3 components, but the media converter and defect detector are deployed in the same process, saving the cost of passing data over the network.”

And that’s where the 90% cost reduction came from! Pretty neat, right?
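For intuition, here’s an illustrative sketch (not Prime Video’s actual code) of that difference: paying S3 PUT/GET on every hop versus passing intermediate results in memory within one process. The function names and bucket are hypothetical:

```python
# Illustrative sketch of the two designs; not Prime Video's actual code.
import boto3

def convert(frame: bytes) -> bytes:
    return frame  # stand-in for the real media conversion step

def detect_defects(converted: bytes) -> bool:
    return len(converted) == 0  # stand-in for the real defect detector

# Microservice-style: each hop pays S3 PUT/GET plus data transfer.
def pipeline_with_s3(frame: bytes, bucket: str, key: str) -> bool:
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=key, Body=convert(frame))
    converted = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return detect_defects(converted)

# Monolith-style: converter and detector share a process and pass bytes in memory.
def pipeline_in_process(frame: bytes) -> bool:
    return detect_defects(convert(frame))
```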

So the next time someone tells you “microservices good, monoliths bad” (or “monoliths good, microservices bad” for that matter), kindly send them this newsletter 😉. And remember: your business should determine your architecture, not the other way around.


Have you joined the Platform Engineering Slack channel? If not, you're missing out. Join us to weigh in on some open questions.

