A Fundamental Mistake in "DevOps"

Chris S.

Published: February 23, 2023

I've been working as a "DevOps Engineer" for about 8 years, having been an infrastructure guy for about 15 years before that. I've been a part of many Agile software teams, at companies both small and huge. I do CI/CD pipelines, Infrastructure as Code (IaC), and automation scripts for a living. That's what DevOps Engineers do (even if that's not the spirit of the term "DevOps" as it was originally envisioned).

The default tool that companies use for IaC is HashiCorp Terraform. It supplies a high-level declarative language that everyone can learn and use from company to company. It is extensible through plug-ins (Terraform calls them "providers") for nearly any system you want to manage. Some of the most common plug-ins are those for the cloud providers: Azure, AWS, GCP, etc. This makes Terraform appear less cloud-specific, because you can use the same language and just change your plug-ins.

One of the major features of Terraform is that after you run an "apply" to do some work, you get a text file containing the details of what was done. This is called the "state file", and you can keep it in source control or wherever you like. The next time you run an "apply", Terraform looks at the earlier state file, figures out what is changing, and changes exactly that. Terraform treats the state file as an authoritative record of what has been done. This means that if users make manual changes within whatever system you're automating, those changes can be reset back to what you have coded every time.
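To make that reset behavior concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Terraform's actual algorithm: the `refresh`, `plan`, and `apply` functions and the `vm_size` and `port` attributes are all hypothetical, and the "infrastructure" is just a dictionary.

```python
# Toy model of a state-driven apply. NOT Terraform's real implementation;
# every name below is hypothetical and exists only for illustration.

def refresh(state: dict, actual: dict) -> dict:
    """Re-read the live value of every attribute the state already tracks."""
    return {key: actual.get(key) for key in state}


def plan(desired: dict, state: dict) -> dict:
    """List the attributes whose recorded value differs from the coded value."""
    return {
        key: {"from": state.get(key), "to": value}
        for key, value in desired.items()
        if state.get(key) != value
    }


def apply(desired: dict, state: dict, actual: dict) -> tuple[dict, dict]:
    """Push planned changes to the live system and return (new_state, changes)."""
    changes = plan(desired, state)
    for key, change in changes.items():
        actual[key] = change["to"]
    return dict(desired), changes


desired = {"vm_size": "small", "port": 443}   # what the code says
actual = {}                                   # the live system
state = {}                                    # the "state file"

state, changes = apply(desired, state, actual)  # first run creates everything

actual["port"] = 8443                           # someone changes the port by hand

state = refresh(state, actual)                  # the record now shows the drift
state, changes = apply(desired, state, actual)  # the next run undoes it

print(changes)  # {'port': {'from': 8443, 'to': 443}}
print(actual)   # {'vm_size': 'small', 'port': 443}
```

Note that the manual change is reverted without anyone asking why it was made. The tool only knows that the live value no longer matches the code.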

Now for an analogous scenario. When you take your car in for maintenance, the technicians in the shop check your tire pressures and battery, do an oil change, check your filters, and God knows what else. They diagnose what they need to do, they fix all the things, and you leave with a newly maintained car. They also hand you a form that shows what they did, the tire pressures when they finished, and so on. This is great!

However, by the time you get home, at least one of those values is likely different. Maybe you got a flat tire. Maybe the guy didn't tighten something correctly and there's a new leak. Heck, you could have a complete engine failure before you leave the lot.

Neither your car nor the technicians in the shop give a damn what is written on that document once you leave the shop. Furthermore, when the car comes back, they don't even look at the earlier record. They know what they are doing, and they know how to find the proper values from their own experience and the actual manuals. Your earlier record is useless to them.

For IT organizations, this means that believing the state file is sacrosanct ignores reality, and it is a fundamental mistake made by DevOps engineers around the world.

In all but the smallest IT organizations, there are many teams with their fingers in the infrastructure. Security teams must push policies. Support teams must be able to fix stuff as it breaks. And so on. Those teams cannot get their work done if they have to go to every infrastructure team and beg for changes to its Terraform code.

No team works within a bubble. Not the development teams, not the security teams, not the support teams, not any other teams. You cannot expect that your hands are the only ones touching "your stuff". Believing so is delusional.

Just like the auto techs, when your systems are out of whack, you should not default to setting them back to the way they were the last time you touched them. Your team should have the skills and knowledge to diagnose problems and set them to what is proper RIGHT NOW, without the assumption that everything was correct before. Setting them back is akin to rebooting your computer every time there's a problem: it might fix the problem temporarily, but the root cause is not corrected.
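One way to picture "diagnose first, then fix" is a three-way comparison of the code, the last recorded state, and the live system. The Python sketch below is only an illustration of that idea, not a feature of Terraform or any other tool: `classify_drift`, the labels it returns, and the sample attributes (including a security team raising `tls_min_version`) are all hypothetical.

```python
# Three-way drift triage: a hypothetical sketch, not part of any real tool.

def classify_drift(desired: dict, recorded: dict, actual: dict) -> dict:
    """Label each attribute by comparing code, recorded state, and live value."""
    report = {}
    for key in sorted(set(desired) | set(recorded) | set(actual)):
        want, last, live = desired.get(key), recorded.get(key), actual.get(key)
        if live == want:
            report[key] = "in sync"
        elif live == last:
            # Nobody touched the live system; only the code moved on.
            report[key] = "code changed: safe to apply"
        else:
            # Another team (or a failure) changed it since the last apply.
            report[key] = "changed outside the code: review before reverting"
    return report


desired = {"vm_size": "medium", "port": 443, "tls_min_version": "1.0"}
recorded = {"vm_size": "small", "port": 443, "tls_min_version": "1.0"}
actual = {"vm_size": "small", "port": 443, "tls_min_version": "1.2"}

report = classify_drift(desired, recorded, actual)
for key, label in report.items():
    print(f"{key}: {label}")
```

In this made-up example, blindly reapplying the code would roll `tls_min_version` back to "1.0". The triage flags it for a human instead, and the likely right answer is to update the code to match what the security team did.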

What does this mean for Terraform? Well, for me, it means that when you start using Terraform, you must accept that there are going to be state file problems like these down the road. These problems are unavoidable, and they have happened on every team I've ever been a part of. If you don't like it, pick another tool. If you get to a point where you are spending more time fixing Terraform-related issues than you are doing work that brings value to your users, start looking into a different solution. Pulumi, "just bash or PowerShell scripts", etc., are all possibilities. And don't automatically rule out "manual with a UI". It has worked for decades. Be open-minded and make your life better.

Jayaprakash Nimmala

Cloud Infrastructure Architect - Infrastructure Automation and Cloud Engineering

2 years ago

Cannot agree more. Terraform with a state file has its issues. People spend so much time trying to make the state file work.
