A Fundamental Mistake in "DevOps"
I've been working as a "DevOps Engineer" for about 8 years, having been an infrastructure guy for about 15 years before that. I've been a part of many Agile software teams, for what you might consider small companies and huge companies. I do CI/CD pipelines, Infrastructure as Code (IaC), and automation scripts for a living. That's what DevOps Engineers do (even if that's the not the spirit of the term "DevOps" as it was originally envisioned.)?
The default tool that companies use for IaC is Hashicorp Terraform. It supplies a high-level descriptive language that everyone can learn and use from company to company. It is extensible with plug-ins for anything you want to create a plug-in to use it with. Some of the most common plug-ins are those for the cloud providers: Azure, AWS, GCP, etc. This makes Terraform appear less cloud-specific, because you can use the same language and just change your plug-ins.?
One of the major features of Terraform is that after you run an "apply" to do some work, you get a text file having the details of what was done. This is called the "state file", and you can keep that in source control or whatever. The next time you run an "apply", Terraform looks at the earlier state file, figures out what is changing, and changes exactly that. Terraform treats the state file as an authoritative record of what has been done. This means that if users make manual changes within whatever system you're automating, they can be reset back to what you have coded every time. ?
Now for an analogous scenario. When you take your car in for maintenance, the technicians in the shop check your tire pressures, battery, do an oil change, check your filters, and God knows what else. They diagnose what they need to do, they fix all the things, and you leave with a newly maintained car. They also hand you a form that shows what they did, the tire pressures when they finished, and so on. This is great!?
However, by the time you get home, at least one of those values is likely different. Maybe you got a flat tire. Maybe the guy didn't tighten something correctly and there's a new leak. Heck, you could have a complete engine failure before you leave the lot. ?
Your car, nor the technicians in the shop, give a damn what is written on that document, once you leave the shop. Furthermore, when the car comes back, they don't even look at the earlier file. They know what they are doing, know how to find what the proper values are from their own experience and the actual manuals. Your earlier record is useless to them.?
领英推荐
?
For IT organizations, this means believing that the state file is sacrosanct ignores all reality, and is a fundamental mistake made by DevOps engineers around the world.?
In all but the smallest IT organizations, there are many teams with their fingers in the infrastructure. Security teams must push policies. Support teams must be able to fix stuff as it breaks. And so on. In order for those teams to get work done, they cannot go to every infrastructure team and beg them to make changes to their Terraform code.?
No team works within a bubble. Not the development teams, not the security teams, not the support teams, not any other teams. You cannot expect that your hands are the only ones touching "your stuff". Believing so is delusional. ?
Just like the auto techs, when your systems are out-of-whack, you should not default to setting them back to the way they were the last time you touched them. Your team should have the skills and knowledge to diagnose problems and set them back to what is proper RIGHT NOW, without the assumption that everything was correct before. Setting them back is akin to rebooting your computer every time there's a problem: it might fix the problem temporarily, but the root cause is not corrected.?
What does this mean for Terraform? Well, for me, it means that when you start using Terraform, you must accept that there are going to be state file problems like these down the road. These problems are unavoidable, and they happen on every team I've ever been a part of. If you don't like it, pick another tool. If you get to a point where you are spending more time fixing Terraform-related issues than you are doing work that brings value to your users, start looking into a different solution. Pulumi, "just bash or PowerShell scripts", etc., are all possibilities. And don't automatically rule out "manual with a UI". It has worked for decades. Be open minded and make your life better.?
Cloud Infrastructure Architect - Infrastructure Automation and Cloud Engineering
2 年Cannot agree more. Terraform with a state file has its issues. People spend so much time trying make the state file work.