AWS / TF public warning
Today I discovered two of the most "fun" things that have ever conspired against me.
The first is Terraform's aws_iam_policy_attachment resource.
I normally work on large AWS projects and rely on infrastructure as code (IaC) to manage them, using a range of IaC products depending on the client. On a large project it is infeasible to have a single IaC stack manage the whole environment, so it is important to manage different logical elements with separate stacks.
In walks my Friday nemesis: aws_iam_policy_attachment.
One of my colleagues had used this resource to attach an AWS managed policy for an EKS cluster. The Terraform docs carry a big warning about it: the resource manages the *exclusive* attachment of a policy, so applying it detaches that policy from any user, role, or group not listed in the resource.

That makes it a very dangerous tool if you have multiple stacks. You can easily remove a policy from another part of your system: everything is happy where you are working, but something else is on fire.
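To make the trap concrete, here is a sketch of the two resources side by side (the role reference and resource labels are hypothetical, for illustration only):

```hcl
# DANGEROUS: aws_iam_policy_attachment is exclusive. Applying this
# detaches AmazonEKS_CNI_Policy from every role, user, and group NOT
# listed here -- including attachments managed by other stacks.
resource "aws_iam_policy_attachment" "cni" {
  name       = "cni-attachment"
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  roles      = [aws_iam_role.eks_node.name] # hypothetical role
}

# SAFER: aws_iam_role_policy_attachment manages only this single
# role-to-policy pairing and leaves attachments made elsewhere alone.
resource "aws_iam_role_policy_attachment" "cni" {
  role       = aws_iam_role.eks_node.name # hypothetical role
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
}
```

In a multi-stack environment the second form (or aws_iam_user_policy_attachment / aws_iam_group_policy_attachment) is almost always what you want.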
The second fun thing, which I had never played with before, is the EKS CNI containers. There is a reason I had never played with them before: they have always "just worked". Unfortunately, if you remove the policy from the CNI role in AWS, the CNI containers die, and because the CNI plugin is what provides pod networking, all other pods fail to start.
Running kubectl get pods -n kube-system gives you a list of the aws-node pods that house the CNI containers. If there is an issue, the pod reports 0/1 Ready. The container dies with exit code 137 (often associated with the out-of-memory killer). Inspecting the container, however, shows it was not killed by OOM. The last line in the log is "Checking for IPAM connectivity ...", which is only the third line of the normal start-up sequence.
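A side note on that 137: container exit codes above 128 encode 128 plus the fatal signal number, so 137 is 128 + 9, i.e. SIGKILL. The OOM killer delivers SIGKILL, which is why 137 is commonly read as out-of-memory, but *anything* that sends SIGKILL produces the same code. A quick sketch you can run anywhere (the kubectl lines need a cluster and are shown as comments for context only; the pod name is a placeholder):

```shell
# Context, for a real cluster (comments only):
#   kubectl get pods -n kube-system              # aws-node pods showing 0/1 Ready
#   kubectl describe pod aws-node-xxxxx -n kube-system
#   kubectl logs aws-node-xxxxx -n kube-system   # last line: "Checking for IPAM connectivity ..."

# Demonstrate where 137 comes from: a process killed with SIGKILL (9)
# exits with status 128 + 9 = 137, whether or not OOM was involved.
sh -c 'kill -9 $$'
echo "exit status: $?"   # prints: exit status: 137
```

So a 137 on aws-node with no OOM event in the kernel log points at something external killing the container, not memory pressure.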
If anyone can point me at any useful debugging steps for this, let me know!
So, you have been warned.

Hopefully I can help save someone else from a "Fun Friday".