The cloud isn't infinite, and a step-by-step guide to kill unused resources
iLyas Bakouch
Hands on problem solver. Cloud Architect, and Product Builder with a proven track record in leading world-class software development teams and introducing innovative products to market
Cloud is not "pay for what you use"; it's "pay for what you forgot to turn off".
But like most things in life, it's funny until it's about you. I'm frugal by nature and never pleased to see something go to waste, be it physical or digital.
And thus, I declared war on useless and tagless resources. The hunt for a solution led me down some interesting paths. This was not a recent struggle, nor was I the only one affected by it. In some instances, people were struggling to clean up nameless EC2 instances appearing out of nowhere, but luckily, that was not my issue. Mine was straightforward: cost optimization by deleting resources with no tags. Why tagless resources, you ask? Because most of the time, these resources are the result of unplanned and hasty workloads launched directly from the console, so they are harder to clean up after the fact and are often left for dead, or undead, in a zombie-like state.
From proprietary to open source, the solutions were many, and I spent quite some time battling with a few of them (mostly the free and open source ones), but I had a few imperative requirements:
- I wanted my solution to be Serverless
- I wanted metrics and reporting
- I have workloads running on a few cloud providers so I wanted a cloud hybrid solution
- And last, I wanted a solution that covers as many resources as possible, across cloud providers
And so, after a few weeks of bloodshed, I settled on Cloud Custodian.
Cloud Custodian is a rules engine for managing public cloud accounts and resources. It allows users to define policies to enable a well-managed cloud infrastructure that's both secure and cost optimized. It consolidates many of the ad hoc scripts organizations have into a lightweight and flexible tool, with unified metrics and reporting.
It seemed to have everything I needed, and the README.md was a great place to start, but I had yet to make it completely Serverless. Nothing can stand in the face of strong will, though (and a few double shots of espresso; mostly the espresso). So in this blog post I will provide a step-by-step guide on how to set up a "tag-compliance policy" using Cloud Custodian on AWS Lambda, making it completely Serverless. This policy will check for a specific tag on running EC2 instances and, if the tag is missing, perform the defined action: in our case, stopping the instance.
Prerequisites
Although Cloud Custodian is cloud-hybrid, this post only covers AWS. You'll need IAM access that lets you create policies, roles and users.
1- IAM Role:
The Lambda function will require access to perform actions on EC2 instances. This can be achieved by attaching an IAM role to it.
So first, create an IAM role named "custodian-tag-role" and attach the AmazonEC2FullAccess managed policy to it.
2- IAM User:
Second, to create the Lambda function from your local machine, you need programmatic access to AWS. For this, create an IAM user "custodian-user" and grant the following permissions/policies:
- AWSLambdaFullAccess (AWS managed policy)
- CloudWatchFullAccess (AWS managed policy)
- IAMPassRole (inline policy), so the user can pass "custodian-tag-role" to the Lambda function:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::XXXXXXXX:role/custodian-tag-role"
    }
  ]
}
Load the provided key and secret key in your environment:
export AWS_ACCESS_KEY_ID=<ACCESS_KEY>
export AWS_SECRET_ACCESS_KEY=<SECRET_KEY>
export AWS_DEFAULT_REGION=<REGION>
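Before running custodian, it can help to confirm that the three variables are actually set in the shell you are working in. Here is a minimal sketch using only the Python standard library; the helper name `missing_aws_vars` is hypothetical and is not part of Cloud Custodian or the AWS SDK:

```python
import os

# The variables exported in the step above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_DEFAULT_REGION")


def missing_aws_vars(environ=None):
    """Return the names of required AWS variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All AWS credential variables are set.")
```

Passing a dictionary instead of relying on `os.environ` makes the check easy to try out without touching your real credentials.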
Also, copy the new user's ARN and add it to the trust relationship of the "custodian-tag-role" IAM role created earlier.
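The trust relationship is a JSON document attached to the role. Here is a rough sketch of what it might look like, built in Python so the placeholder is easy to swap. The helper name `build_trust_policy` is hypothetical, XXXXXXXX stands for your account ID, and including the `lambda.amazonaws.com` service principal is my assumption, based on how Lambda normally assumes its execution role:

```python
import json


def build_trust_policy(user_arn):
    """Build a trust policy letting Lambda and the given IAM user assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: the Lambda service itself must be able to assume the role.
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            },
            {
                # The IAM user created in step 2.
                "Effect": "Allow",
                "Principal": {"AWS": user_arn},
                "Action": "sts:AssumeRole",
            },
        ],
    }


if __name__ == "__main__":
    # XXXXXXXX is a placeholder for the AWS account ID, as elsewhere in this post.
    policy = build_trust_policy("arn:aws:iam::XXXXXXXX:user/custodian-user")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the role's "Trust relationships" tab in the IAM console.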
3- Install Cloud Custodian:
Use pip for the installation:
pip install c7n
Once the installation is complete, create a policy document. A Cloud Custodian policy is a YAML document that defines the filters to match and the actions to take on cloud resources. You can find example policies in the Cloud Custodian documentation. Here is the one we will be using for our example:
policies:
  - name: owner-tag-compliance
    mode:
      type: periodic
      schedule: rate(1 hour)
      role: arn:aws:iam::XXXXXX:role/custodian-tag-role
    resource: ec2
    description: |
      Stop running EC2 instances that do not meet
      the Owner tag compliance policy.
    filters:
      - State.Name: running
      - "tag:Owner": absent
    actions:
      - stop
This policy will create a Lambda function with the role "custodian-tag-role" and a CloudWatch Events rule that triggers it every hour to check EC2 instances against the following filters:
- The instance is in running state
- The “Owner” tag is absent
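To make the two filters concrete, here is a rough sketch of the matching logic in plain Python. It is only an illustration, not Cloud Custodian's implementation: the function name `is_noncompliant` and the sample instances are made up, and the dictionaries are shaped like the instance records the EC2 API returns (`State.Name` plus a `Tags` list of `Key`/`Value` pairs):

```python
def is_noncompliant(instance, required_tag="Owner"):
    """True when the instance is running and lacks the required tag key."""
    running = instance.get("State", {}).get("Name") == "running"
    # Tag keys are compared exactly, so "owner" would not count as "Owner".
    tag_keys = {tag["Key"] for tag in instance.get("Tags", [])}
    return running and required_tag not in tag_keys


# Hypothetical sample data, for illustration only.
instances = [
    {"InstanceId": "i-aaa", "State": {"Name": "running"},
     "Tags": [{"Key": "Owner", "Value": "team-a"}]},
    {"InstanceId": "i-bbb", "State": {"Name": "running"},
     "Tags": [{"Key": "Name", "Value": "test"}]},
    {"InstanceId": "i-ccc", "State": {"Name": "stopped"}},
    {"InstanceId": "i-ddd", "State": {"Name": "running"}},
]

to_stop = [i["InstanceId"] for i in instances if is_noncompliant(i)]
print(to_stop)  # ['i-bbb', 'i-ddd']
```

Note that the stopped instance is ignored even though it has no tags at all, because both filters must match.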
Instances matching these criteria will be stopped. Save the file as policy.yml and run the following command to deploy it:
custodian run -s . policy.yml
Go back to the AWS console and you should find a new Lambda function that runs every hour, checks the filters and stops any EC2 instance that doesn't comply.
And that's all there is to it. To see the other commands custodian offers:
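If you later change the schedule, keep in mind that CloudWatch Events rate expressions use a singular unit when the value is 1 (`rate(1 hour)`) and a plural unit otherwise (`rate(2 hours)`). Here is a small sketch of a checker for that rule, covering the minute, hour and day units; the function `is_valid_rate` is hypothetical and is not part of Cloud Custodian:

```python
import re

# rate(<value> <unit>) with a single space between value and unit.
_RATE_RE = re.compile(r"^rate\((\d+) (minute|minutes|hour|hours|day|days)\)$")


def is_valid_rate(expression):
    """Check a rate expression: positive value, singular unit only for 1."""
    match = _RATE_RE.match(expression)
    if not match:
        return False
    value, unit = int(match.group(1)), match.group(2)
    if value < 1:
        return False
    # Plural units end in "s"; they are required exactly when value > 1.
    return unit.endswith("s") == (value > 1)


if __name__ == "__main__":
    for expr in ("rate(1 hour)", "rate(1 hours)", "rate(4 hours)"):
        print(expr, is_valid_rate(expr))
```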
custodian -h