Cutting Costs: Saving $30k+ Per Month with AWS Infrastructure Optimization
After much procrastination, I've finally motivated myself to write a tech blog, something that has been on my to-do list for quite some time.
In this blog, I'll take you through the journey of how we optimised cloud costs at my current company. You'll get an in-depth look at the strategies we implemented, the challenges we faced, and the significant savings we achieved. So, let's dive right in.
Background:
Post-COVID, companies across industries were forced to optimize their operational costs to ensure long-term sustainability. Our approach involved a comprehensive analysis of our existing infrastructure to identify opportunities for cost savings. This included evaluating our cloud usage, optimizing resource allocation, and adopting policies and solutions that would help the company in the long run.
Optimisation Strategies:
To develop effective optimization strategies, we held multiple sessions with our AWS Technical Account Manager. These sessions were instrumental in conducting a comprehensive AWS Well-Architected (WA) review, which gave us a clear picture of the current state of our infrastructure. The review allowed us to identify gaps in our processes and offered valuable insights for improvement. Cost Optimization is one of the pillars of the AWS Well-Architected Framework, and it is the pillar this article focuses on: the strategies and best practices we used to reduce cost.
Decommissioning Unused VMs:
Note: Do not terminate the instance immediately. The general advice is to keep it in the Stopped state for at least 5 days.
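The stop-then-wait rule in the note above can be expressed as a small check. This is a minimal sketch, not a description of any particular tooling: the function name `eligible_for_termination` and the input shape (instance ID mapped to the time it was stopped) are assumptions for illustration, and only the 5-day window comes from the note.

```python
from datetime import datetime, timedelta, timezone

# From the note above: keep an instance stopped for at least 5 days.
GRACE_PERIOD = timedelta(days=5)


def eligible_for_termination(stopped_at, now, grace=GRACE_PERIOD):
    """Return the IDs of instances that have been stopped for at least `grace`.

    `stopped_at` maps instance ID -> timezone-aware datetime of the stop.
    This input is hypothetical: you would have to record the stop time
    yourself, for example in an instance tag.
    """
    return sorted(
        instance_id
        for instance_id, stopped in stopped_at.items()
        if now - stopped >= grace
    )


if __name__ == "__main__":
    now = datetime(2024, 6, 10, tzinfo=timezone.utc)
    stopped = {
        "i-aaa": datetime(2024, 6, 1, tzinfo=timezone.utc),  # stopped 9 days ago
        "i-bbb": datetime(2024, 6, 8, tzinfo=timezone.utc),  # stopped 2 days ago
    }
    print(eligible_for_termination(stopped, now))
```

Keeping the decision in a pure function makes it easy to review the list of candidates before anything is actually terminated.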
Deletion of Snapshots:
Deletion of objects in S3 bucket and having a retention policy in place:
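A retention policy on a bucket is usually expressed as an S3 lifecycle configuration. The sketch below builds one expiration rule and, optionally, applies it with boto3's `put_bucket_lifecycle_configuration`. The helper names, the rule ID, and the 7-day cleanup of incomplete multipart uploads are assumptions for illustration, not settings taken from this article.

```python
def build_expiration_rule(prefix, days, rule_id="expire-old-objects"):
    """Build a lifecycle configuration that expires objects under `prefix`.

    An empty prefix applies the rule to every object in the bucket.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    return {
        "Rules": [
            {
                "ID": rule_id,  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
                # Assumed extra: also clean up abandoned multipart uploads.
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Apply the configuration. This replaces any existing lifecycle rules."""
    import boto3  # imported lazily so the builder works without AWS access

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


if __name__ == "__main__":
    print(build_expiration_rule(prefix="logs/", days=90))
```

Because applying a configuration overwrites the bucket's existing rules, it is worth reading the current configuration first and merging by hand when a bucket already has one.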
Deleting the Unattached Volumes:
How to check if a volume is unattached?
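One way to answer this programmatically: an EBS volume that is not attached to any instance reports the state `available` and has an empty attachment list. The sketch below assumes boto3 and read access to EC2; the function names are illustrative.

```python
def unattached_volumes(volumes):
    """Return the IDs of volumes that are not attached to any instance.

    `volumes` is a list of volume dicts in the shape returned by the EC2
    DescribeVolumes API (keys such as VolumeId, State, Attachments).
    """
    return [
        v["VolumeId"]
        for v in volumes
        if v.get("State") == "available" and not v.get("Attachments")
    ]


def fetch_unattached_volume_ids(region_name):
    """Query EC2 for volumes in the `available` state in one region."""
    import boto3  # imported lazily so the filter above works offline

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_volumes")
    pages = paginator.paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]
    )
    ids = []
    for page in pages:
        ids.extend(unattached_volumes(page["Volumes"]))
    return ids


if __name__ == "__main__":
    sample = [
        {"VolumeId": "vol-1", "State": "in-use",
         "Attachments": [{"InstanceId": "i-1"}]},
        {"VolumeId": "vol-2", "State": "available", "Attachments": []},
    ]
    print(unattached_volumes(sample))
```

Taking a snapshot before deleting a volume found this way is a cheap safeguard if there is any doubt about its contents.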
Right-sizing the VMs:
Note: While this process can be time-consuming and tiring, the impact on cost efficiency is significant.
Decommissioning the Old Load Balancers:
Migrating Volume from GP2 type to GP3 type:
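A gp2 volume's baseline IOPS scale with its size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), while gp3 starts at a flat 3,000 IOPS. The sketch below builds the parameters for boto3's `modify_volume` so that a large gp2 volume does not lose IOPS in the move. The function names are assumptions for illustration.

```python
GP3_BASELINE_IOPS = 3000  # included with every gp3 volume


def gp2_baseline_iops(size_gib):
    """Baseline IOPS of a gp2 volume: 3 per GiB, min 100, max 16,000."""
    return max(100, min(3 * size_gib, 16000))


def gp3_modify_params(volume_id, size_gib):
    """Build ModifyVolume parameters that keep at least the gp2 baseline IOPS."""
    params = {"VolumeId": volume_id, "VolumeType": "gp3"}
    needed = gp2_baseline_iops(size_gib)
    if needed > GP3_BASELINE_IOPS:
        # IOPS above the gp3 baseline are billed separately.
        params["Iops"] = needed
    return params


def migrate(volume_id, size_gib):
    """Request the in-place type change; the volume stays attached."""
    import boto3  # imported lazily so the helpers above work offline

    ec2 = boto3.client("ec2")
    return ec2.modify_volume(**gp3_modify_params(volume_id, size_gib))


if __name__ == "__main__":
    print(gp3_modify_params("vol-small", 500))
    print(gp3_modify_params("vol-large", 2000))
```

This sketch only handles IOPS. gp3 also has a default throughput of 125 MiB/s, lower than what larger gp2 volumes can reach, so throughput-sensitive workloads may additionally need the `Throughput` parameter.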
Savings Plan for RDS:
We observed that the instance size of the RDS databases for many services rarely changed. AWS offers Savings Plan options for On-Demand instances, and we opted for one covering all our RDS instances. This reduced our costs by a further 30 percent.
Note: If you intend to change your instance type in the near future, it is better to avoid this option.
Future Cost-optimisation opportunities
EKS Node Right Sizing:
The services running in our EKS cluster seem to be over-provisioned, so there is scope for reducing the workload size in the cluster.
The pod memory and CPU requests configured for some services are very high. We can compare them against actual usage and right-size them, which should let us remove a few nodes from the K8s cluster.
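The comparison of requests against usage can be sketched as follows. This is an illustration under assumptions: the 30 percent headroom, the function names, and the sample numbers are not from our cluster, and the parsers cover only the common Kubernetes quantity forms (`m` for millicores; `Ki`, `Mi`, `Gi` for memory).

```python
_MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_millicores(quantity):
    """Convert a CPU quantity such as "500m" or "1.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def memory_to_bytes(quantity):
    """Convert a memory quantity such as "512Mi" or "2Gi" to bytes."""
    for suffix, factor in _MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return round(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain byte count


def suggest_request(requested, peak_usage, headroom_percent=30):
    """Suggest a request: peak usage plus headroom, never above the current one.

    Both values must be in the same unit (millicores or bytes).
    Integer arithmetic is used so the result rounds up exactly.
    """
    with_headroom = (peak_usage * (100 + headroom_percent) + 99) // 100
    return min(requested, with_headroom)


if __name__ == "__main__":
    requested = cpu_to_millicores("2")   # current CPU request
    peak = cpu_to_millicores("400m")     # hypothetical observed peak
    print(suggest_request(requested, peak), "millicores")
```

Peak usage would typically come from a metrics source such as `kubectl top` or a monitoring stack, observed over a long enough window to include traffic spikes.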
Implementing Auto Scaling in K8s:
Auto-scaling the nodes in the EKS cluster is one way to optimise the costs. This is planned as a future improvement.
Horizontal Pod Autoscaling (HPA) automatically adjusts the number of pod replicas in a deployment based on CPU utilization or other metrics. This ensures that applications have enough resources during peak times and scale down during periods of low demand.
The Cluster Autoscaler works at the node level, scaling the number of nodes in your cluster up or down based on the resource requirements of your pods. When the demand increases, the Cluster Autoscaler adds nodes to handle the load; when demand decreases, it removes underutilized nodes.
Implementing these two mechanisms helps optimize costs by ensuring that you only pay for the compute resources you actually need.
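To make the HPA behaviour concrete, the replica count it targets follows the formula in the Kubernetes documentation: desired = ceil(current replicas × current metric / target metric), with no change while the ratio stays within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that calculation. The function name and the min/max defaults are illustrative, and the real controller applies further rules (stabilization windows, handling of pods that are not ready) that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Replica count per the documented HPA formula, clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: do not scale
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target: scale out
    print(desired_replicas(4, 90, 60))
    # Same deployment at 30% CPU: scale in
    print(desired_replicas(4, 30, 60))
```

The Cluster Autoscaler then reacts to the result: extra replicas that cannot be scheduled trigger new nodes, and nodes left underutilized after a scale-in become candidates for removal.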
Conclusion
In conclusion, implementing effective optimization strategies and policies can lead to significant long-term cost savings for the company.
It's advisable to include cloud cost optimization as part of the company's OKRs to continuously monitor and manage resource usage efficiently.