AWS – Cost Optimisation 101
Adrian Cantrill
AWS/Cloud Course Creator @ learn.cantrill.io // follow for updates! connect to chat // youtube.com/c/LearnCantrill
(originally posted at https://cantrill.io/2015/08/14/aws-cost-optimisation-101/ )
I’ve spent a good chunk of my time recently performing cost optimisation exercises for clients, and a recurring theme is the perception that it’s difficult. The process is fairly simple, enough so that anyone can learn how to implement savings at a rudimentary level. Amazon are transparent with the tools, processes and architectures needed to reduce costs; there really isn’t an excuse to be inefficient with your cloud spend.
While the process of cost optimisation is simple, you will achieve better results when you outsource the task to an expert who can work cooperatively with you/your team.
My process for optimising spend in AWS consists of four main streams, namely :-
- Right Size instances, processes and services – make sure resources are used appropriately.
- Minimise waste – ensure nothing is wasted within the environment.
- Optimise procurement of resources – make appropriate use of AWS cost reduction techniques such as reserved instances.
- Architect to leverage AWS – an advanced topic, but this is where a substantial proportion of the benefits can be achieved (more on this below).
Cost Optimisation isn’t one process – it’s a collection of techniques, which, when used together can achieve significant efficiency gains.
Instance and service ‘right-sizing’
Cloud isn’t a panacea – you should ensure that resources provisioned for a particular workload are ‘goldilocks-sized’ – not too big, not too small, but just right.
Generally, the biggest waste I encounter within AWS environments is incorrect resource sizing. Many people lack confidence, especially when migrating early workloads to the cloud, so they tend to over-provision, picking large AWS instance sizes and then failing to revisit and reassess usage over time.
AWS make this easy: a free built-in tool called CloudWatch provides complete monitoring and performance trending free of charge (some advanced features do carry additional costs).
At a high level, selecting EC2 from the console, navigating to your instance list and clicking on the Monitoring tab will show you basic metrics for an instance. A common issue is that CPU usage is abnormally low, suggesting a smaller instance could be used to reduce ongoing costs without impacting performance.
Selecting CloudWatch from the AWS console allows further performance metrics to be visualised, including disk reads and writes. Additionally, you can capture internal metrics such as memory utilisation by adding custom monitoring scripts to instances, as detailed in the AWS documentation.
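To make the right-sizing test concrete, here is a minimal Python sketch of the decision itself. The CPU figures and the 10% threshold are illustrative assumptions; in practice the datapoints would come from CloudWatch rather than hard-coded lists.

```python
# Sketch: flag over-provisioned instances from CPU utilisation samples.
# The datapoints and the 10% threshold are illustrative assumptions;
# real figures would come from CloudWatch metrics.

def is_downsize_candidate(cpu_samples, threshold_pct=10.0):
    """Return True if average CPU utilisation is below the threshold."""
    if not cpu_samples:
        return False
    return sum(cpu_samples) / len(cpu_samples) < threshold_pct

# Hypothetical hourly CPU averages for two instances
quiet_instance = [3.1, 2.8, 4.0, 2.5, 3.3, 2.9]
busy_instance = [55.0, 62.3, 71.8, 48.9, 66.1]

print(is_downsize_candidate(quiet_instance))  # True
print(is_downsize_candidate(busy_instance))   # False
```

The threshold is a judgement call per workload; spiky workloads should be assessed on peak as well as average utilisation.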
Once you’ve reviewed utilisation, you can check the AWS instance sizes page and identify a more suitable instance. Changing size is easy: power down the instance, right-click it and select ‘Change Instance Type’.
Minimise waste (especially Snapshot storage)
Wasting resources within AWS is easy. A benefit of the platform is that most services are billed on an hourly basis, but this means costs add up in the background without immediate ‘bill shock’. On the whole it’s easy to spot waste within most AWS services (e.g. the instance right-sizing above), but certain services make it less visible.
Take storage within AWS, specifically instance storage: EBS volumes and EBS snapshots. EBS volumes provide an instance’s primary storage (at a simple level, its hard drive), while snapshots are point-in-time backups of those volumes. The former is generally fairly static, only increasing in cost when you increase the size of the volumes attached to instances. The latter grows in line with the rate of data change captured in your snapshots.
Clicking on EC2, and selecting SNAPSHOTS from your ‘Elastic Block Store’ folder will list all the snapshots within the current region of your account.
These can grow if not controlled by automated pruning processes or manual intervention. It should be noted that as an AWS customer you are charged based on the amount of space actually used within the snapshots (i.e. the rate of data change) rather than the amount reported in the ‘Size’ column of the console.
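A quick sketch of why the ‘Size’ column overstates cost: snapshots are incremental, so billed storage is roughly the first full copy plus each subsequent delta. All figures below are illustrative assumptions, not real billing data.

```python
# Sketch: snapshot billing is based on changed data, not reported size.
# Figures are illustrative assumptions.

def billed_snapshot_gb(first_full_gb, delta_gbs):
    """Approximate billed storage for an incremental snapshot chain."""
    return first_full_gb + sum(delta_gbs)

volume_size_gb = 100
deltas = [2, 5, 1, 3]   # GB changed before each of four later snapshots

naive = volume_size_gb * (1 + len(deltas))    # what the 'Size' column implies
actual = billed_snapshot_gb(volume_size_gb, deltas)

print(naive)   # 500
print(actual)  # 111
```

In this assumed example, five snapshots of a 100 GB volume bill for 111 GB, not 500 GB, because only changed blocks are stored after the first copy.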
Certain AWS products can take snapshots automatically, and some by their very nature generate large amounts of data change, e.g. AWS Storage Gateway.
A clear strategy for snapshot storage and pruning should be agreed with the business, taking into account required RPO and RTO levels. A traditional Grandfather, Father, Son (GFS) retention scheme can be created by utilising AWS service tagging – happy to elaborate if required.
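As a rough illustration, a GFS retention pass can be sketched in a few lines of Python. The policy values (7 dailies, 4 weeklies, 12 monthlies) and the use of plain dates are assumptions for the example; a real implementation would select snapshots by tag via the EC2 API.

```python
# Sketch of a Grandfather-Father-Son retention pass over snapshot dates.
# Policy values are assumptions; real snapshots would be found by tag.
from datetime import date, timedelta

def gfs_keep(snapshot_dates, today, dailies=7, weeklies=4, monthlies=12):
    """Return the set of snapshot dates a GFS policy would retain."""
    keep = set()
    by_week, by_month = {}, {}
    for d in sorted(snapshot_dates, reverse=True):   # newest first
        age = (today - d).days
        if age < dailies:                  # Son: keep recent dailies
            keep.add(d)
        if age < weeklies * 7:             # Father: newest per ISO week
            by_week.setdefault(tuple(d.isocalendar()[:2]), d)
        if age < monthlies * 31:           # Grandfather: newest per month
            by_month.setdefault((d.year, d.month), d)
    keep.update(by_week.values())
    keep.update(by_month.values())
    return keep

today = date(2015, 8, 14)
dates = [today - timedelta(days=n) for n in range(90)]  # 90 daily snapshots
kept = gfs_keep(dates, today)
print(len(kept), "kept,", len(dates) - len(kept), "prunable")
```

Everything outside the returned set is a pruning candidate; the deletion itself would be a separate, carefully reviewed step.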
Optimise resource procurement
AWS offers a number of different instance procurement methods. There are two main categories, ‘on-demand’ and ‘reserved’. A third type, ‘spot instances’, also exists, but we’ll discuss that more in the following section.
On-demand instances, as the name suggests, are launched ‘on demand’. From an AWS infrastructure perspective they are the least optimal for capacity planning; AWS engineers can’t efficiently forecast requirements on an ongoing basis without planning or commitment on your part. As such, on-demand instances come at a cost premium.
On the plus side, they can be started on demand, shut down when not in use, and crucially you only pay for what you use. This makes them ideal for sporadic workloads where you are unable to commit to ongoing usage.
Reserved instances provide additional value via lower running costs. AWS achieves these discounts because by using reserved instances you are committing to a certain level of usage and this allows the AWS capacity teams to plan their own physical infrastructure deployment based on these purchases.
Reserved instances come with two main configurable elements: term and payment option. The term is 1 or 3 years, with the latter providing the biggest cost reductions. The payment options are No Upfront, Partial Upfront and All Upfront. The discount level is maximised at All Upfront, so to achieve the greatest overall saving a 3-year, All Upfront reserved instance should be used, yielding savings of roughly 45% against on-demand pricing.
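The arithmetic behind that figure is straightforward. The prices below are illustrative placeholders, not current AWS rates; they are chosen to show how an All Upfront payment of roughly 55% of the equivalent on-demand cost produces a saving of about 45%.

```python
# Sketch: 3-year All Upfront reserved pricing vs running on-demand 24/7.
# Both prices are illustrative assumptions, not real AWS rates.

HOURS_PER_YEAR = 8760

def ri_saving_pct(on_demand_hourly, upfront_cost, term_years):
    """Percentage saved by an All Upfront reservation vs on-demand 24/7."""
    on_demand_total = on_demand_hourly * HOURS_PER_YEAR * term_years
    return 100 * (1 - upfront_cost / on_demand_total)

# Hypothetical instance: $0.10/hour on demand, $1,445 for 3 years upfront
saving = ri_saving_pct(0.10, 1445, 3)
print(round(saving, 1))  # 45.0
```

The same function shows the commitment risk: if you decommission the instance after one year, the money is already spent and the effective saving evaporates.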
There are a few important things to keep in mind when making a reserved instance purchase.
You are making a contractual commitment. Purchasing a 3-year reserved term means you are committing to pay (albeit reduced) costs for an instance of a certain type for 3 years. Be sure you have a good IT roadmap before going down this path. What if your needs change, e.g. you decommission or restructure an application?
Instance reservations are made at a ‘per availability zone’ level. If your region has multiple zones, be sure you have planned where each instance will reside long term before committing to a purchase.
Don’t use All Upfront if there is a chance you will power instances off: the nature of this payment option is that you pay for 100% of the usage up front. You cannot achieve savings by switching off an All Upfront instance.
Appropriate cloud architecture
Architecture within AWS is one element of cost optimisation which is often neglected, generally because it’s ‘hard’ for most partners to advise on. Traditional integrators generally have no application and/or platform architecture experience, so it’s very much outside their comfort zone.
There is also a belief in the industry that traditional ‘legacy’ applications can’t be re-architected to offer some of the same benefits as true-cloud applications and platforms – this is false.
At a simple level, appropriate architecture means your infrastructure, platform and applications allow both positive and negative scaling in line with demand. From a non-technical point of view, this means growing during peak demand periods and shrinking during off-peak hours. Most traditional infrastructure isn’t designed this way. Consider your corporate infrastructure: most organisations have one or more monolithic file or database servers sized to cope with peak load. That capacity meets peak requirements but is largely wasted during periods of lower or no demand, i.e. outside business hours.
A few questions I usually ask clients :-
- Which instances can be turned off outside working hours?
- Which services can be scaled down outside working hours?
Once armed with the above information you can start to provide architectural guidance – it may start with moving your file/DB services onto multiple smaller servers, some of which can be powered off during off-peak hours.
Another service AWS provide is ‘spot instances’. Simply put, spot instances occupy spare AWS capacity. You ‘bid’ a maximum price on instances and, while the ‘spot price’ stays below that value, you gain access to them. The price of these instances is substantially lower than on-demand, but access to them isn’t guaranteed: if the spot price increases and your bid doesn’t track along with it, you will lose access to the spot pool. Effective use of spot instances requires applications designed to tolerate sudden instance termination without service impact (the topic of a future article).
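The bidding rule described above reduces to a simple comparison. The bid and the price history below are illustrative assumptions:

```python
# Sketch of the spot rule above: capacity is held only while the
# market spot price stays at or below your bid. Prices are illustrative.

def has_capacity(bid, spot_price):
    """True while the current spot price does not exceed the bid."""
    return spot_price <= bid

bid = 0.05
spot_history = [0.021, 0.034, 0.049, 0.062, 0.040]

for price in spot_history:
    print(price, "running" if has_capacity(bid, price) else "terminated")
```

The fourth datapoint is the important one: the moment the market moves above your bid, the instance is reclaimed, which is why spot-backed workloads must checkpoint their state or be stateless.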
By architecting your application to make use of spot instances you can achieve significant cost savings, or an increase in capacity without additional expense. The scope of spot instances goes far beyond what I can cover in this article; if you would like to discuss it, please get in touch and we can have a no-obligation chat.
What sort of benefits can be expected?
The benefits you will achieve depend on how poorly your environment is currently configured and how in-depth your cost optimisation exercise is. For projects I’ve completed recently I’ve been able to achieve savings of ~50% with no loss of functionality, but your mileage may vary.
I hope this has been of use, as always, appreciate any feedback and if you do need a consultation on costs within your AWS environment please get in touch and I will try and assist.