AWS Compute Optimizer and the art & science of optimization
Before and during re:Invent 2019, Amazon unveiled over 77 new products and capabilities. One of the offerings, quietly announced in a blog post on December 3rd, the first day of re:Invent, was AWS Compute Optimizer.
AWS Compute Optimizer is a new addition to AWS's expanding set of native tools focused on helping customers regain control of their cloud bills. The others are AWS Trusted Advisor and the Amazon EC2 Resource Optimization recommendations in Cost Explorer, which were introduced in July 2019.
In this post, we will take an in-depth look at AWS Compute Optimizer and its capabilities, compare it to the existing offerings from AWS, and answer the question: is it enough?
AWS Compute Optimizer Overview
Compute Optimizer is a Machine Learning (ML) based tool that analyzes the CloudWatch metrics of EC2 instances and Auto Scaling groups and generates recommendations to help users choose the optimal instance types for their workloads.
Analyzed Metrics
By default, Compute Optimizer analyzes CPU, storage I/O, and network I/O utilization (ingress and egress across all NICs) collected from CloudWatch. Users can enable OS-level memory metrics by installing and configuring the CloudWatch Agent.
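For reference, a minimal CloudWatch Agent configuration for collecting memory utilization on Linux might look like the following (an illustrative fragment, not a complete agent configuration; Windows uses a different metric set):

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"]
      }
    }
  }
}
```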
If memory metrics are not collected, AWS promises that the tool will try not to reduce the memory capacity assigned to EC2 instances. This is an improvement over the Cost Explorer rightsizing recommendations.
Amazon recommends enabling detailed monitoring in CloudWatch, which reduces the data-collection interval from five minutes to one minute (note that detailed monitoring in CloudWatch results in extra charges).
Observation Period
Recommendations are based on the last 14 days of data; the observation period is not configurable. Instances must be running for at least 30 hours (in some cases up to 60 hours) before recommendations are generated.
Recommendations
The first recommendations show up for eligible EC2 workloads within 12 hours of opting in and enabling the service on your account; after that, recommendations refresh daily. The recommendations draw on five instance families: M (General Purpose), C (Compute Optimized), R (Memory Optimized), T (General Purpose with burstable capabilities), and X (Memory Optimized, ideal for in-memory DBs). Instances running unsupported instance types, such as Accelerated Computing (P, G, and F) or Storage Optimized (D, I, and H), will not receive recommendations.
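To make this concrete, here is a short sketch of what opting in and pre-filtering by family could look like with boto3. The opt-in call uses the real Compute Optimizer `UpdateEnrollmentStatus` API; the family filter is our own illustrative helper, not part of any AWS SDK:

```python
# Families Compute Optimizer generates recommendations for (per the post above).
SUPPORTED_FAMILIES = {"m", "c", "r", "t", "x"}


def is_supported(instance_type: str) -> bool:
    """Illustrative check: does this instance type belong to a supported family?

    e.g. 'm5.large' -> family 'm5' -> leading letter 'm' -> supported,
         'p3.2xlarge' -> leading letter 'p' -> not supported.
    """
    family = instance_type.split(".")[0]  # 'm5.large' -> 'm5'
    return family[0].lower() in SUPPORTED_FAMILIES


def opt_in():
    """Opt the current account in to Compute Optimizer.

    boto3 is imported lazily; calling this requires AWS credentials.
    """
    import boto3
    client = boto3.client("compute-optimizer")
    return client.update_enrollment_status(status="Active")
```

Once enrolled, recommendations can be pulled programmatically with the client's `get_ec2_instance_recommendations` call rather than through the console.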
The service analyzes the usage of EC2 instances, categorizes each one as follows (using AWS's own language), and generates sizing recommendations as needed:
- Under-provisioned: An EC2 instance is considered under-provisioned when at least one specification of your instance, such as CPU, memory, or network, does not meet the performance requirements of your workload.
- Over-provisioned: An EC2 instance is considered over-provisioned when at least one specification of your instance, such as CPU, memory, or network, can be sized down while still meeting the performance requirements of your workload, and when no specification is under-provisioned.
- Optimized: An EC2 instance is considered optimized when all specifications of your instance, such as CPU, memory, and network, meet the performance requirements of your workload, and the instance is not over-provisioned. An optimized EC2 instance runs your workloads with optimal performance and infrastructure cost. For optimized resources, Compute Optimizer might sometimes recommend a new generation instance type.
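The precedence in these definitions can be sketched as a simple rule: any under-provisioned dimension wins, otherwise any over-provisioned dimension, otherwise optimized. The sketch below is purely illustrative; AWS's actual classification is driven by its ML analysis, not a lookup like this:

```python
def classify(specs: dict) -> str:
    """Classify an instance from per-specification findings.

    `specs` maps each dimension ('cpu', 'memory', 'network') to one of
    'under', 'over', or 'ok' -- an illustrative stand-in for the real
    per-metric analysis.
    """
    findings = set(specs.values())
    if "under" in findings:
        return "Under-provisioned"  # any unmet requirement takes priority
    if "over" in findings:
        return "Over-provisioned"   # can size down while meeting requirements
    return "Optimized"
```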
When the tool generates a recommendation for an EC2 instance, it presents the user with three instance-type options to choose from.
Auto Scaling group recommendations are simpler, classifying groups as either optimized or not optimized.
Executing a recommendation is manual. The tool provides a direct link to the respective instance page, but users still need to stop the instance, select the new instance type, and start the instance again by hand.
Analytics Engine
AWS didn't share a lot of details on the rightsizing analytics engine powering its new tool. What we do know is that it is based on Machine Learning. It analyzes the EC2 workloads' utilization across the supported metrics and uses insights from millions of workloads running on AWS to make its recommendations on which instance size to use.
Amazon did not provide details on how observed peaks and average utilization are weighted when determining what, precisely, an optimized, under-provisioned, or over-provisioned workload is, but it is safe to assume these weights are not configurable.
Lastly, it is not clear whether the tool considers existing Reserved Instances (RIs) or Savings Plans when it generates recommendations, or whether it accounts for the pending RI or Savings Plans purchase recommendations offered in Cost Explorer. This is important because sizing workloads without assessing the current RI inventory may result in higher on-demand charges, especially when doing cross-family sizing.
AWS Compute Optimizer vs. existing offerings from AWS
Compute Optimizer provides additional functionality compared to the other tools from AWS, namely AWS Trusted Advisor and the Cost Explorer EC2 rightsizing recommendations.
The common elements among all three tools are the non-configurable 14-day observation period and the lack of any user customization of the rightsizing analytics.
AWS Trusted Advisor identifies and alerts when an instance's CPU utilization crosses a threshold, flagging efficiency opportunities or performance risks. For example, if an instance's CPU utilization was 10% or less and its network I/O was 5 MB or less for four or more days, it is marked as under-utilized; if CPU utilization was more than 90% for four or more days, it is marked as over-utilized. These thresholds cannot be modified, and users still have to determine which new instance type to use.
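Those fixed thresholds are simple enough to express in a few lines. The following is an illustrative approximation of the rule, not Trusted Advisor's actual implementation:

```python
def trusted_advisor_flags(daily_cpu, daily_net_mb):
    """Approximate Trusted Advisor's fixed EC2 utilization thresholds.

    daily_cpu: list of daily CPU utilization percentages
    daily_net_mb: list of daily network I/O totals in MB (same length)
    """
    # Days where CPU <= 10% AND network I/O <= 5 MB count toward under-utilized.
    low_days = sum(1 for cpu, net in zip(daily_cpu, daily_net_mb)
                   if cpu <= 10 and net <= 5)
    # Days where CPU > 90% count toward over-utilized.
    high_days = sum(1 for cpu in daily_cpu if cpu > 90)
    return {
        "under_utilized": low_days >= 4,  # four or more low days
        "over_utilized": high_days >= 4,  # four or more high days
    }
```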
The relatively new EC2 rightsizing recommendation service under Cost Explorer is also threshold-based. Any instance with a CPU peak below 1% over the last 14 days is marked as idle, with a recommendation to terminate it (assuming it is not needed). Instances with a CPU peak between 1% and 40% receive a downsize recommendation, but only within the same instance family (for example, from m4.xlarge to m4.large).
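Again, the decision rule is a simple threshold ladder. The sketch below is illustrative only (the exact boundary handling at 1% and 40% is our assumption):

```python
def cost_explorer_recommendation(cpu_peak_pct: float) -> str:
    """Illustrative version of Cost Explorer's 14-day CPU-peak thresholds."""
    if cpu_peak_pct < 1:
        return "terminate (idle)"
    if cpu_peak_pct <= 40:
        # Downsize stays within the same family, e.g. m4.xlarge -> m4.large.
        return "downsize within family"
    return "no action"
```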
With Compute Optimizer, AWS now offers cross-family rightsizing recommendations (within the five instance families it supports). Compute Optimizer also extends recommendations beyond EC2 to Auto Scaling groups.
Is Compute Optimizer enough?
No. Compute Optimizer is designed to help AWS users select the right instance types for their workloads. However, when it comes to the "last mile" of choosing the optimal instance and executing the rightsizing, the onus is on the user.
Furthermore, rightsizing is only one of several methods of reducing cloud costs, and it should never be done in isolation from the other available options, such as purchasing Reserved Instances (or the new AWS Savings Plans) or stopping instances when they are not needed.
The cloud is complex, applications are growing more complex, and customers keep introducing more of them. This presents a challenge far beyond human scale. Leveraging CloudWatch is a great start, and collecting OS-level metrics via the CloudWatch Agent or a third-party tool is better. However, the most effective approach is to let the application drive resource allocation, and this is where Turbonomic Application Resource Management comes in.
Turbonomic Application Resource Management offers the most powerful and advanced optimization functionality of any solution in the market. Here are a few examples to support this bold claim:
- Top-down, application-aware approach - Turbonomic is designed first and foremost to assure application performance. Turbonomic offers multiple agentless options to connect to, ingest, and act upon application-level metrics such as heap utilization, database memory, and application response times. The best way to reduce application costs is to focus on application performance first: when you allocate exactly the resources your application needs, when it needs them, you naturally achieve cost reduction without introducing performance risk. Simple as that!
- You need trustworthy actions, not recommendations - as stated, when using the native cloud offerings, users still need to decide exactly which instance type to use, confirm it meets any constraints (such as driver support for enhanced networking (ENA) or NVMe storage, and the impact on storage and network), and then execute the change manually during the next maintenance window. This is not sustainable or scalable! Turbonomic generates trustworthy actions that can (and should) be scrutinized by users before executing them from the Turbonomic UI, and, once comfortable, fully automated or even integrated with tools like ServiceNow for an approval workflow and audit trail.
- Different workloads require different optimization approaches - although Compute Optimizer uses ML to determine users' workload types and match them with a few instance families, users told us they still want to configure the rightsizing engine to match their workload types. They also want to control the observation period, the aggressiveness toward peaks (with percentiles), and the target utilization of resources, and to be able to apply these different policies to specific apps, accounts, or any group of resources created based on tags, for example.
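To illustrate what "aggressiveness toward peaks (with percentiles)" means in practice: sizing to the 95th percentile instead of the absolute maximum ignores rare spikes, so a latency-tolerant batch workload can be sized more aggressively than a customer-facing one. A hypothetical helper (our own sketch, not any vendor's API) using the nearest-rank percentile method:

```python
import math


def utilization_target(samples, percentile=95.0):
    """Pick the utilization level to size for, using a configurable
    percentile of the observed samples instead of the absolute peak.

    samples: utilization percentages, one per collection interval
    """
    ordered = sorted(samples)
    # Nearest-rank percentile: the smallest value covering `percentile`%
    # of the samples (percentile=100 reduces to the absolute peak).
    rank = max(0, math.ceil(percentile / 100 * len(ordered)) - 1)
    return ordered[rank]
```

With samples containing one brief spike to 100%, a p95 target sizes for typical load while a p100 (peak-based) target sizes for the spike, which is exactly the tradeoff users want to control per workload.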
It is also worth mentioning that unlike Compute Optimizer, which generates and refreshes recommendations daily, Turbonomic's actions are produced continuously in real-time as demand fluctuates.
Furthermore, Turbonomic generates sizing actions after considering the user's Reserved Instance inventory. For example, if an instance is using on-demand pricing and a matching RI sits unused, Turbonomic will generate a sizing action to move the workload onto that RI, even if the workload itself does not need to be resized. By taking this action, you eliminate on-demand costs and avoid losses on unutilized RIs for which you have already pre-paid.
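The underlying idea can be sketched in a few lines: soak up unused RI capacity with resize actions for workloads currently billed on-demand. This is a toy illustration of the concept, not Turbonomic's algorithm, and it omits the performance checks a real decision engine would apply before resizing:

```python
def ri_aware_actions(workloads, unused_ris):
    """Sketch: generate resize actions that soak up unused RI capacity.

    workloads: list of (name, current_instance_type) running on-demand
    unused_ris: dict mapping instance_type -> count of unused RIs
    """
    actions = []
    remaining = dict(unused_ris)
    for name, current in workloads:
        for ri_type, count in remaining.items():
            if count > 0:
                remaining[ri_type] -= 1
                if ri_type != current:
                    # Resize onto the RI-covered type (performance
                    # suitability check omitted in this sketch).
                    actions.append((name, f"resize {current} -> {ri_type}"))
                break
    return actions
```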
Lastly, Turbonomic is a hybrid and multicloud system: it supports on-premises workloads running on any hypervisor, private cloud, or container platform, such as Pivotal Cloud Foundry or Kubernetes. Turbonomic also supports multicloud management, including AWS and Azure IaaS and PaaS services (including managed Kubernetes and databases), as well as Google Cloud Platform (GCP) for GKE optimization.
In summary, we should expect more native cost-optimization solutions from cloud providers as customers continue to voice their frustration with rising cloud bills. However, we should remember that cloud cost overruns are rooted in the fact that many operations teams default to the old (on-premises) habit of overprovisioning resources to assure performance (it is easier), and the only way to manage the tradeoff between performance and cost is with Application Resource Management.