Cloud Cost Management, Optimization & Savings Strategies

Introduction

Companies operating in the cloud face a constant challenge of trying to control cloud cost growth while maintaining optimal levels of operational performance for internal applications and customers. The goal of cloud cost management is to align costs with actual needs without compromising on service quality or performance, typically by limiting expenses such as overprovisioned resources, unused instances, or inefficient architecture. It is a balancing act between keeping costs down and providing the appropriate cloud resources to maintain peak performance, fuel growth, and ensure compliance and data security.

A crucial benefit of cloud computing is the ability to add servers, storage, and networking capacity quickly and easily to respond to usage demands. Cloud cost optimization helps companies control cloud costs and improve budgeting, forecasting, and IT performance. Best practices for cloud cost optimization include setting strict budgets and using automated tools to identify and adjust cloud resources in the moment. There are a variety of strategies that are utilized to reduce cloud costs. These strategies are best implemented in a gradual manner and in conjunction with each other. (Please see my other article on Cloud Data Transfer & Storage Cost Reduction Strategies)

The 6 Rs of Cloud Migration

To better understand cloud cost savings strategies, it is important to understand how applications are migrated to the cloud in the first place. During the cloud migration process, cloud operations teams need to evaluate various options and decide on the most effective method given operational, technical, engineering, design, time, financial, and personnel constraints. There are several major ways to migrate applications to the cloud, commonly referred to as the six Rs of cloud migration.

Cloud Costs Saving Strategies

When cloud resources are first migrated and deployed, their capacity requirements are often not fully known or well understood. This uncertainty leads companies to overprovision CPU, storage, networking, and other cloud resource capacity to ensure there will not be any performance issues. As a company’s cloud assets operate over time, Cloud Ops teams need to monitor their cloud resources and adjust them to improve operational efficiency and reduce costs. Changes in customer behavior can also result in reduced capacity requirements. Cloud cost management best practices have historically focused on finding every opportunity to cut costs. Nowadays, the focus is on optimizing cloud usage to minimize costs and maximize returns. Below are twelve cloud cost saving strategies that Cloud Ops and FinOps teams can implement to help reduce costs and optimize resources long-term:

1) Re-Evaluate Pricing Plan Options

Cloud pricing has become increasingly complicated, which can cause companies to inadvertently overspend on unnecessary resources. Companies should review pricing and billing information for anomalies and continuously re-evaluate their pricing plan options to reduce their cloud spend on both a short-term and long-term basis. Some of the more common practices include:

  • Reserved Instances (RI): RIs can deliver substantial pricing discounts if companies commit to using specific instance types over a defined period, typically one or three years. For workloads with predictable, steady demand, these commitments can reduce costs by up to 75%. Because RIs are paid for whether or not the capacity is used, and are often billed up front, companies should have a clear understanding of their long-term usage patterns to determine accurate commitments.
  • Savings Plans: Savings plans are commitments to a level of spending, measured per hour, over a defined period, in some plans regardless of instance type or region. They can offer more flexibility than RIs, which are commitments to capacity levels and specific instance types. AWS and other providers offer savings plans that provide discounts for consistent usage without the need to commit to specific instance types. As a result, savings plans make more sense for companies that expect a certain amount of cloud spending but whose needs are likely to change.
  • Assess Workload Predictability: Many cloud hosting providers offer volume discounts for larger customers that have resources or workloads, such as production systems, that run in a long-term predictable manner.
  • Spot or Preemptible Instances: These are unused capacity offered at steep discounts (up to 90%) but can be interrupted by the provider when demand increases. Spot instance discounts fluctuate depending on availability and demand. Thus, there is no way to predict if or when spot instances will become available or whether a bid will be accepted.
  • Non-Critical Workloads: Spot instances are ideal for batch processing, testing environments, or applications that can tolerate interruptions.
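The trade-off between on-demand pricing and a commitment can be checked with simple arithmetic. The sketch below is only an illustration: the function names and hourly rates are hypothetical and are not taken from any provider's price list. It assumes the committed rate is paid for every hour of the term, while on-demand is paid only for the hours actually used.

```python
def breakeven_utilization(on_demand_hourly: float, committed_hourly: float) -> float:
    """Fraction of hours a workload must run before a commitment beats on-demand.

    Assumes the committed rate is charged for every hour of the term.
    """
    return committed_hourly / on_demand_hourly


def cheaper_option(on_demand_hourly: float, committed_hourly: float,
                   expected_utilization: float) -> str:
    """Compare the average hourly cost of both options for an expected utilization (0..1)."""
    on_demand_cost = on_demand_hourly * expected_utilization
    return "commit" if committed_hourly < on_demand_cost else "on-demand"


# Hypothetical rates: 0.10 per hour on-demand, 0.06 per hour committed.
print(breakeven_utilization(0.10, 0.06))
print(cheaper_option(0.10, 0.06, expected_utilization=0.9))
print(cheaper_option(0.10, 0.06, expected_utilization=0.4))
```

With these example rates, a workload that runs about 60% of the time or more is cheaper under a commitment; below that, on-demand (or spot, for interruptible work) is likely the better fit.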

2) Orphaned Resources

When a virtual machine is deleted, there are often secondary attached cloud resources such as storage drives, network interfaces, and Public IPs that are left in place. These resources are called Orphaned Resources as the primary resource that they were attached to is no longer active so these cannot be used anymore. However, the company is still charged for the orphaned resources even though they are no longer used. A Cloud Ops team needs to audit, identify and delete orphaned resources on a regular basis by looking for areas with very stable costs and that use dated resources.
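An audit of this kind can be sketched against an exported resource inventory. The example below is a minimal illustration and does not call any cloud provider API: the Resource record, its field names, and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Secondary resource kinds that normally hang off a VM (hypothetical labels).
SECONDARY_KINDS = {"disk", "nic", "public_ip"}


@dataclass
class Resource:
    resource_id: str
    kind: str                   # e.g. "disk", "nic", "public_ip"
    attached_to: Optional[str]  # ID of the VM it belongs to, or None
    monthly_cost: float


def find_orphans(resources: Iterable[Resource], live_vm_ids: Set[str]) -> List[Resource]:
    """Return secondary resources whose parent VM is missing or no longer exists."""
    return [
        r for r in resources
        if r.kind in SECONDARY_KINDS
        and (r.attached_to is None or r.attached_to not in live_vm_ids)
    ]


inventory = [
    Resource("disk-1", "disk", "vm-a", 12.0),
    Resource("disk-2", "disk", "vm-deleted", 30.0),  # parent VM was deleted
    Resource("ip-1", "public_ip", None, 3.5),        # never re-attached
]
orphans = find_orphans(inventory, live_vm_ids={"vm-a"})
print([r.resource_id for r in orphans], sum(r.monthly_cost for r in orphans))
```

Reporting the monthly cost alongside each orphan helps prioritize which deletions to review first.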

The big three cloud providers each offer native tools to help identify orphaned resources (2).

3) Virtual Machine (VM) Version Upgrades

Every cloud hosting provider regularly upgrades its data centers and cloud infrastructure to use the latest technology, including newer generations of CPUs, storage, and RAM. During the upgrade process, firms usually use virtual machines to keep older versions of cloud assets up and running to avoid disruptions to customers. With virtual machines, a Cloud Ops team can upgrade to the latest generation if the VM specs and the workloads can support the change. Newer generation virtual machines offer faster, more efficient CPUs that use less energy and offer more bandwidth. There are a few things to consider before upgrading versions (2):

  • All upgrades need to be done only during maintenance windows.
  • Make sure that all the requirements such as accelerated networking and maximum number of NICs are also available on the newer version of the VM or you will not be able to upgrade.
  • Not all new generations are available for all locations due to technical limitations.
  • If your company has Savings Plans or Reserved Instances in place, changing a generation may affect how current reservations are used.

4) Usage Reduction by Resizing (Rightsizing)

Rightsizing means adjusting resource allocation by continuously evaluating whether allocated resources are fully utilized, since overprovisioning leads to unnecessary costs. Resource utilization can be monitored with cloud-native tools like AWS CloudWatch or Azure Monitor, or with third-party software packages such as CloudZero, IBM Turbonomic, CloudHealth, or Apptio Cloudability, to identify underutilized instances (e.g., oversized VMs or idle databases). To resize a resource, you need a thorough understanding of how much of it is utilized, so Cloud Ops teams need visibility into CPU usage, memory utilization, network throughput, and storage utilization. Virtual machines, for instance, need to be sized based on their actual resource requirements. There are three general rightsizing strategies that Cloud Ops teams can utilize:

  1. Terminate: Sometimes it is better to delete the instance entirely if its use is not justified, if it is no longer being utilized, or if it is very inefficient from an operational standpoint.
  2. Downsize or Scale Down: If usage metrics show that a resource's usage is under 25%, it may be a candidate for downsizing, which will result in cost savings. In other cases, you can move to another VM family that offers better pricing with smaller computing capacity.
  3. Upgrade or Scale Up: If a firm has VMs that have errors or failures on a regular basis, they may be prime candidates for an upgrade to better CPUs that offer more robust performance, usually at increased cost.
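The three strategies above can be expressed as a simple decision rule. This is a minimal sketch, not a production policy: the function name and its inputs are hypothetical, and the only threshold taken from the text is the 25% utilization mark.

```python
DOWNSIZE_THRESHOLD_PCT = 25  # utilization level mentioned in the list above


def rightsizing_action(utilization_pct: float,
                       still_needed: bool = True,
                       recurring_failures: bool = False) -> str:
    """Map basic signals about a VM to one of the rightsizing strategies."""
    if not still_needed:
        return "terminate"
    if recurring_failures:
        return "scale up"
    if utilization_pct < DOWNSIZE_THRESHOLD_PCT:
        return "scale down"
    return "keep"


print(rightsizing_action(12.0))                           # lightly used VM
print(rightsizing_action(80.0, recurring_failures=True))  # struggling VM
print(rightsizing_action(5.0, still_needed=False))        # abandoned VM
```

In practice the utilization input would come from a monitoring tool, and the result would be treated as a recommendation to review rather than an automatic action.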

The more common rightsizing mistakes include the following (1):

  • Firms do not fully simulate their performance needs before rightsizing. Before making a rightsizing decision, it is best to simulate the impact of each rightsizing option and consider multiple options across diverse compute families.
  • Firms do not address resource "shape" when resizing. It is important to match new resources to the shape of your applications or workload.
  • Firms only focus on resizing compute and fail to focus on non-compute resources such as databases and storage assets.
  • Relying on recommendations that use only peak or only average metrics can mislead you as to what an appropriate resizing level is.
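The last point can be illustrated with a small calculation. In the hypothetical CPU samples below, a single spike makes the peak look alarming while the average hides it entirely; a high percentile is one commonly used middle ground. The sample values and the nearest-rank percentile helper are assumptions made for illustration only.

```python
def percentile(samples, pct: int):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[max(rank, 1) - 1]


# Hypothetical CPU utilization samples (%): steady low load with one spike.
cpu = [10] * 18 + [20, 95]

average = sum(cpu) / len(cpu)
peak = max(cpu)
p95 = percentile(cpu, 95)
print(average, peak, p95)
```

Sizing to the peak would keep a mostly idle machine oversized, and sizing to the average would ignore the spike altogether, so it is worth looking at the whole distribution before resizing.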

5) Usage Reduction by Redesigning Underlying Software

The most complex method of usage reduction is to redesign the services themselves. Having company engineering teams modify the way software is deployed, rewrite applications, or even change the software altogether can help you take advantage of cloud native offerings.

6) Scaling

This strategy works by adjusting your computing capacity to match your workload's requirements, based on factors like CPU, memory, and network bandwidth. One of the biggest advantages of operating in the cloud is the ability to rapidly scale up computing resources as needed. With autoscaling, cloud resources are added automatically when your workload spikes and removed when demand falls, which helps reduce costs during off-peak times without manual intervention. However, the ability to scale up can be costly, so companies need to balance operational needs against what they can afford to spend.

There are two different methods for scaling (2):

1) Horizontal scaling, also called Scaling In/Out, is the process of adding more VMs to a pool that executes the same processes or runs the same applications or workloads, distributing the work among the nodes in the pool. Horizontal scaling is a great way to reduce waste, as additional resources are added only when they are needed. After the period of high demand is over, the additional VMs can be removed from the pool to reduce costs.

2) Vertical scaling, also called Scaling Up/Down, consists of upgrading or downgrading the compute specifications of the virtual machine. Vertical scaling is often better for traditional or legacy solutions, where one server does the heavy lifting for web applications, databases, and similar workloads.
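For horizontal scaling, the size of the pool is usually derived from a target utilization. The sketch below uses the same proportional rule as the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × current metric / target metric)), clamped to a minimum and maximum pool size. The function name, parameter names, and bounds are hypothetical and chosen for illustration.

```python
import math


def desired_instances(current_instances: int,
                      current_utilization_pct: float,
                      target_utilization_pct: float,
                      min_instances: int,
                      max_instances: int) -> int:
    """Proportional scaling rule, clamped to the allowed pool size."""
    raw = math.ceil(current_instances * current_utilization_pct / target_utilization_pct)
    return max(min_instances, min(max_instances, raw))


# Four VMs at 90% CPU with a 60% target: scale out.
print(desired_instances(4, 90, 60, min_instances=2, max_instances=10))
# Four VMs at 15% CPU: scale in, but never below the minimum.
print(desired_instances(4, 15, 60, min_instances=2, max_instances=10))
```

The maximum bound acts as a cost ceiling, which is one way to balance operational needs against what the company can afford to spend.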

7) Family Standardization

As a firm’s cloud profile grows, it sometimes develops a large collection of diverse CPU and VM types in its environments, which can lead to operational and cost inefficiencies. VM and CPU family standardization is important because having common VM families allows for greater savings when using Reserved Instances and Savings Plans. Companies should establish VM family standards and enforce them across multiple cloud platform providers to ensure operational consistency.

8) Shutdown Idle Resources & Power Scheduling

Non-production environments (such as development and testing) are often only required during working hours. The cost of a virtual machine depends on the amount of time it is running and the rate charged while it runs. Cloud Ops teams should use cloud-native tools like AWS Instance Scheduler, or scripts, to automatically turn off virtual machines in the evening and on weekends when they are not utilized, which can result in significant savings.
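The effect of a power schedule is easy to estimate. The sketch below assumes a hypothetical schedule of 07:00 to 19:00 on weekdays; the function names and default hours are illustrative, and the calculation covers running-time charges only, since attached storage is typically still billed while a VM is stopped.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7


def should_run(now: datetime, start_hour: int = 7, stop_hour: int = 19) -> bool:
    """True if a non-production VM should be powered on at this moment."""
    is_weekday = now.weekday() < 5  # Monday is 0, Saturday is 5
    return is_weekday and start_hour <= now.hour < stop_hour


def savings_fraction(hours_per_day: int, days_per_week: int) -> float:
    """Share of running-time charges avoided compared with running 24/7."""
    return 1 - (hours_per_day * days_per_week) / HOURS_PER_WEEK


print(should_run(datetime(2024, 1, 3, 10, 0)))  # a Wednesday morning
print(should_run(datetime(2024, 1, 6, 10, 0)))  # a Saturday morning
print(round(savings_fraction(12, 5), 2))
```

Under this schedule the VM runs 60 of the 168 hours in a week, so running-time charges fall by roughly 64%.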

9) Data Retention Policies

Cloud storage capacity is theoretically limitless, which makes it easy to store data forever. Before the cloud, companies’ data centers treated available disk space as a constant constraint and usually had rigorous data retention policies in place. Companies operating in the cloud need active data retention policies that are routinely updated to reflect changes in customer and internal data storage needs.

10) Utilize Storage Tier That Matches Data Needs

By default, most cloud providers store data in the standard class, also called hot storage. Standard storage is usually the most expensive storage class because data stored in it can be rapidly retrieved on demand. If a company’s data is not needed as quickly, moving it to a lower tier with slower retrieval can result in significant savings. Tiered storage solutions use cost-effective storage classes based on data access patterns (e.g., AWS S3 offers Standard, Infrequent Access, and Glacier tiers). Care is needed, though, not to move critical data to a tier where slow retrieval would have an operational impact. Cloud hosting providers also charge for moving data from one tier to the next and for data that spends only a brief period in a tier. Companies should utilize specialized software that moves and stores data based on pre-defined rules.
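A pre-defined tiering rule can be as simple as a lookup on how recently the data was accessed. The sketch below is illustrative only: the day thresholds, the generic tier names, and the needs_fast_retrieval flag are assumptions, not provider defaults.

```python
# Hypothetical thresholds; real values depend on access patterns and provider pricing.
HOT_DAYS = 30
COOL_DAYS = 180


def choose_storage_tier(days_since_last_access: int, needs_fast_retrieval: bool) -> str:
    """Pick a storage tier from access recency, keeping critical data out of archive."""
    if days_since_last_access < HOT_DAYS:
        return "standard"
    if days_since_last_access < COOL_DAYS or needs_fast_retrieval:
        return "infrequent access"
    return "archive"


print(choose_storage_tier(5, needs_fast_retrieval=False))
print(choose_storage_tier(60, needs_fast_retrieval=False))
print(choose_storage_tier(400, needs_fast_retrieval=True))
print(choose_storage_tier(400, needs_fast_retrieval=False))
```

Because providers charge for tier transitions and short stays, thresholds like these are best set conservatively so that data does not bounce between tiers.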

11) Optimize Network Routes

Data transfers between regions, cloud providers, or out of the cloud (egress) can be expensive. Cloud Ops teams should consider using the same cloud region for interconnected services or optimizing the architecture to reduce data movement (1):

  • Cloud hosting providers offer Networking Constructs (like AWS VPC endpoints) that enable access to their service APIs without using a public IP address or routing traffic via network address translation (NAT) services. Using these cloud hosting provider constructs can reduce the cost of transferring data between your applications and the cloud services in use.
  • Firms can also use a Content Delivery Network (CDN) like AWS CloudFront or Azure CDN to cache and distribute content close to end users, reducing bandwidth costs.

12) Use Containers and Kubernetes

Containerization with services like Amazon ECS, Azure Kubernetes Service, or Google Kubernetes Engine allows for denser resource utilization compared to VMs, resulting in cost savings. Containers decouple applications from the underlying host infrastructure, which makes deployment across different cloud or OS environments easier and cheaper long-term. Each node in a Kubernetes cluster runs the containers that form the pods assigned to that node, and containers in a pod are co-located and co-scheduled to run on the same node. The benefits of containers include:

  • Containers require fewer system resources and less overhead than traditional or hardware VMs because they share the host operating system kernel instead of each running a full guest operating system
  • Increased ability to run on multiple cloud hosting environments
  • Greater efficiency
  • More consistent operations
  • Better application development

Kubernetes automates the operational tasks of container management. It includes built-in commands for deploying applications, rolling out changes, scaling applications up and down to fit changing needs, and monitoring them, which makes applications easier to manage. Kubernetes services let you grow without needing to rearchitect your infrastructure, saving time and money for Cloud Ops teams.

Conclusion

Cloud Ops teams need to constantly monitor their cloud resources and make adjustments to improve operational efficiency and reduce costs. There are a variety of cloud cost saving strategies that can be used in conjunction with each other to help companies reduce ever-increasing cloud spend. The best companies actively monitor their cloud resources, proactively look for opportunities to adjust their primary and secondary cloud resources, and utilize a variety of the strategies mentioned to reduce their overall cloud expenditure.

More articles by Brandon Pfeffer, CMA