Cloud Journey - Part 10 | FinOps
Book Recommendation
Cloud FinOps: Collaborative, Real-Time Cloud Financial Management is a great book by J.R. Storment and Mike Fuller. FinOps brings financial accountability to the variable spend model of cloud. Used by the majority of global enterprises, this management practice has grown from a fringe activity to the de facto discipline managing cloud spend. In this book, authors J.R. Storment and Mike Fuller outline the process of building a culture of cloud FinOps by drawing on real-world successes and failures of large-scale cloud spenders.
Engineering and finance teams, executives, and FinOps practitioners alike will learn how to build an efficient and effective FinOps machine for data-driven cloud value decision-making. Complete with a road map to get you started, this revised second edition includes new chapters that cover forecasting, sustainability, and connectivity to other frameworks.
FinOps Principles
By adhering to these principles, FinOps teams can create a cost-conscious, self-governing culture within their firms that encourages cost responsibility and business agility, allowing them to better control and optimize costs while preserving the cloud's benefits for innovation and velocity. These values for FinOps are:
(1) Collaboration as a cross-functional team.
(2) Decisions are driven by the business value of cloud, not just technology.
(3) Everyone takes ownership of their cloud usage.
(4) FinOps reports need to be democratized.
(5) FinOps Chapter.
(6) Take advantage of the variable cost model of the cloud.
A New Way of Working Together
The FinOps model requires a cross-functional team that manages the cloud strategy, governance, and best practices and then works with the rest of the business to transform how the cloud is used.
This cultural shift also enables those in leadership positions to have input into decision making in a way they currently don’t. Based on leadership input, teams make informed choices about whether they are focused solely on innovation, speed of delivery, or cost of service. Some teams go all-in on one area with a growth-at-all-costs mindset. Eventually the cloud bill gets too big and they have to start thinking about growth and cost together. For example, “Move fast, but keep our cost per customer transaction below $0.45.”
Adoption of FinOps
When proposing the adoption of a FinOps function within an organization, brief a variety of personas among the executive team (engineering leadership, finance leadership, etc.) to gain approval, buy-in, and involvement in conducting FinOps and achieving its goals.
Each executive team persona is described below, in terms of their goals, concerns, key messaging, and useful KPIs. By understanding the motivations of each executive persona, a FinOps champion will be able to describe the value of FinOps more effectively, minimizing the time and effort to gain alignment. You can read more about personas in FinOps at https://www.finops.org/framework/personas/
The FinOps Framework provides the operating model for how to establish and excel in the practice of FinOps. Like FinOps, the Framework is evolving and informed by community experiences, contributions, and conversations. It’s built by the community, for the community. You can read more: https://www.finops.org/framework/
Rate Optimization
As you know, Cloud Cost = Rates × Usage. This section focuses on the rate half of that equation and covers how to optimize rates so that you pay less for the resources you continue to use. Reservations, Savings Plans (SPs), Reserved Instances (RIs), Committed Use Discounts (CUDs), and Flexible CUDs are the primary levers for adjusting rates for many services, but they can be quite complex.
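As a rough illustration of the equation above, the sketch below holds usage constant and lowers only the rate. The hourly rates and the 720-hour month are made-up numbers for illustration, not real provider prices.

```python
def cloud_cost(rate_per_hour: float, usage_hours: float) -> float:
    """Cloud Cost = Rate x Usage, for a single resource."""
    return rate_per_hour * usage_hours


# Hypothetical numbers: the same 720 hours of usage at two different rates.
USAGE_HOURS = 720
on_demand_cost = cloud_cost(0.10, USAGE_HOURS)   # assumed on-demand rate
committed_cost = cloud_cost(0.06, USAGE_HOURS)   # assumed discounted rate

# Usage did not change; the saving comes entirely from the rate.
savings_fraction = (on_demand_cost - committed_cost) / on_demand_cost
print(f"on-demand: {on_demand_cost:.2f}, committed: {committed_cost:.2f}, "
      f"saved: {savings_fraction:.0%}")
```

Usage optimization would instead reduce `USAGE_HOURS` while leaving the rate alone; the two levers multiply together.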
Commitment-Based Discounts
Reserved Instances (RIs), Savings Plans (SPs), Committed Use Discounts (CUDs), and Flexible Committed Use Discounts (Flexible CUDs), collectively known as commitment-based discounts, are the most popular and important cost optimizations that cloud service providers offer. This is because commitment-based discounts represent the largest percentage discount you can achieve in cloud and often apply to the largest areas of cloud spend in your bill.
Some years ago, during a webinar with AWS, Cloudability, and Adobe on the power of RIs, Adobe showed a figure indicating that the company had cut its EC2 spending by 60% simply by purchasing RIs.
Commitment-based discounts are most often applied to individual resources in a nondeterministic way. In the case of AWS SPs and Azure SPs, they are applied where they will have the biggest savings, but you can’t pick which specific resources they will be applied against. Google Cloud lets you choose how to attribute the discount credits and fees. They give you three options: Unattributed, Proportional Attribution, or Prioritized Attributions. The right attribution model to choose depends on whether your organization purchases and manages CUDs in a centralized or decentralized way. An analogy might be useful here to help you better understand reservations or commitments.
Say a specific restaurant is running a deal where you buy a book of coupons. Each coupon gives you a meal at that specific restaurant. The book contains one coupon for every day of the month. When you eat at the specified restaurant, you pay for the meal with the coupon. Deciding to eat somewhere else means that you forfeit that day’s coupon and pay full price on the meal at another establishment. Let’s say the book costs $750 and contains 30 meal coupons, where each coupon gets you a meal that, if bought without a coupon, would cost $50. Divided out, that’s $25 per coupon for a $50 meal, saving you $25 a day. If you eat at this restaurant every day, you save 50%, and if you eat there only half of the days, you’ve saved nothing. If you use more than half of the coupons, you’re better off buying the book of coupons.
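The coupon arithmetic above can be checked with a few lines of Python. This is only a sketch of the analogy; the function names are made up for illustration.

```python
def coupon_book_savings(book_price: float, coupons: int,
                        meal_price: float, meals_eaten: int) -> float:
    """Savings (positive) or loss (negative) versus paying full price.

    Meals beyond the number of coupons cost full price either way,
    so only the coupons actually used count toward the savings.
    """
    coupons_used = min(meals_eaten, coupons)
    return coupons_used * meal_price - book_price


def break_even_meals(book_price: float, meal_price: float) -> float:
    """Number of meals at which the book has paid for itself."""
    return book_price / meal_price


# The numbers from the analogy: a $750 book of 30 coupons for $50 meals.
every_day = coupon_book_savings(750, 30, 50, meals_eaten=30)   # saves $750 (50%)
half_days = coupon_book_savings(750, 30, 50, meals_eaten=15)   # saves nothing
ten_days = coupon_book_savings(750, 30, 50, meals_eaten=10)    # loses $250
print(every_day, half_days, ten_days, break_even_meals(750, 50))
```

Eating there fewer than 15 times leaves you worse off than paying full price, which is the same risk an underused reservation carries.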
If you apply this idea to RIs, once you’ve decided on the length of time you want to reserve, you purchase a reservation (book of coupons) from the cloud service provider, matching a particular resource type and region (specific restaurant meal at a certain location). This reservation will allow you to run the matching resource every hour (or second or millisecond). If you don’t run any resources matching the reservation, you forfeit the savings. As long as you have enough resource usage during the reservation term, you benefit from the discounts—and you save money.
The key takeaways here are:
More recently, AWS and Azure have offered On-Demand Capacity Reservations (ODCRs), which allow you to perform capacity reservations separately from RIs. Google Cloud offers a similar concept called Compute Engine zonal resources, which provide a very high level of assurance in obtaining capacity for a specific VM type. Note that when you’re using on-demand capacity reservations in combination with AWS RIs, it’s essential to set your RIs as regional so the RIs can discount the capacity reservation.
People commonly misunderstand how RI sharing works. To ensure you get it right, take a look at the example below:
In the example layout in the figure above, the following may occur:
EC2 RIs are purchased to match a particular instance size (small, medium, large, xlarge, etc.). The RI will apply discounts to resources matching its size. However, for regional Linux/UNIX RIs, you benefit from the feature mentioned earlier, called instance size flexibility (ISF). Reservations you currently own or are planning to buy with attributes of Linux, regional, and shared tenancy will automatically be an ISF RI.
ISF allows the RI to apply discounts to different size instances in the same family (m5, c5, r4, etc.). A single large RI can apply discounts to multiple smaller instances, and a single small RI can apply a partial discount to a large instance. ISF gives you the flexibility to change the size of your instances without losing the discount applied by an RI. Because you don’t need to be specific about the exact size of the RI for it to cover all your different size instances, you can bundle up your RI purchases into small variations of parameters.
The figure above illustrates a way of thinking about how ISF can be applied. Each column represents the instances that were run in a given hour. For r5 and c5 instances, large (L) is the smallest instance size. If you aim for 100% RI utilization in this example, you’d purchase 29 large RIs. Think of it as purchasing LEGO blocks at the smallest size within a family and combining them to cover larger instance sizes you are actually running. This would mean you’d have only one size of RI you purchase that’s well matched to your overall usage. If instance sizes fluctuate but normalized usage stays the same or increases overall, your RIs will remain perfectly utilized.
The normalization factors in the table above show how to convert between instance sizes within a family. For example, if you own an RI that’s 2xlarge (16 units), that RI could apply instead to two xlarge (8 units each) or four large (4 units each) instances. You could also use an RI that is 2xlarge (16 units) to apply to one xlarge (8 units) and two large (4 units each) instances.
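The conversion can be sketched in Python as follows. The factors for large, xlarge, and 2xlarge are the ones quoted above; the 4xlarge value follows the same doubling pattern used in AWS's published table. The function names, the sample hours, and the rule of sizing to the quietest hour to reach 100% utilization are illustrative assumptions, not a purchasing recommendation.

```python
import math

# Normalization units per instance size within one family.
NORMALIZATION_FACTOR = {"large": 4, "xlarge": 8, "2xlarge": 16, "4xlarge": 32}


def normalized_units(instances: dict[str, int]) -> int:
    """Total normalized units for a mix of sizes, e.g. {"xlarge": 1, "large": 2}."""
    return sum(NORMALIZATION_FACTOR[size] * count
               for size, count in instances.items())


def large_equivalents(instances: dict[str, int]) -> int:
    """How many 'large' RIs would cover this mix of instances."""
    return math.ceil(normalized_units(instances) / NORMALIZATION_FACTOR["large"])


def ris_for_full_utilization(hourly_usage: list[dict[str, int]]) -> int:
    """Largest purchase of 'large' RIs that is still used in every hour.

    Assumption: 100% utilization means never owning more than the
    quietest hour needs.
    """
    return min(large_equivalents(hour) for hour in hourly_usage)


# One xlarge plus two large instances consume the same 16 units as one 2xlarge RI.
print(normalized_units({"xlarge": 1, "large": 2}))

# Hypothetical usage for three hours within a single family.
hours = [
    {"2xlarge": 1, "xlarge": 2, "large": 3},
    {"4xlarge": 1, "large": 2},
    {"xlarge": 3, "large": 6},
]
print(ris_for_full_utilization(hours))
```

Buying at the smallest size keeps the purchase independent of which sizes happen to be running, as long as the normalized total does not drop.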
Some instance families do not have all the sizes of instances, so, if purchasing ISF RIs, just look for the smallest instance size in the family.
In late 2019, Amazon Web Services announced Savings Plans, initially offering discounts on EC2 instances, Lambda, and Fargate, the managed service of Amazon’s proprietary container service. Since the initial release, AWS has added an additional SP offering that applies to the AWS machine learning service SageMaker. The addition of the machine learning SP has driven up demand from AWS customers for AWS to release further SPs to cover other services, such as database and storage. Based on this customer demand, we expect additional offerings to be announced after this edition of the book is released. A large amount of what you learned here about RIs applies to SPs as well. AWS has continued the purchasing options of All upfront, Partial upfront, and No upfront payment and one- and three-year durations.
However, the biggest difference between SPs and RIs is that while RI commitments are for resource units (numbers of EC2 instances of certain sizes), SP commitments are monetary (the amount a customer is committing to spend on discounted compute or other covered services). SPs are offered in three plan types:
Compute SPs
Apply broadly across EC2, Lambda, and Fargate compute, and generally return the same savings rate as Convertible RIs (CRIs). This plan type applies more widely than a CRI, covering compute resources in any region, which lowers the effort needed to maintain high plan utilization.
EC2 Instance SPs
Apply to EC2 usage of a single family in a single region in any size or other configuration. This plan type is more restrictive than a Compute SP but offers higher discounts. It is less restrictive than a Standard RI (SRI) while providing the same discount.
Machine Learning SPs
Sticking with the cost/hour model of SPs, the machine learning SP for SageMaker enables AWS customers to commit to spend on SageMaker in return for reduced rates on component costs of SageMaker, including those on:
There are, however, a few major differences between RIs and SPs:
AWS will apply SP coverage to the resources that give you the greatest discount, which is nice because not all types of compute or instances are discounted by the same amount. This also means that, as you get more and more coverage, the increase in discount percentage you are receiving will get smaller.
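The diminishing return described above can be seen in a simplified model. The sketch below assumes the hourly commitment is spent at discounted rates and goes to the most heavily discounted usage first; the usage items and discount percentages are invented, and the real AWS application logic has additional rules that are not modeled here.

```python
def apply_savings_plan(commitment_per_hour: float,
                       usage: list[tuple[str, float, float]]) -> float:
    """Return hourly savings for a commitment under a greatest-discount-first rule.

    usage: (name, on_demand_cost_per_hour, discount_fraction) tuples.
    The commitment is consumed at the discounted (SP) rate of each item.
    """
    remaining = commitment_per_hour
    savings = 0.0
    # Cover the most heavily discounted usage first.
    for _name, on_demand, discount in sorted(usage, key=lambda u: u[2], reverse=True):
        if remaining <= 0:
            break
        sp_cost = on_demand * (1 - discount)
        covered = min(1.0, remaining / sp_cost) if sp_cost > 0 else 1.0
        remaining -= sp_cost * covered
        savings += on_demand * discount * covered
    return savings


# Hypothetical hour of usage: two workloads with different SP discounts.
usage = [
    ("workload-a", 10.0, 0.50),   # $10/h on demand, 50% SP discount
    ("workload-b", 10.0, 0.25),   # $10/h on demand, 25% SP discount
]
first_tranche = apply_savings_plan(5.0, usage)     # covers workload-a only
full_coverage = apply_savings_plan(12.5, usage)    # covers both workloads
print(first_tranche, full_coverage - first_tranche)
```

In this made-up case the first $5.00 of hourly commitment saves $5.00, while the next $7.50 saves only $2.50, so each additional dollar committed buys less discount than the one before.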
Azure SPs operate as discounts to usage hours over a period of time. You save money by committing to a fixed hourly cost on compute services over a one- or three-year term. Azure advertises savings of up to 65% from pay-as-you-go prices.
Let’s look at how CUD billing and application work within the Google concept of organizations, folders, and projects. The following figure shows the structure of the hierarchy.
There are similarities to AWS’s concept of a management account structure. A billing account can share CUDs across all of its connected projects, once you’ve enabled CUD sharing on the account. This means CUDs offer both the flexibility of applying to a variety of machine types and the flexibility of applying to multiple projects. Unlike AWS, where RIs are automatically shared across multiple linked accounts, CUDs are shared only after you’ve enabled the feature for your billing account.
This gives you the flexibility to choose what’s most important for your account: cleaner chargeback or maximum waste reduction. If you want to allocate funds from a particular project to make the commitment, turning CUD sharing off will offer cleaner chargeback options than the way AWS handles RIs. However, turning CUD sharing on could result in less waste because unused commitments will be shared across multiple projects.
CUDs give you the option to choose the sharing model that works best for your organization. For an organization with a lot of projects, turning sharing on allows you to take advantage of the economies of scale of the entire organization.
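To make the chargeback versus waste trade-off concrete, here is a minimal sketch that compares unused commitment with sharing off and on. The per-project numbers and the function name are hypothetical, and commitments and usage are expressed in the same abstract units.

```python
def unused_commitment(projects: list[tuple[int, int]], sharing_enabled: bool) -> int:
    """Units of commitment left unused in a given hour.

    projects: (committed_units, used_units) per project.
    With sharing off, a commitment only covers usage in the project that owns it.
    With sharing on, commitments are pooled across the billing account.
    """
    if sharing_enabled:
        total_committed = sum(committed for committed, _ in projects)
        total_used = sum(used for _, used in projects)
        return max(0, total_committed - total_used)
    return sum(max(0, committed - used) for committed, used in projects)


# Hypothetical billing account: project A owns the commitment but uses little,
# project B has no commitment but plenty of matching usage.
projects = [(10, 4), (0, 8)]
print(unused_commitment(projects, sharing_enabled=False))
print(unused_commitment(projects, sharing_enabled=True))
```

With sharing off, project A's spare commitment is wasted but its cost maps cleanly to project A. With sharing on, project B absorbs the spare units, which reduces waste at the price of a less direct chargeback.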
Steps to Building a Commitment-Based Discount Strategy
There are six key steps in building your first commitment strategy:
The figure below shows the accumulated cost of a resource, for both on-demand rate and commitment rate, over a one-year term. In the example, the commitment costs over $300 up front, which is why the commitment line starts above $300. If you had used a No upfront commitment, then it would begin below the on-demand line. And if you used an All upfront commitment, it would be drawn as a flat line at the value of the up-front cost.
The cash flow break-even point is the date on which the commitment has cost you the same amount as if you had been using on-demand rates. Some people call the cash flow break-even point the crossover point, for obvious reasons. You’re no longer out-of-pocket for the commitment at the cash flow break-even point. However, you will continue to pay the ongoing (hourly) costs of the commitment. If you stop your usage at the cash flow break-even point, it will result in you losing money versus on-demand rates.
The total committed cost break-even point is where the on-demand cost of resource usage is more than the total cost of the commitment. This is the real break-even point, since you could cease to run any usage that matches the commitment and you would be no worse off than if you had not committed to the discount program. You would have paid the same amount on-demand versus with the commitment. After the break-even point, you realize savings from the commitment.
The difference between the on-demand line and the total cost break-even point is the savings (or loss) made by the commitment. It’s essential to understand that with this graph it doesn’t matter whether you run one resource that matches the commitment for the whole 12 months or you run many different matching resources during that period.
If you add up all the usage to which your commitment applies a discount, you can then compare the total cost of the commitment versus what you would have paid for the same amount of usage at on-demand rates. This comparison will give you an indication of when you have met your break-even point and of the amount of savings you have realized.
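The two break-even points can be worked out with simple arithmetic. The sketch below assumes a partial upfront commitment with a fixed hourly fee that is charged for the whole term whether or not matching usage runs. The rates, the $300 upfront fee, and the function names are illustrative assumptions rather than real prices.

```python
def cash_flow_break_even_hours(upfront: float, on_demand_rate: float,
                               committed_rate: float) -> float:
    """Hours of continuous usage until cumulative commitment spend
    equals cumulative on-demand spend (the crossover point)."""
    if on_demand_rate <= committed_rate:
        raise ValueError("commitment must be cheaper per hour than on-demand")
    return upfront / (on_demand_rate - committed_rate)


def total_cost_break_even_hours(upfront: float, on_demand_rate: float,
                                committed_rate: float, term_hours: float) -> float:
    """Hours of matching usage whose on-demand cost equals the total cost
    of the commitment over the full term."""
    total_commitment_cost = upfront + committed_rate * term_hours
    return total_commitment_cost / on_demand_rate


def commitment_savings(upfront: float, on_demand_rate: float, committed_rate: float,
                       term_hours: float, used_hours: float) -> float:
    """Savings (or loss, if negative) for a given amount of matching usage."""
    return on_demand_rate * used_hours - (upfront + committed_rate * term_hours)


# Hypothetical one-year term (8,760 hours).
TERM_HOURS = 8760
cash_flow = cash_flow_break_even_hours(300, 0.10, 0.04)
total_cost = total_cost_break_even_hours(300, 0.10, 0.04, TERM_HOURS)
full_term = commitment_savings(300, 0.10, 0.04, TERM_HOURS, used_hours=TERM_HOURS)
print(f"cash flow break-even after about {cash_flow:.0f} hours")
print(f"total cost break-even after about {total_cost:.0f} hours")
print(f"savings if used for the whole term: {full_term:.2f}")
```

Because `used_hours` is a total, it makes no difference whether it comes from one resource running all year or from many different matching resources. Usage that stops between the two break-even points still results in a loss compared with on-demand rates.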
Cloud is designed for just-in-time purchasing. You shouldn’t be running infrastructure or purchasing capacity (i.e., buying commitments) in a data center fashion.
Years ago, AWS had graphs like those above on their home page. The graph on the left showed the way you had to buy capacity in the data center and hardware world, fraught with constraints and long lead times for hardware. The graph on the right shows the ideal way to run infrastructure in the cloud: you spin it up just when you need it. The same is true for purchasing commitments.
FinOps Certified Platform
Tier of technology providers that license and deliver a software product, or are founders/maintainers of an open-source project, to help people successfully adopt cloud financial management practices aligned with the FinOps standards.
Anodot seamlessly combines all of your cloud spend into a single platform. Monitor and optimize your cloud cost and resource utilization across AWS, GCP, and Azure. Deep dive into your data details and get a clear picture of how your infrastructure and economies are changing.
Founded in 2007, Apptio is the leading provider of cloud-based IT financial management software. Our applications connect technology investments to business priorities, engage business stakeholders to drive cross-functional accountability, and improve the efficiency of hybrid IT resources. Cloudability by Apptio is the original FinOps solution.
Simba Innovation is a global Next Generation Cloud Managed Services and Cloud Native Services Provider. Simba Innovation partners with AWS, Azure, and other public cloud providers and focuses on helping customers with high-level global services, such as cloud consulting, migration and deployment, and professional services for cloud.
Designed and founded by engineers in 2015, Yotascale optimizes the world’s cloud computing spend, making cloud computing profitable and sustainable, for every organization. It creates cloud cost visibility and enables resource transparency by empowering engineering teams. Yotascale’s next-generation cloud cost management solution identifies resource waste, enables cross-functional collaboration, improves optimization by 5x, and reduces yearly costs by 50%.
You can find more tools and vendors here: https://www.finops.org/certifications/finops-certified-platform/