Make Cloud Cost a First-Class Engineering Metric
Kiran Rane
Technology Executive | Digital Transformation | Strategy | Enterprise Architecture
Overview
According to a Gartner report, cloud spending will reach $600B by 2023 and is expected to rise to $1.7 trillion by 2025, greater than Canada’s GDP. Surprisingly, 32% of cloud spending is wasted, amounting to billions of dollars annually. As more businesses become technology companies, cloud costs become an essential part of the Cost of Goods Sold (COGS). Increasingly, CFOs scrutinize cloud costs, demanding better management and cost predictability.
In this new consumption-based cloud economics, managing costs can be more challenging because purchasing decisions are not governed by a central IT finance function or a single procurement team. They are distributed. The developers and engineers building and running cloud solutions affect costs every time they write a line of code or spin up a new server. However, if you are like most engineers, you primarily care about developing new features and hitting critical milestones. At that level, there is little or no consideration of cloud cost.
This incentive misalignment creates friction between the team responsible for managing costs and the engineering team. While leading a Cloud Center of Excellence (CCoE), I have run into this pushback several times, and it is an industry-wide phenomenon. The 2022 State of FinOps report from the Linux Foundation-led FinOps Foundation reveals that 30% of survey respondents cited getting engineers to act on cost optimization recommendations as their top challenge.
One of the significant reasons for this conflict is that we have been unable to make “Cost” a primary metric for measuring the engineering quality of a solution. The engineering team is always on the hook for several operational metrics like timelines, bugs, performance, availability, and security, but not for the cost of running the solution. However, there is a strong link between cloud cost and solution quality. I firmly believe that well-architected, better-designed solutions are cheaper and easier to maintain, while poorly designed systems are expensive and challenging to maintain. Making cost a first-class metric for engineering quality can significantly improve the overall solution and align the team more closely with business goals.
As a first-class metric, cloud cost would be continuously monitored, measured, and optimized. It would provide a clear definition of good or bad in a specific context.
Why Engineers should care about the cloud cost
There are several reasons why the engineering team should care about the cost.
Better Architectural and Design Decisions
Recently, I came across a poorly designed application that was hogging CPU and memory. Instead of remediating the issues, engineers threw more resources at it. After rearchitecting the application, we redeployed it with one-eighth of the cloud resources. The new application is more resilient and cost-effective. The cloud has democratized access to technology resources, putting seemingly infinite capacity to scale at the developer’s fingertips. Developers no longer need many approvals from a centralized procurement or infrastructure team to spin up new servers or resources. This flexibility can also lead to suboptimal solutions and wasted spending. When we introduce cost as an essential engineering metric, developers are pressed to make better choices such as caching, storage lifecycle management, serverless architecture, auto-scaling, and distributed processing. Often, these architectural tenets also improve several non-functional characteristics like performance, observability, and maintainability.
A measure of cloud maturity
The cost is an excellent indicator of cloud-native maturity as well.
I work for a mortgage lender. The industry is highly cyclical, and mortgage demand is strongly seasonal and closely correlated with interest rates. That means the number of loans a lender processes varies drastically with changing market conditions. The key selling point of the cloud is its consumption-based economics, in which the cost of goods sold (COGS) should adjust to revenue or demand. So, if a mortgage lender genuinely exploits the consumption model of the cloud, its cloud cost should track the number of loans it processes. To test this, we plotted the unit cloud cost against the number of loans processed by our company. Our cloud cost was not that elastic to business demand. This finding helped us have an educated discussion about the need to adopt more cloud-native and serverless architecture.
Idle cloud cost is another good measure of cloud efficiency. The idle cost is your base cost for running cloud infrastructure when you have zero customers or transaction load. Based on the idle cost, you can justify and prioritize efforts such as automation to shut down non-production environments or migration to a serverless architecture.
Thus, having a good handle on cloud cost and how it relates to business outcomes enables the engineering team to justify to management the investment in better and newer technologies.
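The comparison described above can be sketched in a few lines of Python. This is only an illustration: the `MonthlyUsage` type, the function names, and the dollar and loan figures are hypothetical, not our actual data. It computes the unit cost per loan and a simple elasticity ratio (percent change in cost divided by percent change in volume).

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    month: str
    cloud_cost: float      # total cloud spend for the month, in dollars
    loans_processed: int   # business volume for the same month


def unit_cost(m: MonthlyUsage) -> float:
    """Cloud cost per loan processed."""
    return m.cloud_cost / m.loans_processed


def cost_elasticity(before: MonthlyUsage, after: MonthlyUsage) -> float:
    """Percent change in cost divided by percent change in volume.

    A value near 1.0 means cost tracks demand; a value near 0.0 means
    cost is mostly fixed regardless of demand.
    """
    if before.loans_processed == after.loans_processed:
        raise ValueError("volume did not change; elasticity is undefined")
    cost_change = (after.cloud_cost - before.cloud_cost) / before.cloud_cost
    volume_change = (
        after.loans_processed - before.loans_processed
    ) / before.loans_processed
    return cost_change / volume_change


# Hypothetical figures, for illustration only
peak = MonthlyUsage("2022-03", cloud_cost=100_000.0, loans_processed=2_000)
slow = MonthlyUsage("2022-09", cloud_cost=90_000.0, loans_processed=1_000)

print(unit_cost(peak), unit_cost(slow))   # cost per loan in each month
print(cost_elasticity(peak, slow))        # how much cost moved with volume
```

In this made-up example, volume halves while cost drops only 10%, so the unit cost per loan nearly doubles. That is the pattern of an inelastic, mostly fixed cost base.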
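If you cannot observe a true zero-load period, one rough way to approximate the idle cost is to fit a straight line through monthly (volume, cost) points and read off the cost at zero volume. The sketch below assumes cost is roughly linear in volume, which is a simplification, and the function name and figures are hypothetical.

```python
def estimate_idle_cost(volumes, costs):
    """Least-squares line through (volume, cost) points.

    Returns (idle_cost, marginal_cost), where idle_cost is the fitted
    cost at zero volume and marginal_cost is the fitted cost per unit.
    Assumes at least two distinct volume values.
    """
    n = len(volumes)
    mean_v = sum(volumes) / n
    mean_c = sum(costs) / n
    sxx = sum((v - mean_v) ** 2 for v in volumes)
    sxy = sum((v - mean_v) * (c - mean_c) for v, c in zip(volumes, costs))
    marginal_cost = sxy / sxx
    idle_cost = mean_c - marginal_cost * mean_v
    return idle_cost, marginal_cost


# Hypothetical monthly data: loans processed and cloud cost in dollars
volumes = [1_000, 2_000, 3_000]
costs = [90_000.0, 100_000.0, 110_000.0]

idle, per_loan = estimate_idle_cost(volumes, costs)
print(idle, per_loan)
```

A large idle cost relative to total spend is the kind of number that helps prioritize shutdown automation or a move to serverless.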
An important metric to monitor application health
Every action on the cloud has a cost associated with it, making cost data one of the richest sources for application monitoring. An unusual spike in cloud costs can indicate an attack, such as denial of service or illicit crypto mining. Crypto mining attacks increased 19% year over year in 2021. In its 2021 cybersecurity report, Google Cloud said that 86% of compromised cloud instances were used for crypto mining. Real-time visibility into cloud cost provides an additional tool to detect anomalies and empowers developers to connect cause and effect and act swiftly in such situations.
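As a minimal sketch of what spike detection on cost data could look like, the Python below flags any day whose cost exceeds the trailing average by a multiple of the trailing standard deviation. The window, threshold, and sample figures are assumptions chosen for illustration; commercial cost tools use more sophisticated models.

```python
from statistics import mean, stdev


def find_cost_spikes(daily_costs, window=7, threshold=3.0):
    """Return indexes of days whose cost exceeds
    trailing mean + threshold * trailing standard deviation."""
    spikes = []
    for i in range(window, len(daily_costs)):
        history = daily_costs[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # Guard against a perfectly flat history, where stdev is 0:
        # fall back to 1% of the baseline as the minimum spread.
        limit = baseline + threshold * max(spread, 0.01 * baseline)
        if daily_costs[i] > limit:
            spikes.append(i)
    return spikes


# Hypothetical daily spend in dollars; day 8 is an unusual jump
daily_costs = [100, 102, 98, 101, 99, 100, 103, 101, 240, 102]
print(find_cost_spikes(daily_costs))
```

A flagged day is only a prompt to investigate. The spike could be an attack, a runaway job, or a legitimate launch.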
Partner effectively with the business teams
The engineering team builds applications, features, and services to support customers and, in turn, helps drive the organization’s revenue and profit. Measuring the return on investment (ROI) of a new product, application, or feature is essential.
With digital business models enabled by cloud computing, the cost of the cloud becomes a core component of COGS. So, it is essential to answer questions such as: What is the cost/value ratio of a feature? What is the cost of acquiring a new customer?
Collecting detailed cost data and correlating it with such business KPIs raises team performance, because cloud solution decisions and their cost implications are tied to business value. Engineering teams can then have a well-informed conversation with the business on feature prioritization and other technology investment decisions. They can also justify cloud spending in business terms, building bridges with business and finance teams.
Career Growth opportunities
As cloud adoption and spending continue to rise, there will be great demand for talent with cloud financial management skills. The State of FinOps report predicts that teams working on cloud cost management will grow by 40% in the near term, with several organizations making it a priority.
Approach to integrating the “Cost” perspective into an SDLC
Integrating cost management with the organization’s Software Development Lifecycle is crucial to making cost a first-class metric.
Plan Stage
In the planning stage, teams should be able to influence the product roadmap and its prioritization using cost data. For a new initiative, the team can use unit costs to help calculate ROI. Cost data can also justify budget allocation to engineering initiatives that eliminate technical debt or improve technology maturity.
Design and Build Stage
In the design and build stage, teams should have the data needed to make cost-effective architecture and solution design decisions. The team should define, maintain, and follow best practices for cost management. The engineering teams should collaborate with the architects on trade-off discussions about the choice of cloud services and their suitability for the functional and non-functional requirements. The team should build a cost estimate of its cloud choices and validate it against the business case defined in the planning stage. The cost goals should also lead to appropriate automation stories to curtail costs, e.g., developing scripts to automate the shutdown of non-production environments outside working hours, sending cost alerts, or policy automation.
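To make the shutdown automation story concrete, here is a minimal Python sketch of the scheduling decision only. The working hours, the environment tag values, and the resource names are assumptions for illustration, and the actual stop call to your cloud provider is deliberately left out.

```python
from datetime import datetime

# Assumed working hours (local time); adjust to your team's schedule
WORK_START_HOUR = 8
WORK_END_HOUR = 19


def should_be_running(environment: str, now: datetime) -> bool:
    """Production always runs; everything else only during weekday work hours."""
    if environment == "prod":
        return True
    is_weekday = now.weekday() < 5  # Monday=0 .. Friday=4
    in_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_weekday and in_hours


def environments_to_stop(running: dict, now: datetime) -> list:
    """`running` maps a resource name to its environment tag.
    Returns the names of resources that should be stopped right now."""
    return sorted(
        name for name, env in running.items()
        if not should_be_running(env, now)
    )


# Hypothetical inventory of running resources
running = {"loan-api-prod": "prod", "loan-api-dev": "dev", "loan-api-qa": "qa"}

# Wednesday at 10 pm: only non-production resources are candidates
print(environments_to_stop(running, datetime(2023, 1, 4, 22, 0)))
```

A scheduled job could call `environments_to_stop` periodically and pass the result to whatever stop mechanism your platform provides.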
Deploy and Operate Stage
The engineering team should work closely with finance and business teams to define critical KPIs that should be measured and monitored. During this stage, appropriate observability should be built to monitor and track the cloud cost. Relevant reports and dashboards should be created and made available to stakeholders. An operational runbook or processes should be defined to detect, investigate, and remediate any cost anomalies or unpredicted spending. If the new spend is justified, then the team should appropriately update the forecasts and budget alerts for that specific application or resource group.
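A simple budget check of the kind described above might look like the following Python sketch. It projects month-end spend from the run rate so far and compares it to the budget. The linear run-rate forecast, the 90% warning ratio, and the status labels are assumptions for illustration; real spend is often not linear within a month.

```python
def forecast_month_end(spend_to_date: float, day_of_month: int,
                       days_in_month: int) -> float:
    """Project month-end spend by extending the average daily run rate."""
    daily_run_rate = spend_to_date / day_of_month
    return daily_run_rate * days_in_month


def budget_status(forecast: float, budget: float,
                  warn_ratio: float = 0.9) -> str:
    """Classify a forecast against the budget."""
    if forecast > budget:
        return "over_budget"
    if forecast >= warn_ratio * budget:
        return "at_risk"
    return "on_track"


# Hypothetical: $6,000 spent by day 10 of a 30-day month, $15,000 budget
forecast = forecast_month_end(6_000.0, day_of_month=10, days_in_month=30)
print(forecast, budget_status(forecast, budget=15_000.0))
```

When the new spend turns out to be justified, the budget passed to `budget_status` is the value the team would update.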
Monitor Stage
In this phase, the collected cost should be allocated to the appropriate business units, teams, products, and features. Using this data, the cross-functional team should evaluate ROI and perform trend analysis on the unit cost and other critical KPIs. Based on the findings, the team should create a list of action items or a backlog to feed into the next iteration of the planning phase.
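Allocation usually depends on resource tags. The Python sketch below assumes billing line items have already been exported into dictionaries with a `cost` and a `tags` field. That shape, the `team` tag key, and the team names are hypothetical and not any provider's actual export format.

```python
from collections import defaultdict


def allocate_costs(line_items, tag_key="team"):
    """Sum cost per owner, using the given tag; untagged spend is
    grouped under 'unallocated' so it stays visible."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "unallocated")
        totals[owner] += item["cost"]
    return dict(totals)


def unallocated_share(totals):
    """Fraction of total spend that could not be attributed to an owner."""
    total = sum(totals.values())
    return totals.get("unallocated", 0.0) / total if total else 0.0


# Hypothetical billing line items
line_items = [
    {"service": "compute", "cost": 1200.0, "tags": {"team": "origination"}},
    {"service": "storage", "cost": 300.0, "tags": {"team": "servicing"}},
    {"service": "compute", "cost": 500.0, "tags": {"team": "origination"}},
    {"service": "network", "cost": 200.0, "tags": {}},
]

totals = allocate_costs(line_items)
print(totals)
print(unallocated_share(totals))
```

Tracking the unallocated share over time is a quick check on tagging discipline, since untagged spend cannot be tied back to a product or feature.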
Summary
Cloud Financial Management is also a change management effort, and it requires better alignment between the engineering team and cross-functional teams like finance, operations, and business. We are now building and running better systems by bringing development and operations organizations together in DevOps. It is time to take it to the next level by integrating DevOps with cloud financial operations (FinOps), making cost a first-class engineering metric.