Bigger context
- As cloud adoption progresses, addressing costs from both underutilised resources and wasteful resource usage has become essential. This is emphasised by the 2024 State of FinOps survey, which identifies reducing waste and underutilised resources as a top priority.
- Simple, unintended errors in design, code, and configuration can lead to significant monthly expenses, which can be challenging to identify and resolve later in the product life-cycle.
- At present, engineering teams lack incentives to prioritise cost optimization.
Existing alternatives
- Product teams are tasked with balancing cost, speed, and features to maximise ROI. However, they often overlook infrastructure costs, which can account for around 50% of total cost in a cloud environment. As a result, infrastructure costs are typically treated as an afterthought rather than a core consideration.
- A centralised Cloud Center of Excellence (CoE) or FinOps team reviews costs at the end of each day or month and then engages with individual teams to address optimization. However, the Cloud CoE team lacks insights into the specific usage patterns and features of each product, leading to multiple rounds of back-and-forth communication to resolve the issues.
- Vendor or cloud-native optimisation tools primarily focus on recommending changes for underutilised resources, but they often overlook situations where resource usage is inefficient or wasteful.
So relying on a centralised Cloud CoE/FinOps team to drive the cost initiative is manual, reactive, inefficient, and not scalable.
Solution
- The product team takes ownership of cloud cost management by creating cost estimates during the project planning phase.
- During the build phase, the product team refines these estimates by conducting cost tests.
- They monitor cost estimate burn-down rates and identify cost anomalies, creating cost incidents when necessary.
- The product team analyses cost incidents, then either adds tasks to the engineering backlog for resolution or updates the cost estimates accordingly.
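As a rough illustration of the monitoring step, the sketch below compares cumulative daily spend against a straight-line burn-down of the monthly estimate and records a cost incident whenever spend runs ahead of plan. The names `CostIncident` and `find_cost_incidents`, the linear burn-down, and the 20% tolerance are assumptions made for this example, not part of any particular FinOps tool.

```python
from dataclasses import dataclass


@dataclass
class CostIncident:
    day: int          # day of the month on which the overrun was seen
    expected: float   # planned cumulative spend for that day
    actual: float     # observed cumulative spend for that day


def find_cost_incidents(monthly_estimate, daily_spend,
                        days_in_month=30, tolerance=0.2):
    """Flag days where cumulative spend exceeds the planned burn-down.

    Assumes the estimate is consumed evenly across the month
    (a simplification) and allows `tolerance` headroom over plan.
    """
    incidents = []
    cumulative = 0.0
    for day, spend in enumerate(daily_spend, start=1):
        cumulative += spend
        expected = monthly_estimate * day / days_in_month
        if cumulative > expected * (1 + tolerance):
            incidents.append(CostIncident(day, expected, cumulative))
    return incidents


if __name__ == "__main__":
    # Three days on plan, then a spike on day four.
    for incident in find_cost_incidents(3000, [100, 100, 100, 400]):
        print(incident)
```

Each incident would then be triaged by the product team: either a task goes into the engineering backlog, or the estimate itself is revised.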
What role should the Product team play in a cost-aware product cycle?
- Product Owners are responsible for managing the product's budget, time-to-market, and feature set.
- In addition to estimating story points or person-hours, they must estimate the infrastructure costs for all environments, document the commitment, and define the serverless/IaaS strategy.
- Provide the cloud cost estimate to the FinOps team to help establish the overall forecast.
- Approve the cost test strategy and its execution for QA.
- Model the cloud cost estimate based on cost data provided by the QA Engineer.
- Monitor the cloud cost burn rate for any anomalies.
- Analyse cost incidents to determine if they require fixes or if the estimate needs to be updated.
What role should the FinOps team play in a cost-aware product cycle?
- Establish governance over which types of cloud services can be used across the enterprise. By controlling committed usage and avoiding unnecessary costs (e.g., runaway usage or inter-region data transfer fees), the FinOps team can help stabilize costs while allowing product teams to make informed decisions.
- Implement a tagging policy that enables chargeback at the application or transaction level. This should be mandatory to ensure accurate cost allocation and accountability.
- Mandate cost estimates for every project during the planning phase, with proper oversight to ensure accuracy and alignment with budget expectations.
- Encourage product teams to commit to cloud services for longer periods to leverage rate optimization and secure better pricing during contract negotiations.
- Treat any deviations from estimated cloud costs as incidents, requiring root cause analysis (RCA) and timely resolution.
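A mandatory tagging policy is only useful if it can be checked. The following is a minimal sketch of such a check, assuming resources are available as a mapping from resource id to tag dictionary. The tag keys in `REQUIRED_TAGS` and the function names are hypothetical; a real policy would define its own keys and pull resource data from the cloud provider's inventory.

```python
# Hypothetical tag keys; a real tagging policy defines its own.
REQUIRED_TAGS = {"application", "cost-center", "environment"}


def missing_tags(resource_tags):
    """Return the required tag keys that a resource does not carry."""
    return sorted(REQUIRED_TAGS - set(resource_tags))


def untagged_resources(resources):
    """Map resource id to its missing tag keys, for non-compliant resources only."""
    report = {}
    for resource_id, tags in resources.items():
        missing = missing_tags(tags)
        if missing:
            report[resource_id] = missing
    return report


if __name__ == "__main__":
    inventory = {
        "vm-1": {"application": "checkout", "cost-center": "cc-42",
                 "environment": "prod"},
        "bucket-7": {"application": "checkout"},
    }
    print(untagged_resources(inventory))
```

Resources that appear in the report cannot be charged back to an application, so they are the ones to fix first.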
What role should the engineering team play in a cost-aware product cycle?
- The design should be evaluated to ensure optimal resource usage, considering factors like serverless vs. dedicated infrastructure, internet, inter-region, and inter-zone data transfer costs, data archiving policies, and autoscaling capabilities.
- Code should be optimized to minimize resource consumption and maximize efficiency.
- Implement granular tagging to facilitate accurate chargeback tracking for specific applications or transactions.
- Ensure timely resolution of cost-related incidents in the backlog to optimize overall cost management.
What role should the QA team play in a cost-aware product cycle?
- Develop a strategy to model various environments and their corresponding workloads for accurate cost analysis.
- Set up the necessary data, configure the test environments, and enable billing data collection for cost test execution.
- Run the tests as outlined in the strategy to simulate real-world workloads and gather relevant data.
- Collect and analyse cloud billing data to generate precise cloud cost estimates based on the simulated workloads.
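To make the last step concrete, here is one simple way a cost test result could be turned into an estimate: derive a cost per transaction from the billed cost of the test run, then scale it to the expected monthly volume. This is a sketch under the assumption that usage-based cost scales linearly with transaction count, which will not hold for every service (tiered pricing, minimum charges, and autoscaling steps all break it). The function names and the `fixed_monthly_cost` parameter are illustrative.

```python
def unit_cost(billed_cost, transactions):
    """Cost per transaction observed during a cost test run."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return billed_cost / transactions


def estimate_monthly_cost(billed_cost, test_transactions,
                          expected_monthly_transactions,
                          fixed_monthly_cost=0.0):
    """Scale the observed unit cost to the expected monthly volume.

    Assumes linear scaling of usage-based cost; `fixed_monthly_cost`
    covers charges that do not vary with load (an assumption here).
    """
    variable = unit_cost(billed_cost, test_transactions) * expected_monthly_transactions
    return fixed_monthly_cost + variable


if __name__ == "__main__":
    # A test run of 50,000 transactions billed at 12.50,
    # scaled to 2,000,000 transactions a month plus 300 fixed.
    print(round(estimate_monthly_cost(12.5, 50_000, 2_000_000,
                                      fixed_monthly_cost=300.0), 2))
```

Running the same calculation per environment gives the Product Owner the per-environment figures needed for the overall estimate.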
What role should the Infrastructure team play in a cost-aware product cycle?
- Assist FinOps in developing governance policies for cloud cost management.
- Help FinOps establish a tagging policy to enable chargeback at the application or transaction level.
- Work with FinOps to consolidate resource commitments from individual product teams on cloud services.
- Support the product team in estimating cloud costs and creating cloud infrastructure based on the product owner's requirements.
I think a Cloud CoE/FinOps team has its place in providing the tools to measure inefficiency and the best practices to address it.