Cloud Costs - the painful Awakening
Customers facing exploding Cloud costs.

Hi everyone,

Today, we will look at the storms brewing on the Cloud horizon. The title could just as well belong to a horror movie, and for some companies, it really is one.

The Cloud Cost Horror Story

A couple of years ago, someone pitched the value of the Cloud to you and explained that you must migrate everything to the large cloud service providers (CSPs) because OPEX is better than CAPEX and will definitely save money. You jumped on the train with a smile, and your company started moving to the Cloud with a "Cloud First" strategy. To not waste any time, you kicked off Lift & Shift migrations right away, hoping to "optimize" them at some point in the future. (This is, by the way, called the "Fear Of Missing Out" phenomenon, a.k.a. FOMO.) In parallel, you decided to transition to the DevOps methodology, and everything looked fine.

Until very recently

Recently, you started noticing that your costs have been, and still are, increasing exponentially, and although you calculated everything thoroughly, the real costs are higher than expected.

In parallel, you see the Cloud Providers significantly increasing their prices, right up to today in 2023, like here for Azure, here for GCP, and here for AWS.

Over the past year, there has been a 23.0% increase in average prices of on-demand compute instances at AWS. Liftr Insights data show that not only did AWS increase their average prices in 2020, 2021, and 2022, but the increases have been higher each year since 2019. - see ref. above.
Customers' reaction.

What happened?

Well, you fell into the trap of blindly moving infrastructure to the Cloud without increasing your IT maturity in parallel, which is what it takes to realize the potential cost savings in the Cloud.

Another typical issue was very likely a miscalculated Total Cost of Ownership (TCO) that did not include all necessary costs and ignored some of the requirements and dependencies you should have set up beforehand.

And thirdly, you very likely moved without a proper Governance and Automation setup, or without even thinking about the need for a Cloud Operating Model, which ultimately led to additional costs.

Does this mean you should immediately migrate everything back to on-premises?

Gosh - no! Don't panic!

You are currently "learning through pain." This never feels comfortable, but after making the first mistakes, you should now take some time to avoid follow-up ones.

Don't follow up on an error with the next one.

First, I would like you to fully understand what happened and how to improve the current situation.

Secondly, running a Cloud Transformation provides additional value:

It motivates you and your teams to move to PaaS, CaaS, microservices, Dev(Sec)Ops, and an API First strategy, built on modern authentication underneath, all of which will provide long-term value to you and your company.

But, first things first.



Cloud Costs - the basics

We start with the basics to establish a common understanding. At a very high level, Cloud costs can be subdivided like this:

Simplified overview of Cloud costs.

Let us start from left to right:

  • The Platform itself can be managed, for example, by a Platform Engineering Team. This basic platform setup is part of every Cloud environment, like the platform team's landing zones and platform services. For simplification, I would include here all essential costs that central teams incur to store and manage their own environments. Another good example would be the CyberSec Team running penetration tests or aggregation activities and requiring resources to do so.
  • The various Solutions, including their costs, should map to single teams or at least to single product owners to drive cost transparency and maturity. These costs should always be split into fixed and variable costs.
  • The Shared costs are the costs that cannot be properly mapped to dedicated teams and are distributed across all teams in proportion to their spending. Typical examples are networking, syncing, automation, and overhead costs of centrally running services.
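The proportional distribution of Shared costs can be illustrated with a minimal sketch. The team names, the amounts, and the function name `allocate_shared_costs` are hypothetical and only serve the illustration; a real allocation may use other keys than direct spending.

```python
from typing import Dict


def allocate_shared_costs(direct_costs: Dict[str, float],
                          shared_cost: float) -> Dict[str, float]:
    """Add a share of shared_cost to each team, proportional to its direct spend."""
    total_direct = sum(direct_costs.values())
    if total_direct <= 0:
        raise ValueError("direct costs must sum to a positive amount")
    return {
        team: direct + shared_cost * direct / total_direct
        for team, direct in direct_costs.items()
    }


# Hypothetical monthly figures in EUR, for illustration only.
direct = {"team-a": 6000.0, "team-b": 3000.0, "team-c": 1000.0}
fully_loaded = allocate_shared_costs(direct, shared_cost=2000.0)

for team, cost in fully_loaded.items():
    print(f"{team}: {cost:.2f}")
```

In this sketch, a team that causes 60% of the direct spending also carries 60% of the Shared costs, so the sum of all fully loaded costs equals direct plus Shared costs.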

Each of those areas can exist in different stages (environments). The commonly used ones are:

  • Dev - Development environments or playgrounds where you build stuff
  • Test - Testing environments to check your developed solutions
  • Int - Integration or staging environments for system testing that replicate your production environment
  • Prod - Your production environment where you are running either customer-facing or employee-facing services.
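To create transparency across both dimensions, costs can be summed per area and environment. The following is a small sketch under assumptions: the record layout (area, environment, monthly cost) and all values are hypothetical and do not reflect any specific provider's billing export.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (area, environment, monthly cost in EUR).
CostRecord = Tuple[str, str, float]


def summarize_costs(records: List[CostRecord]) -> Dict[Tuple[str, str], float]:
    """Sum the costs per (area, environment) pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for area, environment, cost in records:
        totals[(area, environment)] += cost
    return dict(totals)


# Illustrative data only.
records: List[CostRecord] = [
    ("platform", "prod", 1500.0),
    ("solution", "dev", 400.0),
    ("solution", "prod", 2600.0),
    ("shared", "prod", 700.0),
    ("solution", "dev", 100.0),
]

summary = summarize_costs(records)
for (area, environment), cost in sorted(summary.items()):
    print(f"{area:<10} {environment:<5} {cost:>10.2f}")
```

In practice, the area and environment would typically come from tags or labels on the resources, which is one reason why a consistent tagging setup matters.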

Cost Growth Areas

In addition, there are various reasons for cost growth or cost-saving potential in Cloud environments that you should know about:

Cost growth areas.

From left to right:

  • Organic cost growth is the increase in costs of your existing services due to growing overhead. A basic example is log and monitoring data, which grows over time as it accumulates. It can also have a welcome cause, such as higher usage and adoption by customers or employees. This growth type should be continuously monitored and validated.
  • Inorganic cost growth comes from new solutions or features that are out-of-band changes. For existing services, one example would be a significantly increased SLA, which leads to overhead from HA clusters and doubles the costs. New solutions can be completely new products or environments. This growth type should typically be validated through processes that run dedicated use-case validations for each "major change" or validate each new service's IT value and budget.
  • Lack of Optimization is simply a lack of maturity of single solutions or of the whole cloud. In many environments, we find huge potential for cost savings by running a simple Cost Optimization assessment that validates, for example, rightsizing and financial conditions such as reserved instances. The more complicated tasks are savings through architectural changes or identifying bad architectures. KPIs will help to identify these. The product teams should handle this growth type independently to build up FinOps maturity, governed and supported by the platform engineering team.
  • Toil, as the name suggests, covers unnecessary costs due to orphaned, forgotten, or ignored resources. This ranges from unused but still running playgrounds to orphaned backup images, storage accounts that are no longer needed, etc. Do not ignore the potential here. I have already seen customers saving more than 100k€ per year by removing orphaned resources.
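A first, rough way to separate the growth types is to compare two months of costs per service. The sketch below is a simplification under stated assumptions: it counts the cost change of services present in both months as organic and the cost of newly appearing services as inorganic. Out-of-band changes to existing services (such as the SLA example) would need additional change information, and decommissioned services are ignored. All service names and amounts are hypothetical.

```python
from typing import Dict


def split_growth(previous: Dict[str, float],
                 current: Dict[str, float]) -> Dict[str, float]:
    """Split month-over-month cost growth into organic and inorganic parts.

    Organic: cost change of services that exist in both months.
    Inorganic: cost of services that appear only in the current month.
    """
    organic = sum(current[s] - previous[s] for s in current if s in previous)
    inorganic = sum(cost for s, cost in current.items() if s not in previous)
    return {"organic": organic, "inorganic": inorganic}


# Hypothetical monthly costs per service in EUR.
last_month = {"web-shop": 4000.0, "logging": 800.0}
this_month = {"web-shop": 4100.0, "logging": 950.0, "ha-cluster": 2000.0}

growth = split_growth(last_month, this_month)
print(growth)
```

Even such a crude split shows whether the bill grows because existing services accumulate overhead or because something new was added, which points to different follow-up questions.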

Knowing about these areas, starting your investigations there, and creating transparency are the important first steps.
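As an example of such an investigation, Toil candidates can be flagged from a resource inventory. This is a minimal sketch, assuming an inventory with a last-used date and an owner tag is available; the `Resource` fields, the 90-day threshold, and all values are hypothetical, and flagged resources still need a manual review before removal.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Resource:
    name: str
    monthly_cost: float      # EUR
    last_used: date
    owner: Optional[str]     # None means no owner tag was found


def find_toil_candidates(resources: List[Resource], today: date,
                         max_idle_days: int = 90) -> List[Resource]:
    """Flag resources that have no owner or were idle longer than max_idle_days."""
    return [
        r for r in resources
        if r.owner is None or (today - r.last_used).days > max_idle_days
    ]


# Illustrative inventory only.
inventory = [
    Resource("playground-vm", 120.0, date(2023, 1, 10), "team-a"),
    Resource("old-backup-image", 35.0, date(2022, 6, 1), None),
    Resource("web-shop-db", 900.0, date(2023, 6, 1), "team-b"),
]

candidates = find_toil_candidates(inventory, today=date(2023, 6, 15))
yearly_savings = 12 * sum(r.monthly_cost for r in candidates)

for r in candidates:
    print(f"review: {r.name} ({r.monthly_cost:.2f} EUR/month)")
print(f"potential yearly savings: {yearly_savings:.2f} EUR")
```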

Stay tuned for the next article, where we will keep diving deeper and share valuable, actionable steps with you.


P.S.: If you feel the same pain - we are here to help you.

Our experienced consultants and I will help you to manage this situation, increase your maturity with value and reduce your costs significantly.

Best regards,

David das Neves, CEO, shiftavenue

If you also want to drive value, reach out to us!


Andreas L?sch

Strategist and Technical Seller | Account Management | Enterprise Digital Transformation

1 year ago

Couldn't agree more. In my eyes, however, the cloud is not the reason for those challenges. They already existed before with traditional on-premises IT. Due to the (desired) agility of the cloud, they just become much more visible.
