FinOps is also about Architecture

Now that the dust has settled on the excitement around the bold Prime Video Tech post, it is time to add my two cents, hopefully with a bit more thought.

The title draws attention, and with FinOps being the hot buzzword of the moment it is easy to jump to hasty conclusions: “Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%”.

The subtitle makes it even easier: “The move from a distributed microservices architecture to a monolith application helped achieve higher scale, resilience, and reduce costs”.

Unfortunately, most of the reactions here on LinkedIn have been sensationalist, coming from people who either did not read or did not understand the article. They made claims like "Not even Amazon can make sense of micro-services" or "PrimeVideo abandons micro-services in favor of a monolithic architecture".

But nothing could be further from reality: the post talks about the quality monitoring tool, not the whole platform, and for that use case the monolithic architecture has provided better performance and an incredibly juicy amount of savings.

And that does not mean that microservices were misused.

The post is not long and it is quite interesting, with lots of important details.

For me these are the most relevant takeaways:

Use the architecture that allows a better initial result with less effort

The initial architecture was based on an orchestrator (AWS Step Functions) and microservices (with AWS Lambda), along with object storage (AWS S3) as a data interchange space between different processes.

This allowed the developers to focus on the functionality, forget about complex infrastructure and get results fast.

FinOps takeaway: From the FinOps perspective, this means focusing on quality and time-to-market and caring about cost later (the FinOps Iron Triangle), which works very well for the first iterations.

Re-evaluate your architecture as the service grows and evolves

If your architecture is well designed, along the path of Evolvable architectures, it will be easy to change and adapt to your new problem.

For this use case there were two main issues that prevented the service from performing properly under high demand and also made it very expensive:

  • A lot of state transitions on the orchestrator: This made the system reach account hard limits for Step Functions, and on top of that Step Functions is billed per state transition.
  • Data interchange between steps of the process: The steps used S3 as a kind of shared memory to interchange data, with many Tier-1 API calls to read and write the objects, which steeply drove the cost up.
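To make the two cost drivers concrete, here is a rough back-of-the-envelope model. It is only a sketch: the function and parameter names are invented for illustration, and the unit prices and per-unit multipliers are placeholder values, not figures from the Prime Video post or from the AWS price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Unit prices in USD. Placeholder values: check the current price list."""
    per_state_transition: float
    per_s3_request: float


def orchestration_cost(
    units_of_work: int,
    transitions_per_unit: int,
    s3_requests_per_unit: int,
    pricing: Pricing,
) -> float:
    """Cost that scales with orchestration and data interchange (compute excluded)."""
    transitions = units_of_work * transitions_per_unit
    s3_requests = units_of_work * s3_requests_per_unit
    return (
        transitions * pricing.per_state_transition
        + s3_requests * pricing.per_s3_request
    )


# Hypothetical workload: one unit of work per second of stream, for one day.
pricing = Pricing(per_state_transition=0.000025, per_s3_request=0.000005)
seconds_per_day = 24 * 60 * 60

distributed = orchestration_cost(seconds_per_day, 10, 20, pricing)
# In a single process the steps are function calls and the data stays in
# memory, so both multipliers drop to zero.
monolith = orchestration_cost(seconds_per_day, 0, 0, pricing)

print(f"distributed: ${distributed:.2f} per stream per day")
print(f"monolith:    ${monolith:.2f} per stream per day")
```

The point of the model is the shape, not the numbers: both terms grow linearly with the amount of work, so a chatty design gets more expensive at exactly the moment the service becomes successful.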

With the change to a monolithic architecture, all transitions and data interchange occur within the same process space, with local calls and local memory, which removes both drivers of cost and underperformance.

Keep your mind open: there is no silver bullet architecture that solves all problems perfectly. Each one has advantages and drawbacks that make it more appropriate for some use cases than for others.

FinOps takeaway: Re-architecting is one of the most powerful drivers for reducing costs. No amount of Savings Plans, object storage tiering or waste removal would have achieved such high savings.

A monolith is not synonymous with bad architecture, but some basics must be observed

When we think of monolithic applications, we tend to picture high coupling and stateful processes, but that is not necessarily the case.

A monolith can be built modularly, just as this case shows: the different detectors that used to be Lambda functions are now bundled into different containers, with a light orchestration layer to distribute requests.
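As an illustration of what "modular inside a single process" can look like, here is a minimal sketch. The detector names, the `Frame` fields and the thresholds are all hypothetical and are not taken from the Prime Video service; the only idea borrowed from the post is that detectors are independent modules behind a light orchestration layer.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Frame:
    """A decoded chunk of media held in local memory (hypothetical shape)."""
    stream_id: str
    index: int
    luma: float         # average brightness, 0.0 to 1.0
    audio_level: float  # 0.0 to 1.0


class Detector(Protocol):
    """The interface every detector module implements."""
    name: str

    def check(self, frame: Frame) -> bool:
        """Return True when a defect is detected."""
        ...


class BlackFrameDetector:
    name = "black_frame"

    def check(self, frame: Frame) -> bool:
        return frame.luma < 0.05


class SilenceDetector:
    name = "silence"

    def check(self, frame: Frame) -> bool:
        return frame.audio_level < 0.01


class Orchestrator:
    """Light in-process orchestration: plain method calls, no network hops."""

    def __init__(self, detectors: list[Detector]) -> None:
        self._detectors = detectors

    def analyze(self, frame: Frame) -> list[str]:
        # Each detector reads the same in-memory frame; nothing is serialized.
        return [d.name for d in self._detectors if d.check(frame)]


orchestrator = Orchestrator([BlackFrameDetector(), SilenceDetector()])
defects = orchestrator.analyze(Frame("stream-1", 0, luma=0.02, audio_level=0.4))
print(defects)
```

Because each detector only depends on the `Detector` interface, it could be moved back behind a network boundary later without touching the others, which is the loose coupling that keeps the architecture evolvable.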

For many other use cases, monoliths can also be stateless and easy to scale horizontally.

Some basics that come to mind, be it for microservices, miniservices or monoliths:

  • Keep architecture components loosely coupled, modular and independent
  • Usually stateless is better, at least to be able to easily scale horizontally
  • Design clear communication interfaces between components
  • Use well-known architecture patterns when possible
  • Communication with other systems should be asynchronous, with full and idempotent messages
  • Use serverless and managed services whenever possible
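The point about full and idempotent messages can be sketched as follows. This is a minimal in-memory example with hypothetical names (`DefectReported`, `DefectCounter`); in a real system the set of processed ids would live in durable storage rather than in the process.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DefectReported:
    """A 'full' message: it carries everything the consumer needs."""
    message_id: str
    stream_id: str
    defect: str


@dataclass
class DefectCounter:
    counts: dict[str, int] = field(default_factory=dict)
    _seen_ids: set[str] = field(default_factory=set)

    def handle(self, msg: DefectReported) -> bool:
        """Apply the message once; return False for a redelivered duplicate."""
        if msg.message_id in self._seen_ids:
            return False
        self.counts[msg.stream_id] = self.counts.get(msg.stream_id, 0) + 1
        self._seen_ids.add(msg.message_id)
        return True


counter = DefectCounter()
msg = DefectReported("m-1", "stream-1", "black_frame")
first = counter.handle(msg)
second = counter.handle(msg)  # simulated redelivery of the same message
print(counter.counts, first, second)
```

With asynchronous delivery a message may arrive more than once, so making the handler safe to repeat is what lets the sender retry freely.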

Remaining questions

The post shares a fair bit of detail, but some questions remain for me after reading it a few times:

  • Why are the containers not horizontally scalable?
  • If they are not, why use an ECS Cluster instead of AppRunner or something lighter?
  • What is the carbon footprint reduction that comes along with the re-architecture?

Conclusions

One of the few articles on this topic that makes complete sense is the one from Werner Vogels (Amazon CTO). For me, this paragraph sums it up by describing the keys to a good architecture:

“I always urge builders to consider the evolution of their systems over time and make sure the foundation is such that you can change and expand them with the minimum number of dependencies. Event-driven architectures (EDA) and microservices are a good match for that. However, if there are a set of services that always contribute to the response, have the exact same scaling and performance requirements, same security vectors, and most importantly, are managed by a single team, it is a worthwhile effort to see if combining them simplifies your architecture.
Evolvable architectures are something that we’ve taken to heart at Amazon from the very start. Re-evaluating and re-architecting our systems to meet the ever-increasing demands of our customers.”

Key takeaway: Evolvable architectures, also as a FinOps lever.

Thanks

Thanks Roberto Andradas Izquierdo for your kind review and comments on the draft.

Thanks also go to Jose Luis Prieto for pointing out Werner Vogels' post and for his valuable insight.
