
The Cloud Architecture of the Next 10 Years

Ala Shiban

Co-Founder & CTO at Klotho | Group Product Manager | Cloud & Dev Tools | ex-Microsoft | ex-Riot

Published: January 6, 2022


(cross-posted from the Klotho blog)

It’s 3pm PST on June 2nd, 2020 and the world is watching. Riot’s new game VALORANT is about to launch.

Hundreds of engineers, architects, operators, product managers, designers, and PR folks are standing by. Any large-scale all-at-once launch is nerve-wracking, but our infrastructure team of around 120 people is ready. We go live.

Players immediately begin showing up, just like we had expected, modeled, and tested for. Instead of fire-fighting, the infrastructure team goes about their normal tasks as all green metrics stream in around us.

My name is Ala Shiban. I led the Cloud Services group at Riot Games, a centralized technology team which enabled hundreds of engineers to ship multiple new large-scale live services for over 180 million users around the globe. It took us 3 years, 50 people, and many cross-company alignment efforts, and we built a highly effective, proprietary, global hybrid cloud platform. It had the ability to describe a large set of microservices as a single versionable package that could be configured, deployed, and operated on top of our platform running in heterogeneous clouds. It also meant we could run the entire set of services on Riot’s 12+ data centers around the world, including AWS and Tencent in China.

When I left Riot, I looked back at all the great work we had done, and I thought to myself:

“We succeeded in such incredible ways… and we shouldn’t have needed to do any of it.”

Streamlining Complexity

Computer engineering has always been pushed to its limits to enable larger and more ambitious dreams. Cloud computing is no exception; parallel computing, cluster computing, grid computing, and edge computing all keep expanding what we think of as possible, which simultaneously makes the cloud harder to develop against.

The industry is now in this streamlining complexity phase of cloud computing. The most evident examples are integrated solutions that optimize for certain workloads or development models: Google’s Anthos, AWS Outposts, Azure Stack Hub or the Hashistack.

Those solutions bundle together the building blocks necessary for larger-scale applications and systems, but they expose complicated low-level interfaces that developers and operators must learn, configure, assemble, and scale appropriately.

There’s a similar complexity-reduction evolution happening in programming languages: punch cards, assembly, C, C++, Java … Continuous improvement keeps happening, but at some point an architecture shift emerges that tackles the accumulated complexity.

I’d like to take a look at a few principles that I view as critical for this architectural shift to emerge, and what we need from products to effectively take us into the new world of cloud computing.

Guiding Principles

To effectively reduce complexity, we need to absorb it outside of the purview of the developer or operator… not pass it around like a hot potato. Let’s take a look at a principled approach that should properly address complexity by design.

A solution should:

maintain benefits from existing architectures
keep tools and programming languages usable
integrate with an ecosystem instead of trying to replace it
ensure user code is recognizable, debuggable, and patchable, even in production

Maintain benefits from existing architectures

Incidents are part of any live system, distributed or not, and no company is immune. But in a monolithic world, observing and tracing application-level incidents tends to be more straightforward due to centralized instrumentation and the availability of cross-API context.
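To make "cross-API context" concrete, here is a minimal sketch of what it can look like inside a single process. It is an illustration under stated assumptions, not code from any system described here: the function names (`handle_request`, `load_profile`, `load_inventory`), the `trace` list standing in for a central log sink, and the request ID format are all hypothetical.

```python
import contextvars

# Hypothetical illustration: one context variable carries the request ID
# for every in-process call made while handling that request.
request_id = contextvars.ContextVar("request_id", default="unknown")

# In-memory stand-in for a centralized log sink (an assumption for this sketch).
trace = []


def log(message):
    """Record a message tagged with whatever request is currently in flight."""
    trace.append(f"[{request_id.get()}] {message}")


def load_profile(user):
    # No request ID parameter needed: the context is already available.
    log(f"profile: loading {user}")
    return {"user": user}


def load_inventory(user):
    log(f"inventory: loading items for {user}")
    return ["sword"]


def handle_request(req_id, user):
    """Entry point: set the context once, then call other 'APIs' in-process."""
    token = request_id.set(req_id)
    try:
        log("gateway: request received")
        profile = load_profile(user)
        items = load_inventory(user)
        return {"profile": profile, "items": items}
    finally:
        # Restore the previous value so context never leaks between requests.
        request_id.reset(token)


result = handle_request("req-42", "player1")
for line in trace:
    print(line)
```

Every line in `trace` carries the same ID without any function passing it along explicitly. Once `load_profile` and `load_inventory` become separate services, that ID has to be serialized into each outgoing call and parsed on the other side, which is the kind of plumbing a monolith gives you for free.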

In one of the teams I worked on, it took several months on average to get a new developer onboarded and productive on a microservices-based architecture. An SDK team was put in place to simplify the process, but it was soon overrun by reasonable requests from feature teams. Papercuts increased over time, as smaller features were harder to prioritize. The more the platform was adopted, the worse the problems became.

Everything is an evaluation of existing tradeoffs.

Microservices make it easy to have fault isolation, independent deployments, custom per-service environments and modular code, as well as team boundaries.

Monoliths make it easier to be productive, deploy and test features, trace errors, and create an integrated developer experience.


Startups and companies continuously attempt to superglue in new systems to solve old problems. We all used to boast about the number of services we wrote and operated, only to realize that thousands of microservices means thousands of puzzle pieces used in different, disconnected ways.

This is a tradeoff we’ve made as a community to gain the flexibility and benefits of microservices.

“A new architecture must maintain all the previous architecture’s benefits while reducing the complexity of gaining them”

Keep tools and programming languages usable

There are effective patterns that solve many of the problems in either architecture. You don’t usually find yourself asking how to call a profile API in a monolithic architecture. And you don’t usually ask who to talk to for a custom OS to run container code with microservices. But any transition between architectures makes those patterns difficult to use, as the underlying capabilities that made them easy are not necessarily there anymore.

I was once on a team where we couldn’t spin up a new test environment, let alone a local one, because it required pulling together hundreds of microservices, coordinating deployments, and working out which configurations should be set without the benefit of knowing each system. But in a monolithic architecture, you’re an F5 press away.

“A problem that’s been solved in a previous architecture can be solved the same way in the new architecture”

Integrate with an ecosystem instead of trying to replace it

Large companies have teams that all use different tech stacks, whether it’s due to team knowledge and familiarity, or because it was the right choice for the problem at that time. Friction becomes the norm once there’s a need to centralize efforts to gain economies of scale.

On another one of my teams, it took us 4 months to validate that a best-in-class observability SaaS solution would work for a diverse tech organization, because it required retrofitting and redeploying each one of the hundreds of mission-critical services to get the real value. The high time-to-value meant we couldn’t replace it down the line either, placing us in a horrible negotiating position once the contract needed to be renewed.

“A new architecture must integrate into existing ecosystem tools with a significantly lower time-to-value”

Ensure user code is recognizable, debuggable, and patchable

Live systems are the lifeblood of a business. Their maintainability and reliability make the biggest difference between a sustainable, agile environment and one that is constantly in a reactive, on-fire mode with no ability to move forward.

Solutions today streamline or absorb the complexity into expert systems, ones that require significant training and understanding to operate or patch. Several of the companies behind those solutions are explicitly launching managed services because their products are so difficult to operate… so much so that customers are de facto locked in by sheer complexity.

A good way to determine operable solutions is something called the phone call razor.

Here’s how it works. Let’s say there’s a SEV0 outage on your service, and you’re not sure what’s happening. Is your solution simple enough that you can easily decide if the bug is in your code or the abstraction? Once you figure it out, is it simple enough for you to work around the bug until it’s fixed upstream?

If it isn’t simple enough, you’re betting your entire company on a phone call with one specific vendor. That’s quite an illusion of self-sustainability.

“The new architecture and abstraction must be simple enough to leave, operate, and modify, especially in live outage scenarios”

The Next Cloud Computing Architecture Is Here

Cloud computing has truly reached peak complexity, with the dominant available architectures shifting this complexity from one location to another instead of addressing it at its source.

I believe the right solution requires a new architecture that follows key desirable design principles–ones that maintain benefits from previous architectures without requiring relearning tools and how to work.

In my next post, we’ll take a close look at how we built Klotho around these principles. I’ll be introducing the next cloud computing architecture that comes after monoliths, microservices, and serverless, a solution which fundamentally solves the complexity of cloud development without sacrificing everything we’ve grown to appreciate about it so far.

Want to make sure you know when it comes out? Follow me here on LinkedIn and on Twitter, and join the mailing list on our website.
