Crossing the distributed systems chasm

David Van Couvering

Senior Principal Architect at eBay

Published: December 18, 2024

A large part of my career has been spent helping engineering organizations evolve from the single monolithic system that brought the company its early success to a system that can scale as the company reaches its growth stage and starts taking off.

Often at this point, product and leadership are starting to feel hampered by a technology that is preventing them from quickly exploring new markets and new solutions. Engineering needs to scale out to multiple teams that can design, refactor, evolve, and deliver value independently.

Although there are exceptions, in the vast majority of cases a single monolithic server is not going to enable that.

But when you start breaking apart the monolith into independent services and components, you run into a whole new set of problems. You now have a distributed system. You have to deal with API contracts, partial failures, timeouts, data inconsistency, distributed logging and tracing, and so on. It's a pretty big leap.

Moving from building experiences to distributed systems

None of this is new information. What I wanted to talk about here is a pattern I have seen in terms of software developer background and experience.

The founding engineering team is incredibly valuable and worthy of respect. They care deeply about the product and customers, understand the business domain and are fully committed to the company.

This team is usually very adept at quickly building user experiences. Often they have come into software development from a different background and, apart from some initial training, have gained most of their experience on the job, usually in smaller organizations.

Except in a few very deeply technical businesses, it's unusual for them to have a deep computer science background. They often haven't had to think about, and often aren't particularly excited about, topics like caching, transactional semantics, idempotence, resilience strategies, and so on.

But these things quickly become important when you move from one to two to five to ten services. What worked just fine in a single full-stack application can be very problematic in a distributed system.
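
To make the point about idempotence concrete, here is a minimal sketch of one way a retry after a timeout can be made safe. Everything in it is hypothetical and invented for illustration (the `CreditService`, the `LossyChannel` that simulates lost responses, and the `idempotency_key` parameter); it is not taken from any system described in this article. In a single process a call either happens or it doesn't. Across a network the request can succeed while the response is lost, so the caller retries and the operation may run twice unless the receiver remembers what it has already done.

```python
class CreditService:
    """Hypothetical downstream service that records requests it has already applied."""

    def __init__(self):
        self.balance = 0
        self._results = {}  # idempotency key -> result of the first application

    def credit(self, amount, idempotency_key):
        # A repeated key means the caller is retrying: return the stored
        # result instead of applying the credit a second time.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.balance += amount
        self._results[idempotency_key] = self.balance
        return self.balance


class LossyChannel:
    """Simulates a network that delivers the request but loses the first N responses."""

    def __init__(self, service, lose_first=1):
        self.service = service
        self.lose_first = lose_first

    def credit(self, amount, idempotency_key):
        result = self.service.credit(amount, idempotency_key)  # request is applied
        if self.lose_first > 0:
            self.lose_first -= 1
            raise TimeoutError("response lost")  # caller cannot tell it succeeded
        return result


def credit_with_retry(channel, amount, idempotency_key, attempts=3):
    """Retry on timeout, reusing the same key so retries cannot double-apply."""
    last_error = None
    for _ in range(attempts):
        try:
            return channel.credit(amount, idempotency_key)
        except TimeoutError as error:
            last_error = error
    raise last_error
```

With `LossyChannel(service, lose_first=2)`, the call `credit_with_retry(channel, 50, "req-1")` reaches the service three times but the balance only moves once. Without the key check, the same retries would credit the account three times.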

For example, one team I worked with needed to get data from one service to another, so they set up some messaging infrastructure and started posting events. But the design was built around the happy path. They had never had to think about ordering or dropped messages, and they ran into serious outages as a result. And the more they tried to solve this, the deeper the hole they dug for themselves. They just didn't have the background to know about the theory and architectural patterns behind messaging. They didn't even suspect there was such a thing, so they never looked.
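
As an illustration of the kind of pattern that team was missing, here is a minimal sketch of a consumer that tolerates duplicate and out-of-order delivery and can report gaps left by dropped messages. It assumes the producer stamps each event with a per-entity sequence number starting at 1. That assumption, and all the names (`Event`, `OrderedConsumer`, `entity_id`, `seq`), are mine for the example rather than details of the team's actual system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity_id: str  # hypothetical: the record this event is about
    seq: int        # hypothetical: per-entity sequence number, starting at 1
    payload: str


@dataclass
class OrderedConsumer:
    """Applies each entity's events once, in sequence order."""

    applied: list = field(default_factory=list)
    next_seq: dict = field(default_factory=dict)  # entity_id -> next expected seq
    pending: dict = field(default_factory=dict)   # entity_id -> {seq: Event} held back

    def handle(self, event: Event) -> None:
        expected = self.next_seq.get(event.entity_id, 1)
        if event.seq < expected:
            return  # redelivery of something already applied: ignore it
        # Hold the event, then apply everything that is now contiguous.
        buffered = self.pending.setdefault(event.entity_id, {})
        buffered[event.seq] = event
        while expected in buffered:
            self.applied.append(buffered.pop(expected).payload)
            expected += 1
        self.next_seq[event.entity_id] = expected

    def gaps(self) -> dict:
        """Entities stuck waiting on a missing event, with the seq they need."""
        return {
            entity_id: self.next_seq.get(entity_id, 1)
            for entity_id, buffered in self.pending.items()
            if buffered
        }
```

If events for one entity arrive as 2, 1, 1, 4, the consumer applies 1 and 2 in order, ignores the duplicate, and holds 4 while `gaps()` reports that sequence 3 is missing. A real system would also need a policy for how long to wait before asking the producer to resend or raising an alert.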

Another team tried to build a central service for identity management, but the API was based on their experience of directly updating tables. They didn't see the danger of providing an API that lets callers directly set flags and attributes for users rather than going through a domain abstraction. Many of these changes put users into inconsistent states, or different clients interpreted the data differently, and it took a pretty big effort to patch and unwind the data and migrate the clients to a more coherent and consistent API.
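
To show what "going through a domain abstraction" can look like, here is a small sketch. The statuses, fields, and operation names are invented for illustration and are not the actual identity API from that project. The idea is that callers can only request named operations, and the service decides which fields change together, so no client can produce a combination such as a suspended user with no reason, or an active user with an unverified email.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"


class InvalidTransition(Exception):
    """Raised when an operation is not allowed in the user's current state."""


@dataclass
class User:
    user_id: str
    status: Status = Status.PENDING
    email_verified: bool = False
    suspension_reason: Optional[str] = None


class IdentityService:
    """Hypothetical service exposing domain operations instead of raw setters."""

    def __init__(self):
        self._users = {}

    def register(self, user_id: str) -> User:
        if user_id in self._users:
            raise InvalidTransition(f"{user_id} is already registered")
        self._users[user_id] = User(user_id)
        return self._users[user_id]

    def verify_email(self, user_id: str) -> User:
        user = self._users[user_id]
        user.email_verified = True
        if user.status is Status.PENDING:
            user.status = Status.ACTIVE  # verification is what activates an account
        return user

    def suspend(self, user_id: str, reason: str) -> User:
        user = self._users[user_id]
        if user.status is not Status.ACTIVE:
            raise InvalidTransition(f"cannot suspend a {user.status.value} user")
        user.status = Status.SUSPENDED
        user.suspension_reason = reason  # status and reason always change together
        return user

    def reinstate(self, user_id: str) -> User:
        user = self._users[user_id]
        if user.status is not Status.SUSPENDED:
            raise InvalidTransition(f"cannot reinstate a {user.status.value} user")
        user.status = Status.ACTIVE
        user.suspension_reason = None
        return user
```

Compare this with an API shaped like `set_flag(user_id, name, value)`: every client would have to know and enforce these rules itself, and each one would likely interpret them a little differently.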

A strategy to help with this change

What I have seen as a common pattern is that there are two "clusters" of developers from the original teams.

Disinterested - this group is not very interested in learning this new distributed systems stuff. They know what they know and prefer to keep working on things the way they are used to.
Curious - this group, usually smaller in my experience, finds all this new stuff fascinating and is hungry to learn.

To help the team transition to this brave new world, here is what I have seen work well.

First, you definitely need to bring in people who have experience building, delivering and maintaining production-grade commercial distributed applications. This particularly includes your product and engineering leadership.
Then those folks can work with your curious and hungry folks, helping them get comfortable and conversant in this new world. They build initial frameworks and systems, and evolve tools and practices.
Together, this core group starts establishing new ways of building and working. They create and share a collection of guidance and best practices. They create some kind of lightweight review process to help catch common mistakes and pitfalls. They pair with engineers to anchor these new ways in day-to-day work.

When you have strong intention, support from leadership, and a committed core group focusing on establishing and spreading these new guidelines and practices, then I have seen that over time the disinterested group comes to be comfortable and adept at working in this new world. They may never fully be into all the theory and philosophy behind it, but at least they know how to make it work.

NHM Tanveer Hossain Khan

Principal Software Engineer at Click Therapeutics

3 months ago

Thanks for sharing, DVC. Good read. The disinterested group can't visualize the benefits and future trajectory. It's not always feasible to wait until the disinterested group adapts to the transition. We, as designers, may have to adjust to acknowledge this gap. Injecting the concept and foundation in a small, understandable chunk may require more work, but it's better than a failed transition.

Randy Shoup

SVP Engineering

3 months ago

Very insightful, DVC! There is a big difference in attitude (and therefore aptitude) between the curious and the disinterested. I think we co-experienced your identity example …
