Crossing the distributed systems chasm

A large part of my career has been helping engineering organizations evolve from the single monolithic system that brought the company success to a system that can scale as the company hits its growth stage and starts taking off.

Often at this point, product and leadership are starting to feel hampered by a technology that is preventing them from quickly exploring new markets and new solutions. Engineering needs to scale out to multiple teams that can design, refactor, evolve, and deliver value independently.

Although there are exceptions, in the vast majority of cases a single monolithic server is not going to enable that.

But when you start breaking apart the monolith into independent services and components, you run into a whole new set of problems. You now have a distributed system. You have to deal with API contracts, partial failures, timeouts, data inconsistency, distributed logging and tracing, and so on. It's a pretty big leap.

Moving from building experiences to distributed systems

None of this is new information. What I wanted to talk about here is a pattern I have seen in terms of software developer background and experience.

The founding engineering team is incredibly valuable and worthy of respect. They care deeply about the product and customers, understand the business domain and are fully committed to the company.

This team is usually very adept at quickly building user experiences. Often they have come into software development from a different background, and besides some initial training have gotten most of their experience on the job, usually in smaller organizations.

Except in a few deeply technical businesses, it's unusual for them to have a deep computer science background. They often haven't had to think about, and often aren't particularly excited about, topics like caching, transactional semantics, idempotence, resilience strategies, and so on.

But these things quickly become important when you move from one to two to five to ten services. What worked just fine in a single full-stack application can be very problematic in a distributed system.

For example, a team I worked with needed to get data from one service to another, so they set up some messaging infrastructure and started posting events. But the design was built around the happy path. They had never had to think about ordering or dropped messages, and they were running into serious outages as a result. And the more they tried to solve this, the deeper the hole they dug for themselves. They just didn't have the background to know about the theory and architectural patterns behind messaging. They didn't even think there was such a thing, so they never looked.
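To make the ordering and dropped-message questions concrete, here is a minimal sketch of one well-known approach, not what that team built. It assumes the broker may deliver an event more than once or out of order, and that the producer stamps each event with a per-entity sequence number starting at 1. The names `Event`, `Consumer`, `sequence`, and `gaps` are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    entity_id: str  # hypothetical: which record the event is about
    sequence: int   # hypothetical: per-entity counter set by the producer, starting at 1
    payload: str


@dataclass
class Consumer:
    last_applied: dict = field(default_factory=dict)  # entity_id -> last sequence applied
    pending: dict = field(default_factory=dict)       # entity_id -> {sequence: Event} held back
    applied: list = field(default_factory=list)       # stand-in for real side effects

    def handle(self, event: Event) -> None:
        last = self.last_applied.get(event.entity_id, 0)
        if event.sequence <= last:
            return  # duplicate or stale redelivery: ignoring it keeps handling idempotent
        # Buffer the event, then apply every consecutive event that is now available.
        waiting = self.pending.setdefault(event.entity_id, {})
        waiting[event.sequence] = event
        while last + 1 in waiting:
            next_event = waiting.pop(last + 1)
            self.applied.append(next_event.payload)
            last += 1
        self.last_applied[event.entity_id] = last

    def gaps(self) -> dict:
        """Entities with buffered events still waiting on a missing sequence number."""
        return {eid: sorted(w) for eid, w in self.pending.items() if w}


consumer = Consumer()
# Out of order (2 before 1), duplicated (1 twice), and with event 3 dropped.
for e in [Event("u1", 2, "b"), Event("u1", 1, "a"), Event("u1", 1, "a"), Event("u1", 4, "d")]:
    consumer.handle(e)
print(consumer.applied)  # ['a', 'b']
print(consumer.gaps())   # {'u1': [4]}
```

The point is not this particular design but that the unhappy paths are explicit: duplicates are ignored, reordering is absorbed, and a dropped message shows up as a visible gap rather than silently corrupting state. A real system would also need to decide how long to wait on a gap and how to recover from it.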

Another team tried to build a central service for identity management, but the API was based on their experience of directly updating tables. They didn't see the danger of providing an API that lets callers directly set flags and attributes for users, rather than going through a domain abstraction. Many of these changes put users into inconsistent states, or different clients interpreted the data differently, and it was a pretty big effort to patch and unwind the data and migrate the clients to a more coherent and consistent API.
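As a rough illustration of what "going through a domain abstraction" can mean, here is a small sketch. It is not the API that team ended up with. The statuses, the rule that a user becomes active only once their email is verified, and every name in it (`IdentityService`, `verify_email`, `suspend`) are assumptions invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"


class InvalidTransition(Exception):
    """Raised when a caller asks for a change the domain rules do not allow."""


@dataclass
class User:
    user_id: str
    status: Status = Status.PENDING
    email_verified: bool = False


class IdentityService:
    """Hypothetical service exposing intent-level operations, not raw field setters."""

    def __init__(self) -> None:
        self._users: dict[str, User] = {}

    def register(self, user_id: str) -> User:
        if user_id in self._users:
            raise InvalidTransition(f"{user_id} is already registered")
        user = User(user_id)
        self._users[user_id] = user
        return user

    def verify_email(self, user_id: str) -> User:
        user = self._users[user_id]
        user.email_verified = True
        # Assumed rule: verification is the only way a pending user becomes active.
        if user.status is Status.PENDING:
            user.status = Status.ACTIVE
        return user

    def suspend(self, user_id: str) -> User:
        user = self._users[user_id]
        if user.status is not Status.ACTIVE:
            raise InvalidTransition(f"cannot suspend a user in state {user.status.value}")
        user.status = Status.SUSPENDED
        return user


service = IdentityService()
service.register("u1")
service.verify_email("u1")
print(service.suspend("u1").status)  # Status.SUSPENDED
```

With a generic "set this flag" endpoint, any client could mark a user active without a verified email, and each client would be free to read the flags its own way. When callers can only express intent, the service owns the rules and the state stays consistent.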

A strategy to help with this change

What I have seen as a common pattern is that there are two "clusters" of developers from the original teams.

  • Disinterested - this group is not very interested in learning this new distributed systems stuff. They know what they know and prefer to keep working on things the way they are used to.
  • Curious - this group, usually smaller in my experience, finds all this new stuff fascinating and is hungry to learn.

To help the team transition to this brave new world, here is what I have seen work well.

  • First, you definitely need to bring in people who have experience building, delivering and maintaining production-grade commercial distributed applications. This particularly includes your product and engineering leadership.
  • Then those folks can work with your curious and hungry folks, helping them get comfortable and conversant in this new world. They build early initial frameworks and systems, and evolve tools and practices.
  • Together, this core group starts establishing new ways of building and working. They create and share a collection of guidance and best practices. They create some kind of lightweight review process to help catch common mistakes and pitfalls. They pair with engineers to anchor these new ways in day-to-day work.

When you have strong intention, support from leadership, and a committed core group focusing on establishing and spreading these new guidelines and practices, then I have seen that over time the disinterested group comes to be comfortable and adept at working in this new world. They may never fully be into all the theory and philosophy behind it, but at least they know how to make it work.


NHM Tanveer Hossain Khan

Principal Software Engineer at Click Therapeutics

3 months ago

Thanks for sharing, DVC. Good read. The disinterested group can't visualize the benefits and future trajectory. It's not always feasible to wait until the disinterested group adapts to the transition. We, as designers, may have to adjust to acknowledge this gap. Injecting the concept and foundation in a small, understandable chunk may require more work, but it's better than a failed transition.

Randy Shoup

SVP Engineering

3 months ago

Very insightful, DVC! There is a big difference in attitude (and therefore aptitude) between the curious and the disinterested. I think we co-experienced your identity example …
