Crossing the distributed systems chasm
A large part of my career has been helping engineering organizations evolve from the single monolithic system that brought the company success to a system that can scale as the company hits its growth stage and starts taking off.
Often at this point, product and leadership are starting to feel hampered by a technology that is preventing them from quickly exploring new markets and new solutions. Engineering needs to scale out to multiple teams that can design, refactor, evolve, and deliver value independently.
Although there are exceptions, in the vast majority of cases a single monolithic server is not going to enable that.
But when you start breaking apart the monolith into independent services and components, you run into a whole new set of problems. You now have a distributed system. You have to deal with API contracts, partial failures, timeouts, data inconsistency, distributed logging and tracing, and so on. It's a pretty big leap.
Moving from building experiences to distributed systems
None of this is new information. What I wanted to talk about here is a pattern I have seen in terms of software developer background and experience.
The founding engineering team is incredibly valuable and worthy of respect. They care deeply about the product and customers, understand the business domain and are fully committed to the company.
This team is usually very adept at quickly building user experiences. Often they have come into software development from a different background, and besides some initial training have gotten most of their experience on the job, usually in smaller organizations.
Except in a few deeply technical businesses, it's unusual for them to have a deep computer science background. They often haven't had to think about, and often aren't particularly excited about, topics like caching, transactional semantics, idempotence, resilience strategies, and so on.
But these things quickly become important when you move from one to two to five to ten services. What worked just fine in a single full-stack application can be very problematic in a distributed system.
For example, one team I worked with needed to get data from one service to another, so they set up some messaging infrastructure and started posting events. But the design was built around the happy path. They had never had to think about ordering or dropped messages, and they were running into serious outages as a result. And the more they tried to solve this, the more they dug a hole for themselves. They just didn't have the background to know about the theory and architectural patterns behind messaging. They didn't even think there was such a thing, so they never looked.
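To make the ordering and dropped-message questions concrete, here is a minimal sketch of one common approach: a consumer that uses per-stream sequence numbers to ignore duplicates, buffer out-of-order events, and notice gaps. This is not what that team built. The names Event, Consumer, stream, and seq are hypothetical, and the sketch assumes the producer assigns consecutive sequence numbers starting at 1.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    stream: str   # hypothetical: one ordered stream per entity or producer
    seq: int      # hypothetical: assigned by the producer, consecutive from 1
    payload: str


@dataclass
class Consumer:
    next_seq: dict = field(default_factory=dict)  # stream -> next expected seq
    pending: dict = field(default_factory=dict)   # stream -> {seq: Event}
    applied: list = field(default_factory=list)   # payloads, in applied order

    def handle(self, event: Event) -> str:
        expected = self.next_seq.get(event.stream, 1)

        # Already processed: a redelivery, safe to ignore (idempotent handling).
        if event.seq < expected:
            return "duplicate"

        buffered = self.pending.setdefault(event.stream, {})
        buffered[event.seq] = event

        # Arrived early: something before it is late or lost. Hold it back.
        if event.seq > expected:
            return "gap"

        # Apply this event and any buffered events that are now in order.
        while expected in buffered:
            self.applied.append(buffered.pop(expected).payload)
            expected += 1
        self.next_seq[event.stream] = expected
        return "applied"


consumer = Consumer()
results = [
    consumer.handle(Event("user-1", 1, "created")),
    consumer.handle(Event("user-1", 3, "deleted")),   # arrives before seq 2
    consumer.handle(Event("user-1", 1, "created")),   # redelivered
    consumer.handle(Event("user-1", 2, "renamed")),   # fills the gap
]
```

A real system would also need a policy for gaps that never fill, such as a timeout followed by a replay request to the producer, and would persist next_seq alongside the applied changes. The point of the sketch is only that these questions need an explicit answer in the design.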
Another team tried to build a central service for identity management, but the API was based on their experience of directly updating tables. They didn't see the danger of providing an API that lets callers directly set flags and attributes for users, rather than going through a domain abstraction. Many of these changes put users into inconsistent states, or different clients interpreted the data differently. It was a pretty big effort to patch and unwind the data and migrate the clients to a more coherent and consistent API.
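As a rough illustration of what "going through a domain abstraction" can mean, the sketch below exposes operations such as verify_email and suspend instead of a generic "set this flag" call, so the service itself decides which state changes are valid. Everything here (IdentityService, Status, the specific states and rules) is a hypothetical example, not the API that team ended up with.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    SUSPENDED = "suspended"


class InvalidTransition(Exception):
    """Raised when a caller asks for a state change the domain does not allow."""


@dataclass
class User:
    user_id: str
    email: str
    status: Status = Status.PENDING
    email_verified: bool = False


class IdentityService:
    """Callers invoke domain operations; they never set fields directly."""

    def __init__(self) -> None:
        self._users: dict[str, User] = {}

    def register(self, user_id: str, email: str) -> None:
        if user_id in self._users:
            raise InvalidTransition(f"{user_id} is already registered")
        self._users[user_id] = User(user_id, email)

    def verify_email(self, user_id: str) -> None:
        user = self._users[user_id]
        if user.status is not Status.PENDING:
            raise InvalidTransition("only pending users can be verified")
        # Both fields change together, so they cannot drift apart.
        user.email_verified = True
        user.status = Status.ACTIVE

    def suspend(self, user_id: str) -> None:
        user = self._users[user_id]
        if user.status is not Status.ACTIVE:
            raise InvalidTransition("only active users can be suspended")
        user.status = Status.SUSPENDED

    def status_of(self, user_id: str) -> Status:
        return self._users[user_id].status


service = IdentityService()
service.register("u1", "u1@example.com")
try:
    service.suspend("u1")          # not allowed: user is still pending
    rejected = False
except InvalidTransition:
    rejected = True
service.verify_email("u1")
status_after_verify = service.status_of("u1")
service.suspend("u1")
```

With a flag-setting API, a client could mark a user active without a verified email, and every client would have to know the rules. Here the rules live in one place, and an invalid request fails loudly instead of leaving inconsistent data behind.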
A strategy to help with this change
What I have seen as a common pattern is that there are two "clusters" of developers from the original teams.
To help the team transition to this brave new world, here is what I have seen work well.
When you have strong intention, support from leadership, and a committed core group focusing on establishing and spreading these new guidelines and practices, then I have seen that over time the disinterested group comes to be comfortable and adept at working in this new world. They may never fully be into all the theory and philosophy behind it, but at least they know how to make it work.
Principal Software Engineer at Click Therapeutics
3 months ago: Thanks for sharing, DVC. Good read. The disinterested group can't visualize the benefits and future trajectory. It's not always feasible to wait until the disinterested group adapts to the transition. We, as designers, may have to adjust to acknowledge this gap. Introducing the concepts and foundations in small, understandable chunks may require more work, but it's better than a failed transition.
SVP Engineering
3 months ago: Very insightful, DVC! There is a big difference in attitude (and therefore aptitude) between the curious and the disinterested. I think we co-experienced your identity example …