The myth of full utilization - Part 1: The downsides of parallelizing software development
Manuel Drews
Director Of Engineering | Instruments & FX + Developer Platform at Native Instruments
It sounds like a good idea: "We try to work in parallel as much as possible to be as efficient as possible. We want to spend our time working, not discussing." I heard this statement from the lead developer of a project I was consulting for a while ago. And at first glance it seems to make a lot of sense, especially to an engineer: most modern software uses multiple threads running on multiple processor cores to parallelize work, and often derives a significant performance boost from that. If parallelism speeds up the software itself, it should be able to speed up its development as well, right? It should also keep every engineer constantly busy, so that every resource is fully utilized and the company gets maximum value for the salaries it pays.
Unfortunately this idea often does not work out well in reality, as it is based on incorrect assumptions and ignores some important aspects.
Let's start by looking at it from an engineering perspective: an important rule to consider when thinking about parallelizing work is Amdahl's law (1). It originates from computer science but is applicable to any situation where work is split up into tasks that are to be executed in parallel. Amdahl's law states that the maximum achievable speedup is limited by the portion of the work that is interdependent and thus has to be carried out sequentially. For example, even if 75% of all tasks can be done in parallel, the maximum speedup factor is 4, no matter how many resources are thrown at the problem.
Even if 75% of all tasks can be done in parallel, the maximum achievable speedup is 4x
A factor of 4 is a great improvement; if we could transfer that to software development, that would be awesome. So what's wrong with it?
First of all, this is a theoretical limit that, according to Amdahl's formula, requires at least 64 people working on the parallel tasks just to get close to it (and whom you'd have to manage without additional overhead). With a more realistic team size of 4, the achievable speedup is only about 2.3. And remember, this is for a project where 75% of the work can be parallelized. Unfortunately this is hardly ever the case in software development, however much one might wish it to be so. The engineers are working on the same codebase and their code is supposed to contribute to the same product, so there has to be a certain amount of synchronization. I think assuming that 50% of the work is truly independent is still a generous assumption for most projects (*). The achievable speedup with a 4-person team is then only 1.6. OK, that's still much better than nothing, so let's go with it, right?
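If you want to check the arithmetic yourself, here is a minimal Python sketch of Amdahl's formula that reproduces the numbers above (the function name is just for illustration):

```python
# Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction of the
# work that can be parallelized and n is the number of people working in parallel.
def speedup(p: float, n: float) -> float:
    return 1.0 / ((1.0 - p) + p / n)

print(round(speedup(0.75, float("inf")), 2))  # 4.0  -- theoretical limit with 75% parallel work
print(round(speedup(0.75, 4), 2))             # 2.29 -- 75% parallel work, team of 4
print(round(speedup(0.50, 4), 2))             # 1.6  -- 50% parallel work, team of 4
```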
Not quite. The second faulty assumption is that every task can be executed equally well by any resource. That might be true for classical multiprocessor architectures, where all the CPU cores are indeed identical. It doesn't hold up for humans, and even less so when the job requires expertise and context, as software development does. The more the engineers work in isolation from each other, the longer the ramp-up time will be when one of them has to jump into a different area of the codebase. This introduces a significant risk of resource bottlenecks where several tasks are waiting to be taken on by the same person. So, ironically, parallelizing too much can actually force you to serialize work that could be tackled at the same time. This is exactly what happened in the software project I mentioned at the beginning: towards the end, 16 of the remaining 20 tickets in the backlog could only reasonably be worked on by one single engineer.
Parallelizing too much can actually force you to serialize work that could be tackled at the same time.
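To put some (admittedly simplified) numbers on that: assume all 20 remaining tickets take roughly the same time, measured in "ticket-lengths". If anyone can pick up anything, a team of 4 drains the backlog in about 5 ticket-lengths; if 16 tickets can only be handled by one engineer, that single queue alone takes 16, no matter how idle the rest of the team is. A quick back-of-the-envelope calculation:

```python
import math

ENGINEERS = 4
TICKETS = 20

# Everyone can work on every ticket: the backlog drains evenly across the team.
evenly_split = math.ceil(TICKETS / ENGINEERS)

# 16 tickets are stuck behind a single engineer: their queue dominates,
# no matter how quickly the other three finish the remaining 4 tickets.
locked_to_one = 16
bottlenecked = max(locked_to_one, math.ceil((TICKETS - locked_to_one) / (ENGINEERS - 1)))

print(evenly_split)  # 5
print(bottlenecked)  # 16
```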
A common side-effect of this situation is that developers who can't work on the most important tasks start to pick up other, less important tickets. As a consequence, you end up with a lot of loose ends and unfinished features that clutter the codebase, making it harder (=slower) to work with. Managing all those open tasks increases the organizational overhead as well.
Lastly, imagine that the engineer responsible for those 16 tasks suddenly becomes unavailable due to sickness or other circumstances. This so-called 'bus factor' (2) (or 'lottery factor' for the more optimistic among you) is usually quite unfavorable in a team where a lot of these information islands exist. Any person who suddenly becomes unavailable puts the whole project timeline at risk.
Speaking of risk: trying to work in parallel for as long as possible has other severe downsides in terms of risk management. In this work mode, feature integration and testing tend to be delayed significantly in an attempt to avoid the synchronization overhead that comes with them. But the later integration happens, the later problems are identified and the less time remains to fix them. Software modules that work well in isolation might display unexpected side-effects when connected. You might find some fundamental conceptual issues that nobody thought of before. Some usability problems only become visible once people are actually able to try out the application. Whatever it is, the later in the game you are, the harder these issues will be to fix. Some might not be fixable at all anymore, or only in ways that severely hurt the code quality and future maintainability.
The later integration happens, the later problems are identified and the less time remains to fix them.
If you push integration to the very end of the project, you also lose the flexibility to modify or reduce the feature set in order to make a release date or react to changing requirements. And what about your marketing department, or the people translating your software and writing documentation for it? They need finished features to work with too.
The bottom line is that the speedup achievable by parallelizing is often not that great, and that it significantly increases risk and uncertainty if you don't keep an eye on the side-effects.
But that's still not all, there's yet another aspect: not only does over-parallelizing not really speed up your project, it also reduces its quality. Every developer has a skewed perspective on their own work, and it's all too easy to over-engineer, to forget about corner cases, or to simply miss a bug. If the engineers don't talk to each other, it's unrealistic to expect that they will always find the best solution to a problem. If they try to save time and effort by not doing code reviews, it's almost certain that problematic code will make it into production unnoticed. You'll also very likely end up with a codebase in which the same problems are solved multiple times in different ways, with all the negative consequences that brings.
So far I've talked a lot about the negative aspects of parallelizing work. Criticizing without offering alternatives is terribly unconstructive, though. Therefore, in my follow-up article I will address ways to improve software development throughput without falling into these traps.
Thanks for reading. As always, I'm interested in feedback and your experiences with the topic.
TL;DR:
Working in parallel can speed up software development to a certain degree. Parallelizing too much, however, will result in resource bottlenecks and increased risk for the whole project, as well as reduced software quality. So apply it with care, or the drawbacks will rapidly outweigh the benefits.
* 50% is a ballpark estimate that reflects my personal experience. It varies with the complexity of the application and the quality of the codebase, especially its modularity. As my former colleague Tom Smith pointed out, the number also greatly depends on the domain (e.g. web development tends to be more parallelizable than work on a complex desktop application). The takeaway here is that you have to look closely at the project at hand and assess the respective trade-offs carefully.
References:
1. https://en.wikipedia.org/wiki/Amdahl%27s_law
2. https://en.wikipedia.org/wiki/Bus_factor
Acknowledgements
Many thanks to my colleagues and friends who have reviewed this article and shared their feedback and insights: Anna Gough, Alex Pukinskis and Tom Smith