Say goodbye to sprints. Kanban is the only way.
I'll state it boldly right up front. Scrum is an awful idea for SaaS engineering teams. I make no claims about whether Scrum has always been a bad idea or whether there are no good use cases for it. All I know is that the body of knowledge around efficiency of systems has increased dramatically in the past several decades and it's time software engineering teams start catching up. Specifically, we need to do away with sprint cycles and start developing and deploying continuously.
The Goal by Eliyahu Goldratt is a transcendent book assigned as reading to MBA students across the country. It tells the story of a production facility whose existence predates the information era and, on the surface, seems inapplicable to modern SaaS companies. I'd like to argue that its teachings should be studied as a guide to building software products effectively today. You may have also read The Phoenix Project by Gene Kim, George Spafford, and Kevin Behr. While slightly less well known (because of its niche focus), this book builds on the principles taught in The Goal and applies them to the profession now known as DevOps. Each of these books also builds on principles of lean manufacturing, popularized by Toyota at least four decades ago. I'd like to take things a step further and apply the lessons from each of these sources to software engineering in the modern SaaS company.
Spoiler alert
If you haven't read these books, I'm going to spoil the big reveal of both of them right now. In each book the main character learns strategies to maximize the throughput of their system. Most of these strategies work around a bottleneck to improve efficiency. They apply to various areas of the software engineering discipline and lead me to conclude with confidence that Scrum is not the way. Kanban is the only way. The most applicable theme from both books, and from Toyota's lean manufacturing methodology, is to reduce batch sizes.
Reduce batch sizes
Toyota's famous lean manufacturing process and the lessons from both of the aforementioned books argue that reducing batch sizes improves the throughput of a system. Think of a production facility with auto parts moving through various machines to be molded, painted, assembled, etc. The old way of thinking in this type of system is that each machine should only be operated when it's full. For example, the paint machine would only be run when enough parts are ready to fill an entire tray and therefore be "efficient." While this does optimize something, it is not what the business actually needs to optimize for. Businesses want to make money. To do that they need to optimize for throughput. As Bill discovers in The Phoenix Project, the amount of work in process is the actual revenue killer in a production facility. It leads to slow cycle times, slow delivery times, and endless unplanned work.
It's far better to have work flow through a system in a smooth, just-in-time rhythm. Reducing batch sizes in turn reduces the system's cycle time (the time it takes to get any one item through the system). By reducing cycle times we can clear most items from the production line much faster, which yields many benefits, including a shorter time to collecting revenue, happier customers, and more iterations of learning and experience. Furthermore, when mistakes happen in the process and something is ruined, the blast radius is much smaller because of the reduced batch sizes. For instance, imagine that the paint machine goes on the fritz and ruins a whole batch of auto parts. If we had waited to fill the whole tray before running the machine, many parts would be ruined, creating a backlog of unplanned work. If we run only the parts that are ready when the machine becomes available, there are potentially far fewer parts on the tray when disaster hits. This reduces the unplanned work generated by inevitable errors.
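To make the batch-size argument concrete, here is a minimal sketch of a toy production line. It is an illustration under simplifying assumptions, not a model taken from either book: every item is available at time zero, each station works on one item at a time, each item takes one time unit per station, and a batch only moves to the next station once every item in it is done. The function name `delivery_times` and its parameters are hypothetical.

```python
def delivery_times(num_items, batch_size, num_stations, time_per_item=1):
    """Return the time at which each item leaves the last station.

    Toy model (assumption): a batch is processed item by item at a station
    and moves on only when the whole batch is finished there.
    """
    station_free_at = [0] * num_stations  # when each station can start new work
    times = []
    remaining = num_items
    while remaining > 0:
        size = min(batch_size, remaining)
        remaining -= size
        ready_at = 0  # all items are available at time zero
        for s in range(num_stations):
            # A batch starts once it has arrived and the station is free.
            start = max(ready_at, station_free_at[s])
            ready_at = start + size * time_per_item
            station_free_at[s] = ready_at
        times.extend([ready_at] * size)
    return times


if __name__ == "__main__":
    for batch_size in (1, 4, 12):
        times = delivery_times(num_items=12, batch_size=batch_size, num_stations=3)
        average = sum(times) / len(times)
        print(f"batch size {batch_size}: first delivery at {times[0]}, "
              f"average {average}, last at {times[-1]}")
```

With 12 items and 3 stations, this toy model delivers the average item at time 8.5 with a batch size of 1, at 16 with a batch size of 4, and at 36 when all 12 items move as one batch. Real lines have variable processing times and setup costs, so the numbers illustrate only the direction of the effect.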
How does this apply to software engineering? Writing code is not as simple and defined as running widgets through an assembly line. But at some level, the principles still apply. Holding back all of our engineers' work for an arbitrary amount of time (say a 2-week sprint) may seem like it gives some sort of efficiency gain, but that is not the gain the business needs. I'll reiterate that if your business wants to make money it needs to optimize for throughput. Producing more software features (free of defects) is a surefire way to beat out competitors. It lets you get to market faster, lower costs, or possibly both. Just ask your CEO if she/he is interested in producing more, faster. So how do you optimize for throughput instead of whatever else you're doing? In two ways: work in small, incremental tasks and ship code in small, incremental deployments.
Small incremental tasks
Most software engineers already know they should break work into small, achievable tasks that can be done within minutes, hours, or at most a day or two. Keeping your code commits as small and incremental as possible is much like pushing product through a station in a warehouse as soon as it's ready, rather than waiting for a whole tray to fill up. Reducing the scope of a task results in fewer potential bugs and human errors. It also means that small, meaningful improvements can get finished more quickly (due to reduced cycle time).
Small incremental deployments
Additionally, deploying every small, incremental change to production as soon as it's ready is another way of reducing batch sizes. It's another way of implementing Toyota's just-in-time methodology to increase throughput. When the QA team only needs to test one incremental change (instead of a whole sprint of changes), there is a lot less room for oversight. When you deploy one incremental code change to production, the likelihood of a merge conflict or configuration mistake is dramatically reduced. Your throughput will undoubtedly increase.
You may now recognize that the continuous method I'm describing does not mesh with the notion of time-bound sprints. Collecting and deploying completed work according to an arbitrary calendar is the opposite of small batch sizes. In the manufacturing world, Toyota popularized kanban as the methodology for managing just-in-time production and reducing work in process. Bill in The Phoenix Project also introduced a kanban board to get control of the work being done in his IT organization. We can apply the same concept to software engineering. In this way you can move one task at a time through each phase of writing code, peer review, testing, and deployment to production without any handoffs (another way to lose throughput), without any context switching, and without any delays. This concept is not new. It's been known for many years. It's time software engineering teams get on board.
As if this isn't enough evidence that sprints are the antithesis of throughput, let's discuss some additional points. Oftentimes scrum teams will do sprint planning processes to estimate the complexity of each upcoming task (or more frequently, how long they think it will take). Frequently, these tasks are assigned to individuals on the team before the sprint starts (see note at end of article). By assigning tasks up-front, you've made the mistake already of piling up a load of widgets in front of a particular machine. Each task to be done should instead be put in a shared list in prioritized order so that as soon as an engineer is available she/he can take the first item available that she/he has the ability to complete and start working. This again improves throughput by preventing bottlenecks. This also happens to be what kanban is all about. Create a simple, prioritized task list and let the next available engineer pick up what's on top.
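The shared, prioritized list described above can be sketched in a few lines. This is a hypothetical illustration, not a reference to any particular kanban tool: the `Task` and `Backlog` names, the `skill` tag, and the convention that a lower `priority` number means more important are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    priority: int  # assumption: lower number means more important
    skill: str     # hypothetical tag for the ability the task requires


@dataclass
class Backlog:
    tasks: list = field(default_factory=list)

    def add(self, task):
        """Add a task and keep the list in priority order."""
        self.tasks.append(task)
        self.tasks.sort(key=lambda t: t.priority)

    def pull(self, skills):
        """Remove and return the top task the engineer can complete, or None."""
        for index, task in enumerate(self.tasks):
            if task.skill in skills:
                return self.tasks.pop(index)
        return None


if __name__ == "__main__":
    backlog = Backlog()
    backlog.add(Task("Add audit log", priority=3, skill="backend"))
    backlog.add(Task("Fix login bug", priority=1, skill="backend"))
    backlog.add(Task("Update pricing page", priority=2, skill="frontend"))

    # Nothing is assigned up front: whoever frees up next pulls from the top.
    next_task = backlog.pull({"frontend"})
    print(next_task.title if next_task else "nothing to pull")
```

Because no task is tied to a person ahead of time, work never piles up in front of one busy engineer while another sits idle.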
Finally, there's an inevitable psychological problem inherent in working in sprints. When the amount of work for a time period is predetermined, engineers tend to accomplish only that amount of work and no more. To avoid the embarrassment of not completing all the tasks in a sprint, engineers will also often overestimate the complexity or duration of a task to ensure they get it done on time. Both of these phenomena create a dramatic, yet difficult-to-measure, reduction in throughput for your business.
It's time to change. What are you worried about?
You may still be skeptical that kanban is the only way. You probably work in an organization that has a history of deploying a ton of bugs and where the stability of the whole platform is questionable. You're not alone. As counterintuitive as it might seem (at first), allowing your teams to take small incremental changes from inception to production as frequently as they can will reduce your bug count. The QA team will more easily identify bugs because of the reduced scope of each task to be tested. Engineers won't have to switch context to come back to something they wrote last sprint that is only now being tested. Deployments will have fewer configuration changes (or none at all) in each release, improving the likelihood that everything runs smoothly. When a defect does make it through the system, it will be easy to identify the small incremental change that broke things so the offending engineer can turn around a fix. That fix will also go through the development process and get shipped to production quickly, reducing the bug count even further.
Maybe you're thinking that your industry is unique or your enterprise customers wouldn't allow continuous deployment. Let me ease your mind by pointing out that the largest and most influential software companies on the planet practice continuous deployment. Maybe you have some internal processes around client communication or release notes that seem incompatible with this new methodology. If you absolutely must release new features according to a schedule, your engineering team can still work in a kanban flow and deploy continuously to a production-ready staging environment completely decoupled from the production environment. Though less ideal, this lets you release on whatever calendar you need without crippling your throughput.
Teams I've been a part of who follow this methodology frequently reach bug zero and remain close to it permanently. Engineers are happier. Customers are happier. The world makes sense again.
Give it a try and you won't be disappointed. If you have questions or are struggling to make the transition, send me a message. I'd love to talk about your situation.
Note on "true scrum methodology": I realize that "true scrum methodology" doesn't involve engineers estimating tasks in the amount of time it will take to accomplish and it also doesn't attempt to assign specific tasks to specific engineers before the sprint starts, but you and I both know that no one adheres to the pure methodology. And if they did, almost every other point made in this article still applies.
Staff Software Engineer
(9 months ago) It's not the only way, but it's the best way invented. The general idea of favoring synchronous development (Scrum) over asynchronous development (Kanban) seems absurd to me. A system in which a feature completed and tested in 15 days is considered a "failure" by the Product Manager in a 14-day sprint is flawed in itself. Plus, it's worth mentioning the constant pressure, stress and turnover in Scrum teams. If you are an engineer and have a different experience, please share.
The issue I had with scrum and you will see this with a lot of frameworks is the work will get dictated by the process and not actually what needs to be done. If you have a feature that will need to be longer in size than your current sprint cycle, most of the time that feature will not get done for the sake of accomplishing your sprint. In the end the sprint was a success but the overall impact might not have been the best.
Expert in Analytics and Robotics for Internal Audit at Bancaribe | ACL Analytics and ACL Robotics | Passionate about Agile and its various frameworks or practices (Scrum, Kanban, OKR, M3.0...)
(2 years ago) SCRUM IS NOT A METHODOLOGY. SERIOUS MISTAKE...
Working with teams to help them work better.
(2 years ago) To me this is a bit of a false dichotomy. While it was ambiguous in older versions of the Scrum Guide, since 2020 it's been explicitly stated that you can release multiple increments within a Sprint, and that the Sprint isn't a "batch." So what is the main difference between Scrum and the Kanban Method? That might come down to the Sprint Goal. If you use it well, the Sprint Goal is a cross-check on the strategic direction between the team, customers, users and stakeholders. It acts as a focal point that helps the team to inspect and adapt product backlog items with the customer as they work. In that sense I'd suggest it's about making sure that the team does not become a delivery-oriented silo, but is integral to the strategic success of the business. Does that add value in your context? What could you measure to find out?
Senior Solutions Engineer at Reputation
(2 years ago) Great article and great work as always by your team!