Deciding how frequently to deploy
I was talking with a colleague last week about whether they should increase or decrease their deploy frequency. They were worried that deploying more frequently could cause more bugs. They also were concerned because it takes time and effort to do a deploy.
There is a well-known optimization curve in queueing theory that describes the optimal batch size. You need to balance the holding cost (the cost of holding off on delivering a batch) against the transaction cost (the cost of processing a batch).
A deployment is a batch of code you are shipping.
Our code does not take up space in a warehouse, but there are still a number of significant elements to our holding costs (this is also called the cost of delay):
Our transaction cost is how hard it is to deploy our software. Do we need manual approvals? Do we need to manually evaluate flaky test failures? Does it take an inordinate amount of time to run all our tests? Do we need to set up our environments?
If the cost of doing a deploy is high, you'll need a larger batch size - you'll be forced to have fewer deploys with more code in them.
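To make the trade-off concrete, here is a minimal sketch using the classic economic-order-quantity model as a stand-in for that curve. It rests on simplifying assumptions that are mine, not measurements: every deploy has the same fixed transaction cost regardless of size, and each finished change costs a fixed amount for every week it waits. The function names and the numbers are hypothetical and only for illustration.

```python
import math


def total_cost_per_week(batch_size, changes_per_week, transaction_cost, holding_cost):
    """Weekly cost of shipping `batch_size` changes per deploy.

    transaction_cost: cost of one deploy (assumed fixed, whatever its size)
    holding_cost: cost of one finished change waiting one week (assumed linear)
    """
    deploys_per_week = changes_per_week / batch_size
    # With a steady flow of changes, on average half a batch is waiting to ship.
    average_changes_waiting = batch_size / 2
    return deploys_per_week * transaction_cost + average_changes_waiting * holding_cost


def optimal_batch_size(changes_per_week, transaction_cost, holding_cost):
    """Batch size that minimizes total_cost_per_week under the assumptions above."""
    return math.sqrt(2 * changes_per_week * transaction_cost / holding_cost)


# Hypothetical team: 50 changes a week, each costing 4 units per week of delay.
expensive = optimal_batch_size(50, transaction_cost=100, holding_cost=4)  # 50.0
cheap = optimal_batch_size(50, transaction_cost=1, holding_cost=4)        # 5.0
```

In this toy model, cutting the cost of a deploy from 100 to 1 shrinks the best batch from 50 changes to 5, and the weekly total cost at the optimum drops from 200 to 20. Real holding costs are rarely this linear, so treat the shape of the result, not the numbers, as the takeaway.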
This is why we need to be so careful about being too careful. Approval gates and slow or manual tests will force you to have larger batch sizes with all their resulting holding costs, including costs to quality.
This is why I always encourage teams to aggressively go after the things that are making their deploys expensive.
Build automated pipelines. Eliminate manual approvals. Reduce the number of flaky and slow tests using techniques such as testing in isolation, testing in parallel, having high priority tests vs additional tests you can run less frequently, and so on. Make your PRs smaller and easier to review and roll out.
Have this audacious goal: you check in some code, it goes to production, and it's a non-event.
Not only is coding more fun that way, but you'll reduce those holding costs and deliver significantly more value for your customers and your business.
Senior Principal Architect at eBay
4 months ago
So I'd say: if you have a high transaction cost, then maybe slow down the deploys, but recognize the consequence: you incur higher holding costs. The real thing to prioritize when an error budget is broken is to increase testing while maintaining or reducing transaction costs, rather than just slowing down.
Experienced Software Engineering Leader
4 months ago
Thanks, David Van Couvering, for sharing your thoughts on this topic. I still remember our discussions about deploying master/main branch code to production when we were working at Castlight Health.
Experienced Backend & SRE engineer. Engineering Leader @ Twilio
4 months ago
Nice article, DVC! Something I read in the Google SRE book is to decide deploy frequency based on error budgets. If frequent deploys cause error rates to go up on a service, that could be one indicator to slow things down. It's also a great way to push back on Product Managers who try to ship way too many features in a short span, and to focus on tech debt.