Deciding how frequently to deploy

I was talking with a colleague last week about whether they should increase or decrease their deploy frequency. They were worried that deploying more frequently could cause more bugs, and they were also concerned about the time and effort each deploy takes.

There is a well-known optimization curve in queueing theory for finding the optimal batch size. You need to balance the holding cost (the cost of holding off on delivering a batch) against the transaction cost (the cost of processing a batch).

A deployment is a batch of code you are shipping.
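One way to make this balance concrete is the classic economic-order-quantity form of the batch-size curve. This is a simplified model assumed for illustration, not a measurement: changes are merged at a steady rate of λ per day, each deploy has a fixed cost T, and each change that is waiting to ship costs c per day. With n changes per deploy, the deploy cost is split across the batch and a change waits about n/(2λ) days on average, so the cost per change and its minimum are:

```latex
C(n) = \frac{T}{n} + \frac{c\,n}{2\lambda},
\qquad
\frac{dC}{dn} = -\frac{T}{n^{2}} + \frac{c}{2\lambda} = 0
\;\Longrightarrow\;
n^{*} = \sqrt{\frac{2\lambda T}{c}}
```

The first term shrinks and the second grows as n increases, which gives the U-shaped curve; at n* the two terms are equal. Because n* grows with the square root of T, making deploys cheaper moves the optimum toward smaller, more frequent deploys.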

Our code does not take up space in a warehouse, but our holding costs (also called the cost of delay) still have several significant components:

  • The cost of uncaptured value - when software isn't shipped, it isn't returning any value. The longer you wait, the more money you lose by not getting the benefits. The benefit itself may also shrink the longer you wait, because the market keeps shifting.
  • The cost of delayed feedback - when you wait a long time to ship, you also wait a long time for feedback. This almost always means rework costs more once you realize that what you shipped isn't having the impact you expected.
  • The cost to quality - the larger the batch size, the harder it is to identify the root cause when a bug occurs. This can impact customers if you aren't able to fix the problem in production, but even if you can roll back quickly, it's a ton more work for your team to figure out what went wrong and why. I love what Adrian Cockcroft from AWS once said at a QCon session I attended: "when your code has a lot of bugs in production, you should be deploying more frequently."
  • The impact to morale - the slower it is to get code out, the less engaged your team becomes, because they just don't get the sense that they are having an impact. Also, nobody enjoys manually shepherding a deploy out when they could be slinging code and getting stuff done.
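To make the cost-to-quality point concrete, here is a minimal sketch of bisecting a bad deploy, the same idea `git bisect` uses. It assumes the changes in the batch are ordered, that exactly one change causes the bug, and that once that change is applied every longer prefix of the batch also shows the bug. `is_broken` is a hypothetical callback standing in for whatever build, deploy, and reproduction step your team would actually run:

```python
from typing import Callable, Sequence, Tuple


def find_culprit(changes: Sequence[str],
                 is_broken: Callable[[Sequence[str]], bool]) -> Tuple[str, int]:
    """Binary-search an ordered batch for the change that introduced a bug.

    Precondition: is_broken(changes) is True for the whole batch.
    Returns the culprit and how many checks (build + deploy + repro) it took.
    """
    lo, hi = 0, len(changes) - 1  # the culprit's index is always in [lo, hi]
    checks = 0
    while lo < hi:
        mid = (lo + hi) // 2
        checks += 1
        if is_broken(changes[: mid + 1]):  # apply only the first mid+1 changes
            hi = mid        # bug already present: culprit is at or before mid
        else:
            lo = mid + 1    # still healthy: culprit comes after mid
    return changes[lo], checks


if __name__ == "__main__":
    batch = [f"change-{i}" for i in range(64)]
    # Hypothetical repro: the bug appears once change-37 is included.
    culprit, checks = find_culprit(batch, lambda prefix: len(prefix) > 37)
    print(culprit, checks)
```

With one change per deploy there is nothing to search (zero checks), while a 64-change batch needs six rounds of build-and-check. Real batches are messier than this sketch, since interacting changes can break the "one culprit" assumption entirely.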

Our transaction cost is about how hard it is to deploy our software. Do we need manual approvals? Do we need to manually evaluate flaky test failures? Does it take an inordinate amount of time to run all our tests? Do we need to set up our environments by hand?
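Flaky tests are a good example of how transaction cost compounds. A minimal sketch, assuming each flaky test fails independently with the same probability on every run, the change itself is correct, and a red run is simply retried in full (all hypothetical simplifications):

```python
def expected_pipeline_runs(num_flaky_tests: int, flake_rate: float) -> float:
    """Expected number of full pipeline runs until one comes back green."""
    if not 0 <= flake_rate < 1:
        raise ValueError("flake_rate must be in [0, 1)")
    p_green = (1 - flake_rate) ** num_flaky_tests  # every flaky test must pass
    return 1 / p_green  # mean of a geometric distribution


def expected_minutes_to_green(num_flaky_tests: int, flake_rate: float,
                              suite_minutes: float) -> float:
    """Expected minutes of test runtime per deploy, counting retries."""
    return expected_pipeline_runs(num_flaky_tests, flake_rate) * suite_minutes


if __name__ == "__main__":
    # Hypothetical suite: 30 minutes long, 20 tests that each flake 5% of the time.
    print(f"{expected_pipeline_runs(20, 0.05):.1f} runs")
    print(f"{expected_minutes_to_green(20, 0.05, 30):.0f} minutes")
```

With those made-up numbers, a 30-minute suite turns into roughly 2.8 runs, or about 84 minutes of testing per deploy, before anyone spends time manually triaging the red runs.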

If the cost of doing a deploy is high, you'll need a larger batch size - you'll be forced to have fewer deploys with more code in them.
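A rough numeric sketch of this effect, assuming the economic-order-quantity model of batching (a fixed cost per deploy, a steady merge rate, and a cost of delay that grows linearly with how long a change waits). The function names, parameters, and numbers below are hypothetical, chosen only to keep the arithmetic easy to follow:

```python
import math


def cost_per_change(batch_size: int, deploy_cost: float,
                    changes_per_day: float, delay_cost_per_day: float) -> float:
    """Approximate total cost per change for a given batch size.

    deploy_cost is shared by every change in the batch; a change waits
    about batch_size / (2 * changes_per_day) days for its deploy.
    """
    transaction = deploy_cost / batch_size
    holding = delay_cost_per_day * batch_size / (2 * changes_per_day)
    return transaction + holding


def optimal_batch_size(deploy_cost: float, changes_per_day: float,
                       delay_cost_per_day: float) -> float:
    """Batch size where the two cost terms balance: sqrt(2 * rate * T / c)."""
    return math.sqrt(2 * changes_per_day * deploy_cost / delay_cost_per_day)


if __name__ == "__main__":
    rate = 4.0        # hypothetical: 4 changes merged per day
    delay_cost = 1.0  # hypothetical: 1 hour of value lost per change per waiting day
    for deploy_cost in (8.0, 2.0, 0.5):  # hypothetical hours of effort per deploy
        n = optimal_batch_size(deploy_cost, rate, delay_cost)
        print(f"deploy cost {deploy_cost:>4} h -> ~{n:.0f} changes per deploy, "
              f"deploy every {n / rate:.2f} days")
```

With these numbers, cutting the deploy cost from 8 hours to 30 minutes moves the optimum from about 8 changes every 2 days to about 2 changes every half day. The square root means each 4x reduction in deploy cost halves the best batch size.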

This is why we need to be so careful about being too careful. Approval gates and slow or manual tests will force you to have larger batch sizes with all their resulting holding costs, including costs to quality.

This is why I always encourage teams to aggressively go after the things that are making their deploys expensive.

Build automated pipelines. Eliminate manual approvals. Reduce the number of flaky and slow tests using techniques such as testing in isolation, testing in parallel, separating high-priority tests from additional tests you can run less frequently, and so on. Make your PRs smaller and easier to review and roll out.
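Two of those techniques, tiering tests and running them in parallel, can be sketched together. This is a minimal model, not any particular CI system's behavior: the `Check` type, the test names and durations, and the `high_priority` flag are hypothetical, and tests are assigned to workers with the longest-processing-time-first greedy heuristic, which is a good approximation rather than an optimal schedule:

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Check:
    name: str
    minutes: float
    high_priority: bool


def select_checks(checks: Iterable[Check], run_extended: bool) -> List[Check]:
    """High-priority checks run on every commit; the rest only when requested."""
    return [c for c in checks if c.high_priority or run_extended]


def wall_clock_minutes(checks: Iterable[Check], workers: int) -> float:
    """Estimate elapsed time when checks are spread across parallel workers."""
    if workers < 1:
        raise ValueError("need at least one worker")
    loads = [0.0] * workers  # a list of zeros is already a valid min-heap
    # Longest first, each one handed to the currently least-loaded worker.
    for chk in sorted(checks, key=lambda item: item.minutes, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + chk.minutes)
    return max(loads)


if __name__ == "__main__":
    suite = [
        Check("unit", 2, True),
        Check("contract", 3, True),
        Check("api", 4, True),
        Check("end-to-end", 10, False),
        Check("performance", 15, False),
    ]
    per_commit = select_checks(suite, run_extended=False)
    nightly = select_checks(suite, run_extended=True)
    print("serial, everything:", sum(c.minutes for c in nightly))
    print("per commit, 3 workers:", wall_clock_minutes(per_commit, 3))
    print("nightly, 3 workers:", wall_clock_minutes(nightly, 3))
```

With these made-up numbers, a 34-minute serial suite shrinks to about 4 minutes per commit, and the full suite still finishes in about 15 minutes when it runs on its less frequent schedule.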

Have this audacious goal: you check in some code, it goes to production, and it's a non-event.

Not only is coding more fun that way, but you'll reduce those holding costs and deliver significantly more value for your customers and your business.


David Van Couvering

Senior Principal Architect at eBay

4 months ago

So I'd say: if you have a high transaction cost, then maybe slow down the deploys, but recognize the consequence: higher holding costs. But when an error budget is broken, the real priority is to increase testing while maintaining or reducing transaction costs, rather than just slowing down.

Suneel Saguturu

Experienced Software Engineering Leader

4 months ago

Thanks David Van Couvering for sharing your thoughts on this topic. I still remember us discussing deploying the master/main branch to production when we were working at Castlight Health.

Vengada Karthik Rangaraju

Experienced Backend & SRE engineer. Engineering Leader @ Twilio

4 months ago

Nice article DVC! Something I read in the Google SRE book is to decide deploy frequency based on error budgets. If frequent deploys cause error rates on a service to go up, that could be one indicator to slow things down. It's also a great way to balance, or push back on, Product Managers who try to ship way too many features in a short span, and to focus on tech debt.
