How to get rid of walls in your deployment process

A primary objective of cross-functional teams and the "you build it, you run it" #DevOps mindset is to enable teams to deploy and release new features to customers quickly.

Unfortunately, due to a legacy of siloed teams and shared monoliths, this is not the case in many organizations.

While many teams nowadays are cross-functional, not all functions are fully integrated into this new paradigm. In my experience with several companies, teams such as QA and infrastructure often still operate separately.

There are reasons for this, but depending on who initiates (or executes) the deployments, these teams risk becoming bottlenecks in the deployment process. Walls in the deployment process and slow, expensive and frustrating handovers are the natural consequence.


Remark: This article is an updated version of the article "Do you have walls in your deployment process?", first published on the #UnblockedEngineering blog in November 2021.


Case Study: Deployments at BRYTER 

At BRYTER, we had shared deployment artefacts that multiple teams contributed to. Our QA team of three people, at that time, kicked off the deployment process and executed various manual and semi-automated steps. Most of these were largely unknown to the developers.

All we developers needed to care about was that deployments happened on Tuesdays and Thursdays, and that we got our code merged by then if we wanted it to be part of the deployment. Sounds like paradise for developers? In a way, yes. It is always convenient when you do not have to deal with the consequences of your actions.

Of course, we were responsible for shipping working software and for resolving production incidents fast. However, there was a safety net in the form of the QA team, who would test our code and often find something that blocked the deployment. This not only delayed the deployment, but also wasted a lot of the QA team's effort: they now had to wait for a fix from development, after which they would restart the deployment process.

Quite often, the entire process would take a full day, blocking at least two people of a three-person team. Imagine what doing this twice a week meant for their productivity. Of course, they were not happy with this situation. Developers were not happy either, because they could not understand why deployments took so long and were not done more often.

The result: everybody was unhappy, and the only people who knew the deployment process were in the QA team, which was busy doing deployments and therefore had no capacity to actually work on improving the process. Furthermore, a uni-functional team is not well equipped to work on a CI/CD pipeline, as this usually requires knowledge from other disciplines such as infrastructure, architecture and even the interconnectedness of systems.

Thinking about this systemic issue in the Developer Experience Team, we decided to completely overhaul the process and fundamentally change how we deploy at BRYTER.

The idea: Let developers do deployments 

In order for developers to really own their product, they also need to own the deployments and the effects they have on the system, the good as well as the bad ones. Furthermore, we expected that developer involvement in deployments would spark countless interesting questions, ideas and learnings around the process that would ultimately lead to improvements of the process itself.

Therefore, we decided it would be much better to put developers in charge of deployments. Thankfully, and unsurprisingly, we had the full support of the QA team: they also disliked the fact that they had ended up in charge of the deployments. When I pitched the idea to the technical leadership community in the company, the feedback was overwhelmingly positive as well.

Thus, we started to flesh out the idea to pitch it to all developers.

How we started this change 

Shifting the responsibility for deployments from the QA team to the development teams could not be done overnight. The QA team had built up a huge amount of knowledge about the deployment process and its risks and pitfalls, whereas most developers had almost none. Simply shifting the responsibility would have been a dangerous thing to do.

Consequently, the QA team started to document the process in depth and kept adding further information while performing deployments.

Furthermore, we decided that every deployment done by a development team would be shadowed by a QA person, who would not perform the steps but would give developers the safety they needed for this, to them, highly unusual operation. This was especially important because developers perceive deployments as risky, even if there is a solid set of tests. Things can go wrong, even with good automated tests in place. Most developers, at that time, did not feel equipped to deal with unplanned events, given that not only their own team but many others contributed changes to each deployment.

Because we had multiple teams contributing to a shared deployment artefact, we decided to rotate the deployment responsibility: every week, a different team would do the deployments.

The goal was for every team to get a rough understanding of the deployment process. Even more importantly, every team had the opportunity to ask questions about the process and to encounter its challenges first hand.

Each week, we held a retrospective with the development team, the QA team and a member of the platform team to learn from the experiences while they were still fresh. Furthermore, we introduced a board to track improvement ideas and the progress of implementing them.

What we learned from the change 

We learned. A lot. And we improved. A lot.

First, developers were shocked by how complicated and tedious the deployment process was. This pain produced two things:

  1. Empathy for the QA team (“We can’t believe you had to do this at least twice a week. Every week. Sorry for that!”)
  2. Interest in the deployment process and, with this, many questions and ideas. (“Why are we testing this in that way?”, “Why is this not automated?”, “Why are we doing this at all?”, …)

Now that developers were experiencing the pain of doing deployments first hand, we were highly motivated to improve the situation. Moreover, this cross-functional approach to deployments helped to surface waste in the process.

For example, the QA team was testing certain things manually during deployments. With the developers on board, we decided to drop these steps entirely because we had long had automated tests covering these areas. The QA team simply was not aware of them.

Furthermore, we stripped some other tasks, such as creating a change log, from the deployment process and automated others, like smoke-testing the application after deployments to staging and production.
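
To give an idea of what such an automated smoke test can look like, here is a minimal sketch in TypeScript. The base URL and the endpoints are hypothetical placeholders rather than our actual setup; the point is simply that a small script which fails the pipeline on an unexpected response can replace a manual post-deployment check.

```typescript
// smoke-test.ts: minimal post-deployment smoke test (URLs and paths are hypothetical).
// Run with: npx ts-node smoke-test.ts https://staging.example.com

const baseUrl = process.argv[2] ?? "https://staging.example.com";

// Endpoints we expect to answer with HTTP 200 right after a deployment.
const endpoints = ["/health", "/login", "/api/status"];

async function smokeTest(): Promise<void> {
  for (const path of endpoints) {
    const response = await fetch(`${baseUrl}${path}`);
    if (!response.ok) {
      console.error(`Smoke test failed: ${path} returned ${response.status}`);
      process.exit(1); // a non-zero exit code fails the pipeline step
    }
    console.log(`OK: ${path} returned ${response.status}`);
  }
  console.log("All smoke checks passed.");
}

smokeTest().catch((error) => {
  console.error("Smoke test crashed:", error);
  process.exit(1);
});
```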

Moreover, developers learned about the Selenium end-to-end tests and that these were sometimes flaky because of careless changes to the frontend or because of missing communication between developers and QA people within the teams.

We improved this situation afterwards by bringing the responsibility for the Selenium tests into the cross-functional teams and by automating them completely. Until then, they had not been part of the automated continuous integration pipeline but were executed manually right before a production deployment.

Side-note: The Developer Experience Team later addressed the flaky Selenium tests and the lack of ownership by the respective development teams by introducing Playwright, which felt more natural to developers who were used to TypeScript. Furthermore, Playwright proved to be far more stable than Selenium.
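
For illustration, a Playwright end-to-end check can be as small as the following sketch; the URL, role and label used here are made-up placeholders, not our actual tests.

```typescript
// login.spec.ts: minimal Playwright end-to-end test (URL and selectors are hypothetical).
import { test, expect } from '@playwright/test';

test('login page renders', async ({ page }) => {
  // Navigate to the (hypothetical) staging environment.
  await page.goto('https://staging.example.com/login');

  // Web-first assertions like these wait automatically for the element,
  // which is one reason Playwright tests tend to be less flaky than
  // hand-rolled Selenium waits.
  await expect(page.getByRole('heading', { name: 'Sign in' })).toBeVisible();
  await expect(page.getByLabel('Email')).toBeVisible();
});
```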

Last but not least, developers experienced the difficulty of coupling deployments and releases instead of properly separating these two actions with release toggles. This is one of the benefits of the direct feedback loop that comes from involving developers in the deployment process.
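
As a rough sketch of what separating deployment from release can look like in code, here is a minimal TypeScript release toggle; the flag name, the environment-variable lookup and the checkout functions are invented purely for illustration.

```typescript
// releaseToggles.ts: minimal release toggle sketch (flag names and source are hypothetical).

// In a real setup the flags would typically come from a config service;
// here we read them from an environment variable to keep the sketch small.
const enabledToggles = new Set(
  (process.env.ENABLED_TOGGLES ?? "").split(",").map((name) => name.trim())
);

export function isEnabled(toggle: string): boolean {
  return enabledToggles.has(toggle);
}

// The new checkout flow can be deployed at any time, but it is only
// released to users once the toggle is switched on.
export function renderCheckout(): string {
  if (isEnabled("new-checkout-flow")) {
    return renderNewCheckout();
  }
  return renderLegacyCheckout();
}

// Hypothetical implementations, only here to keep the sketch self-contained.
function renderNewCheckout(): string {
  return "new checkout";
}

function renderLegacyCheckout(): string {
  return "legacy checkout";
}
```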

How we continued

Of course, this was not the end of the improvement process. Having varying groups of people do the deployments may sound as if it made things worse, and to some extent it did. It definitely did not make us faster immediately. But it put us in a position from which improvements could be made more easily, because interest in those improvements was much higher and more people were able to contribute ideas on how to improve the process.

Afterwards, we worked on making the automated end-to-end tests stable, fast and part of the CI/CD pipeline. With this in place, developers get direct end-to-end test feedback after (or even before) merging.

The deployment process was then not much more than clicking a button. Once this worked sufficiently well, we removed that button and fully automated deployments whenever the build and test stages were green. Thus, developers no longer needed to be on a "deployment rotation", and deployments happened fully automatically. Read more about this here.

Key Takeaways for your Context 

Where and how to start this change obviously depends on the company, the culture and where you currently stand. If you see a problem with your deployment process (or any other matter), it helps to:

  1. Write it down. This will help you think it through in more depth. It will also help you find gaps and rough edges that you can refine.
  2. Discuss it with others. Ask the right people whether they can relate to the problem and what their ideas are. Before you initiate a bigger change, it is always good to know that you are not alone in your perception that there is a problem.
  3. Refine your approach. Use the feedback you get from your peers to fill the gaps and make your idea more rounded.
  4. Pitch it. With the support of the right people behind your endeavour, pitch it to the other people who will be affected or who need to cooperate to make the change a success.
  5. Implement it. Put it into action and help others contribute their part to the result.
  6. Give updates. Keep your peers up to date on how it is going, which improvements you are seeing and where you discover problems that need to be addressed. This ensures that not only you see the progress and value, but also that the folks who might not be as connected to your mission see why the change is valuable.

Conclusion 

Teams that develop parts of the product should have a stake in the deployment process and should be able to deploy their changes frequently and without external help.

By shifting the responsibility onto the teams, we made sure they learned about the status quo and could contribute to a better solution.

While this shift was not without friction or resistance, it helped us move closer to continuous deployment and sparked many discussions and initiatives around improving the process.

How are you doing #deployments? Who is responsible for the deployment process? Let me know in the comments! Sharing is highly appreciated!
