CD - It's not all about tech!
Sylvia MacDonald
Helping teams align and deliver high quality software | Agile and Quality Coach | Budding Gardener
Like many organisations, we were on a journey to continuous delivery. We were deploying to production once a week, with lots of manual intervention. We wanted to deploy at least daily.
There was the obvious work needed to build the automated testing and deployment pipelines. However, there were other factors contributing to why our releases were stressful and bumpy. We just weren’t sure exactly what those problems were.
I started observing our release process and mapping what I saw. Objectively observing all stages of our releases in action allowed me to measure our release process. For example, the time it took to complete each step of the process, whether manual or automated, the number of people involved from different roles, and at what point we would discover an issue that blocked the release. Once we had enough data, we started analysing it and looked for the bottlenecks – these became our most important problems to solve.
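To give a flavour of the kind of analysis this enables, here is a minimal sketch in Python. The step names, timings, and headcounts below are entirely illustrative, not our actual data:

```python
# Hypothetical timing observations for one release cycle.
# Each entry: (step name, duration in minutes, manual?, people involved)
observations = [
    ("build release candidate", 45, False, 0),
    ("deploy to staging", 120, True, 2),
    ("manual regression", 960, True, 6),
    ("sign-off email round-trip", 180, True, 3),
    ("deploy to production", 60, True, 2),
]

total = sum(minutes for _, minutes, _, _ in observations)
manual = sum(minutes for _, minutes, is_manual, _ in observations if is_manual)
bottleneck = max(observations, key=lambda step: step[1])

print(f"total cycle: {total} min, manual share: {manual / total:.0%}")
print(f"biggest bottleneck: {bottleneck[0]} ({bottleneck[1]} min)")
```

Even a tiny table like this makes the conversation concrete: the longest step and the manual share of the cycle jump out immediately, rather than being argued over from memory.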
It’s important to be objective when observing; to not let your own opinions and biases cloud the facts of what’s actually happening, or not happening! Objective observation is a skill well worth practising – it’s the basis of Gemba and experimentation. It’s very valuable when trying to improve your processes and systems.
Why focus on improving our release process?
It was simple really. Our releases were not as smooth, frictionless, or as frequent as we wanted them to be. That impacted our ability to deliver.
Although my role at the time was Exploratory Tester in a cross-functional team, I have always been interested in improving the systems we work in. It was a natural step to get involved in improving our release process.
The good thing about working with technologists is that they respect data and facts, and are very keen to make things better! As I had collected a lot of data that showed where our problems actually were, it meant we focused on fixing the real problems rather than working on assumptions.
For example, the data showed that we were deploying far fewer kits to our test staging environment than we thought we were. As a result, we dealt with the problems that were stopping us from deploying them easily.
Stabilise the release – not everything has to be deployed now
One problem was that often when we wanted to pick the release candidate for the week, there would be last minute “urgent” changes that teams said must be included.
Having discussions about whether something was really important enough to release right now was hard. We had around eight teams at the time, so this was a real problem for us. It meant:
- delaying selecting the release candidate with knock-on effects on the release cycle
- rushing the work so it could be included in the release with knock-on effects on quality, and possibly impacting later releases
The stress and urgency were caused by only being able to deploy to production once a week. Until we could deploy more frequently, that weekly stress would continue, unless something else changed.
I started to ask questions like:
- “Why does it have to go out in this release?”
- “What happens if we don’t release the change this week?”
- “Can it wait until the next release?”
It was quite hard having these conversations as it was already a stressful situation for the teams. I may have been viewed as preventing them from getting their changes out. It did feel counterintuitive to be saying that a change can wait another week when we were trying to release more frequently. The key thing at the time though was to stabilise the releases.
By having the discussion and asking those questions, we realized that some of the changes weren’t urgent at all and therefore didn’t need to be rushed. The overall effect of that was our releases were smoother, with fewer patches to fix escaped bugs. Obviously, there was less stress too!
We could then focus on releasing more frequently.
Use data to identify bottlenecks and change habits
Changing habits is hard. A habit is something you do automatically. You have to work at dropping the old habit and replacing it with a new one.
Publishing data that highlighted the problems in our release cycle helped create an acceptance of the problems, plus a will to fix them.
We tried a few things to help us form good habits. For example, we had hours of delay due to the people involved in the release process sending emails as the primary communication method. An email would get sent, a message was communicated, job done! However, until the recipient has read and understood the email, you haven’t communicated anything. If they are in a meeting, or only check their emails once or twice a day (a good habit!), then it could be hours before they see it. To quote a friend, Rob Lambert (@Rob_Lambert), “communication is in the ear of the listener”.
To change this email habit, we introduced physical handovers for a while. We bought a Hollywood clacker board, wrote the release candidate version number on it, then passed it on to each other like a relay baton. If it was your turn to have the clacker board, you knew that the whole release cycle was waiting for you to do something! That’s quite motivating to actually go and do whatever it is you are supposed to do! It also helped get that sense of the release going out being the most important thing.
There was another purpose to the clacker board though – to get people talking face-to-face, to get to know each other and start to feel like a team.
It was gimmicky, fun, and made us laugh. It did the job – it helped us eliminate the queues in our process caused by poor communication, made us work as a team, and helped us form a much better communication habit.
In summary, work on replacing an ineffective habit with a more useful one – be creative and find fun ways to get people involved. Habit stacking is another good approach, which is where you add the new habit you want to create on top of another good existing habit. I highly recommend Helen Lisowski‘s “Power of Good (Agile) Habits” workshop, if you’d like to know more (@HelenLisowski).
Experimenting with the Toyota Improvement Kata
When I first started observing the releases I didn’t really know about the Toyota Improvement Kata! I’m lucky in that I work with very experienced Agilists, Lean and systems thinkers, so I soon learnt. As it turned out, my whole approach of observing, collecting data, and analysing it in order to know where to focus our next experiment was, in effect, the Toyota Kata.
The Kata is about applying a scientific style of thinking to understanding your problem, what you think will happen next, and adjusting your next steps based on what actually did happen. It has four main steps:
Step 1 – Set the direction
Set the direction you are aiming for, your challenge, your goal, your true north. For us this was release on demand. That doesn’t mean we release every minute; it means that we can release in whatever cadence we want to in a smooth and frictionless way.
Step 2 – Understand your current state
Know your current state, your current condition, your baseline. This is where that data we collected and analysed came in. We basically had a Value Stream Map in spreadsheet form.
Step 3 – Decide your next milestone
Establish your next target condition, your first milestone. Often it’s too big a jump to go from your current condition to your end goal, otherwise we would have done it already! In order to make progress, it’s helpful to identify an intermediate goal that is more achievable. For us this was reducing the release cycle from 4.5 days to 2 days.
Step 4 – Conduct experiments
Decide on and run experiments to get to your next milestone. This is where the data analysis came in again. The data showed us our biggest problem areas and where our queues were. I think of queues as dead time – nothing is happening, we are just waiting for the next bit of the process to happen. We focused our first experiments on eliminating, or at least reducing those queues.
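The queue-hunting described above can be sketched in a few lines. Here each step’s elapsed time is split into active work versus waiting; the figures are hypothetical, chosen only to show how heavily queues can dominate a cycle:

```python
# Hypothetical elapsed time per step, split into (active work, waiting) in minutes.
steps = [
    ("kit ready -> deploy started", 0, 240),     # pure queue: waiting for a person
    ("deploy to staging", 30, 0),
    ("staging tested -> sign-off requested", 0, 180),
    ("sign-off review", 15, 120),                # email sat unread in an inbox
]

work = sum(w for _, w, _ in steps)
queue = sum(q for _, _, q in steps)

print(f"work: {work} min, queue: {queue} min "
      f"({queue / (work + queue):.0%} of elapsed time is waiting)")
```

When the waiting share is that large, eliminating queues (better handovers, automation of manual triggers) pays off far more than speeding up the work itself – which is exactly what our first experiments targeted.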
Using the Improvement Kata way of thinking helped us pinpoint what change we wanted to make next. It resulted in us changing some habits and automating manual steps. For example, the clacker board experiment to improve communication I mentioned earlier, came about via the Improvement Kata.
Another change that came about via the Kata was automating the deployment of our green kits to our test staging area. It sounds like an obvious thing to automate, and it is. At the time though, we thought that it was broken builds that were mainly holding up deploys to staging, rather than the deploy being a manual process. Collecting objective data that showed our current state (step 2) highlighted that actually, we weren’t deploying most of our green kits at all! Being a manual process, it suffered from people not being available, or people being distracted by something else. Automating that deploy became the next action (step 4).
Benefits from improving the non-tech side of releases
The obvious benefit has been the reduction in cycle time. With a 4.5 day cycle time, we just couldn’t release more than once a week. After running several experiments and of course, working on automating the delivery pipeline, we more than halved that and could release in a day if we pushed ourselves. Some other important benefits include:
- Better communication and working relationships
- Improving our worst performing automated tests
- Reducing the cost of manual regression from 6-8 people for up to 2 days, to 1 person for around 30 mins
- Fewer patches to fix escaped defects
- Automating deploys to the test staging environment meant teams finished their story testing more quickly. This reduced cycle times and improved the feedback loop. Although a lot of testing was done in the development environments with exploratory testers working closely with developers, some testing had to be completed in our staging area
Some lessons learned
Here are a few things I learnt:
- Having data that shows patterns and trends counts. It means you can show where the real problems are and also show how the changes you make improve things, or don’t as the case may be.
- Objective data and analysis is very useful when trying to convince management to invest time and resources into a project! Those graphs are powerful!
- Whilst CD efforts are primarily focused on technology and automating pipelines, the people side of things can also have quite a dramatic effect on your cycle times. Find out where your bottlenecks are.
- Open plan offices and easy access to fellow colleagues involved in the release process do not mean they communicate well or work as a team
- I don’t have all the good ideas! Ask for help, get your colleagues involved (thanks Gemma Lewington for the Hollywood clacker idea).
It’s easy to get caught up in the technical side of continuous delivery and only focus on that. After all, that’s where most of the advice and skills development resources are focused. It’s obvious, and of course it needs to be done. However, do take a look at your overall release cycle.
Depending on where you are in your pipeline automation journey and how long that journey is going to take you, there may be other non-tech factors hindering your releases in the meantime. Understand all the steps in your release cycle, find the bottlenecks and queues, make sure your communication methods are effective, and that all the people involved are genuinely working together well.