“Stop people working — are you mad?”
McArdle, M. (2013) Thatcher’s Economic Legacy (digital) Available at: <thedailybeast.com>

Presenting the problem

After months of firefighting IT problems, watching systems with tangled dependencies break and then fixing those same systems, the department needed a way of working that would reduce outage time and increase the number of new releases. So we quite literally pulled the Andon cord on our ways of working.

The Andon cord originates from Jidoka, a Lean Process principle. The idea is that a cord is pulled when help is needed, stopping all work so that the problem is fixed before it causes a bigger issue or complications in other areas. It also encourages a mindset of continuous improvement in the teams. [Six Sigma Daily]

Pulling the Andon Cord on our ways of working

The idea of stopping everyone working on their projects to jump on an issue is a huge expectation, especially as Black Friday and Boxing Day were around the corner. It would take a lot of courage from everyone in the department to trial this, so we needed to really know what we were getting ourselves into.

Although created in the manufacturing industry, the Andon cord process has also been successfully used by the team at Amazon. When multiple defects are noticed by the customer service team, anyone can stop the sale of the product until the defect or issue is fixed. It’s so embedded into their processes it’s almost unnoticeable to the customers.

As a team new to DevOps, it was important for us to understand how this method could work for our team, so we read ‘Toyota Kata’ by Mike Rother and various online articles, and watched a talk by Zack Ayers and Joshua Cohen about Andon Cords in Development Teams: Driving Continuous Learning.

The main takeaways from these great examples included:

  • Anyone could and should pull the Andon cord for any reason no matter how small.
  • The process should create a practice of continuous learning throughout all levels of the department.
  • Creating a culture of asking for help and collaborating with others benefits the wider team, not just the person who asked for assistance.
  • Stopping releases and production might initially feel like a bad thing, but in the long term it saves the business time, effort and money.

Understanding the journey

Understanding these focal points helped us map out our own journey. By visualising it next to the Toyota process, we could compare the two side by side and figure out what we really needed to begin with.

Working through our processes | Service map 2019.

Observing the teams through their working processes and over Slack channels, we could see a willingness to help each other solve problems no matter what office or product team they were in.

So in reality, we did have ‘a sort of’ Andon cord process in place (even if it did take on different forms for each of the product teams).

Another observation was that the product teams highlighted issues using JIRA, not in collaboration with each other, but directed to their own teams for completion or fixes. JIRA works well for managing projects and reporting issues, and as a platform it already works within the department, so we wanted to keep using it within the Andon system.

Releases, on the other hand, varied in how they were communicated. Some teams had CAB meetings to iron out the dependencies involved, whilst one team set up a Slack channel with automated notifications going off when a change was released. As a result, teams had some idea about other releases coming up.

Combining the existing methods the teams used to release features, fixes and developments meant they wouldn’t need to learn new tools on top of following a new process. It also helped us to revise and assess what was truly needed and how we could collect data along the way.


Watching, waiting and watching some more

After demonstrating some of the ideas and iterations we’d managed to carry out in the initial week, we put the Andon Cord process into action. To continue iterating on the ideas, we focused on understanding how it was used, whether usage differed between product teams, who the main users were and why they had chosen to pull the Andon cord.

It went silent. A couple of days in, the problems we’d seen originally hadn’t materialised as Andon cord pulls. The pilot we’d put in place wasn’t far removed from what the teams already did, so why wasn’t it happening?

We came away from a short retro with the following assumptions, framed as questions:

  • Are teams wary of using a new channel to ask for help?
  • Do teams know what they can use the Andon cord channel for?
  • Is the hashtag confusing the teams?
  • Most phones and computers automatically correct words, so is this stopping an alert?

We ran another show and tell explaining the flow that needed to be followed, added a process flow to the Slack channel to show how it should work, and explained that there would be no judgement on what the cord was pulled for. We also changed the call sign to #pullcord so that predictive text could no longer change it.
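As an illustration only, a call sign check along these lines could be sketched in Python. We haven’t described how the Slack alert was wired up, so the function name `is_cord_pull` and the matching rules are assumptions; only the #pullcord call sign comes from the trial.

```python
import re

# "#pullcord" is the call sign from the trial. Everything else in this sketch
# is hypothetical and is not the actual Slack integration.
CALL_SIGN = "#pullcord"

# Match the call sign as its own token, ignoring case, so "#PullCord" still
# counts but "#pullcords" or "abc#pullcord" do not.
_PATTERN = re.compile(
    r"(?<!\S)" + re.escape(CALL_SIGN) + r"(?!\w)",
    re.IGNORECASE,
)


def is_cord_pull(message: str) -> bool:
    """Return True if the message contains the call sign as a separate token."""
    return _PATTERN.search(message) is not None
```

Matching case-insensitively is one way to tolerate a phone capitalising the tag; it would not help if predictive text split or replaced the word, which is why a single unbroken call sign was chosen.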

Then we waited again…what followed was probably the validation of our earlier assumptions.

It began with a discussion in the office about a test that just wasn’t going as expected. As the frustration grew over the Development test tills, someone suggested pulling the Andon cord. Listening to the discussion without intervening, we could see that the product teams weren’t sure what they could pull the cord for. Eventually we interrupted and suggested they do it, even if it only received support from the Retail product team in the other office. That helped push the process forward and let us see what the outcome would be.

The discussion that followed in the swarm quickly resolved the original issue, and then something bigger came out of the conversation: the order of the other releases hinged on the completion of the test that was currently blocked.

The blockers discussed on the Andon call changed the priorities of each of the releases, so that the deadlines could still be met in time for Boxing Day. Further to this, they also managed to resolve other dependencies and support around the test which ensured this was also completed before the Boxing Day deadlines.

Understanding the difference in impact between using and not using the Andon Cord process was fundamental to the trial. If the issue had not been swarmed on, the test would still have been completed in a couple of days. However, this would have pushed other releases back, delaying a fix on the store tills until after Boxing Day. The cost to the business and the team of stopping what they were doing midweek was minimal compared with the potential cost of not releasing the important fix. This essentially reiterated the idea behind the Jidoka principles and the Andon cord process.

A couple of days later we got feedback on the cord pull process and the impact it had on the teams and the continuation of their work. We realised that the teams felt documenting the issues and fixes would cause more work for themselves, and that some didn’t know whether the call came from the Andon channel when it started, so they missed the swarm completely.

Reviewing the processes

As a third iteration of the process, we refined the Andon retrospectives into simpler Microsoft Forms so that everyone could follow the same format. The forms guided them through what information was needed while keeping the process to a minimum.

This seemed to gather better feedback on the next cord pull, giving the teams time to reflect on the fixes, but not getting bogged down in the detail or trying to explain what the impact was.

The next steps in developing this are to add SMS messages to Product owners, Delivery managers and the data team to keep them alerted. After that, we will look into methods of tracking the releases and cord pulls coming through the department, with a dashboard similar to those at Toyota.
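To give a feel for what tracking cord pulls might involve, here is a minimal Python sketch. It is not something we have built: the `CordPull` record, its field names and the two summary functions are all hypothetical, chosen only to show the kind of figures a dashboard could draw on.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CordPull:
    """One Andon cord pull. All field names are hypothetical."""
    team: str
    pulled_at: datetime
    resolved_at: Optional[datetime] = None  # None while the swarm is still open


def pulls_per_team(pulls: List[CordPull]) -> Counter:
    """Count how many cord pulls each team has raised."""
    return Counter(p.team for p in pulls)


def mean_time_to_resolve(pulls: List[CordPull]) -> Optional[timedelta]:
    """Average time from pull to resolution, ignoring pulls that are still open."""
    durations = [p.resolved_at - p.pulled_at for p in pulls if p.resolved_at]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Counts per team and time to resolve are only two possible measures; which ones matter would depend on what the retrospective forms end up capturing.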

We have something in place and, although it’s not perfect, it’s a work in progress. We will hopefully evolve, develop and learn from the mistakes we make as we improve our processes, and document something along the way, just so we know what not to do next time.

First published on Medium. January 14th 2020. (6min read).


References

Ayers, Z. and Cohen, J. (2019) Andon Cords in Development Teams: Driving Continuous Learning. (online) Available at: <https://www.youtube.com/watch?v=VCP_EU_vG74>

Hall, J. (2019) The Andon cord and ITSM’s DevOps challenge. (online) Available at: <https://medium.com/@JonHall_/the-andon-cord-and-itsms-devops-challenge-78395393c56f>

McArdle, M. (2013) Thatcher’s Economic Legacy (digital) Available at: <thedailybeast.com>
