The Three Thieves in Scaling Up RPA
Every manager knows that the challenges of managing a five-person team are very different from those of managing a fifty-person department. Yet we rarely foresee how failures and issues that seem minor and manageable when only a few machines are running in production get compounded and amplified into daily emergencies as the number of machines grows into three digits. If an RPA program can survive the pilot stage on the strength of a few RPA champions, scaling an RPA team requires as much teamwork as discipline.
As if moving a project from discovery to deployment were not hard enough, seeing how deployed solutions actually run and change over time can test your nerve and endurance. But every cloud has a silver lining: out of the chaos emerges an opportunity to evolve your RPA team with stronger, more efficient practices. In retrospect, there were Three Thieves that my team and I constantly had to battle in the early days. The good news is that we are much calmer and move more swiftly in the face of them than we did more than a year ago; the bad news is that those days are not over yet. I suspect every team will have to face these Three Thieves, and its success depends on how effectively it deals with them: Just Working Code, Unplanned Work and Platform Failures.
JUST WORKING CODE
In engineering, there are situations in which practitioners have both the incentive and the pressure to churn out projects as quickly as they can, with questionable standards and quality. When your team has limited resources and skills to police and audit work throughout the different phases of development, the problems eventually reveal themselves in production. Left unaddressed, they at best deprive your team of the target benefits of automation and at worst suck your resources into a mess of process terminations, piling inventories and missed Service Level Agreements (SLAs).
While it is very easy to see the person who causes bugs as the villain, beware that the culprit can be your vendor, your colleagues or even you. One proven way to deal with the worst of everybody is to design a system that draws on everyone's best. It pays to have a group of engineers, made up of your most experienced people across diverse roles, oversee the design and configuration before and throughout development. Such a group is crucial not only for detecting Just Working Code early but also for keeping your whole team informed of new technical and design problems, so that those problems can be addressed early and collectively. While I leave the topic of supporting and maintaining Just Working Code already in production out of this discussion, I discussed the practices of this Design Authority to some extent in How to review codes in Blue Prism.
It is human nature not to act on problems that seem to lie in the future. A strong and evolving Design Authority team is therefore an effective way to hold everyone accountable and to stand up to the pressure of moving projects quickly. There are many ways to design such a team, but in my experience diversity matters: there should be a mix of senior and junior engineers, with a culture of absolute honesty and thoughtful disagreement (Principles, Ray Dalio). Rotating people in and out of the group also helps grow the team's talent and injects fresh perspectives. In essence, the group should evolve to stay relevant to the current needs and challenges of a growing RPA team.
UNPLANNED WORK
As described in The Phoenix Project (Kim, Behr and Spafford), there are four types of IT Operations work: business projects, IT Operations projects, change requests and unplanned work. Surprisingly, management often fails to factor change requests and unplanned work into its capacity calculations, and so fails to plan resources adequately. Amid the chaos, and in the absence of clear operating procedures, people often revert to firefighting mode and give in to requests from whoever is loudest, most senior or has the most money. As a result, constant firefighting leaves people burnt out and demoralized while the root causes remain unaddressed, not to mention the piling change requests that leave business partners frustrated with automation outcomes.
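To make the capacity point concrete, here is a minimal sketch of the arithmetic. The team size, weekly hours and percentage splits below are hypothetical numbers chosen only for illustration, not measurements from my team; the function name `project_capacity` is likewise made up. It shows how the hours actually left for project work shrink once change requests and unplanned work are counted.

```python
def project_capacity(engineers: int, hours_per_week: float,
                     change_request_pct: int, unplanned_pct: int) -> float:
    """Return weekly hours left for business and IT Operations projects.

    Percentages are whole numbers (0-100) of total team hours.
    All inputs are hypothetical illustration values.
    """
    if not 0 <= change_request_pct + unplanned_pct <= 100:
        raise ValueError("percentages must sum to between 0 and 100")
    total_hours = engineers * hours_per_week
    # Subtract the share of hours consumed by change requests and unplanned work.
    return total_hours * (100 - change_request_pct - unplanned_pct) / 100


# Naive plan: every hour is assumed to go to project work.
naive = project_capacity(5, 40, 0, 0)
# Hypothetical reality: 15% change requests, 30% unplanned work.
realistic = project_capacity(5, 40, 15, 30)
print(naive, realistic)  # 200.0 110.0
```

In this made-up scenario, a plan that assumes 200 project hours a week is really working with 110, which is exactly the gap that pushes a team into firefighting.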
One of the core tenets of the novel is that unplanned work is expensive. Left unchecked, technical debt will ensure that the only work that gets done is unplanned work. Moreover, when all the team does is react, there is not enough time for the hard mental work of figuring out whether it can accept new work. "So more projects are crammed onto the plate, with fewer cycles available to each one, which means more bad multitasking, more escalations from poor code, which means more shortcuts. It's the IT capacity death spiral."
Countering the nightmare of unplanned work takes more than being able to say No and concentrating resources where the highest-value work is. It is more about a meritocratic culture in which team members look at issues holistically, perform root cause analysis and back up their decisions with data. It is also about doing smaller, faster deployments to production to shorten the feedback loop. It is less about the brilliance of any one engineer than about working out protocols for how to decide, communicate and work. In other words, effective collaboration and sound IT management matter much more to Production Support than to any other group in an RPA department.
PLATFORM FAILURES
In manufacturing, everybody knows that machine breakdowns are one factor that jeopardizes on-time delivery. To mitigate that risk, you would set an SLA for each machine to receive preventive maintenance, say once a quarter or once a month.
"Machine breakdowns" in RPA come in many forms, such as the Blue Prism server going down, environment slowdowns or business applications being unavailable. Depending on their nature, breakdowns can halt the entire production platform or entire processes. If there is little the team can do to prevent breakdowns, it should at least invest in ways to detect them sooner and make sure everyone is well trained on what to do when they happen.
While some of these outages and downtime are unavoidable, some are signs of chronic, deep technical issues. The team should therefore not ignore them, but should either address the underlying issues (hire an infrastructure specialist...) or take action to minimize the cost of future breakdowns (create a forward-looking KPI of application changes, schedule maintenance work, modify configurations...).
* * *
One last thought to end this note: no matter what the objective of your RPA program is, and no matter where you are on the journey, spend some time thinking about how your team can better combat these Three Thieves. The pressure to go big and go fast is real, and the courage to take small actions can go a long way. As Atomic Habits (James Clear) shows, forming good tiny habits can deliver superior results. Making small but right changes to the way your team works can smooth the path to scaling RPA within the organization. And perhaps the lessons and framework learned from scaling up one initiative within an organization can help you scale up another.
Sincerely, good luck!