Reducing TOIL, calculating if it's "worth" the effort.
Kaushik Banerjee ( He/Him/His )
SVP | Autonomous & Accountable DevOps, APAC SRE Head for Trading Tech | Execution, Empathy & Unleashing Team's Potential | I help Organizations reduce TOIL, MTTR & MTTD while Improving Resiliency & Reliability
This article isn’t saying anything revolutionary or new. I am just reiterating something we all know instinctively, but I thought putting it in numbers with "everyday" examples might help with perspective.
So we all know reducing TOIL is desirable and the right thing to do. Repetitive manual work not only saps the life out of you, but the associated variability of the human element also carries significant Operational Risk.
But too often, just running the x number of commands manually in a sequence is easier than automating it, especially if automating it will take a concerted effort of, say, 8-12 person hrs. We kind of make up our minds that something that takes 15 mins a couple of times a week doesn’t necessarily deserve 8-12 hrs of concerted effort to automate.
So is it worth it? How do we calculate? Googling around, I saw that there was already a neat chart by XKCD.
Paraphrasing XKCD, the formula goes: if you perform a task N times per day, it makes sense to spend up to M amount of time to get an improvement of Z, amortized over five years.
So let’s take an example to see if automating the below is worth it or not.
Manual “Workaround”
Run cmd A (2 mins) > Capture output (1 min) > Transform output (remove spaces/tabs, sort in a certain order, etc.) (5 mins) and save to file (1 min) > Run command B on the file from the previous step (2 mins) > Capture output > Create file and email out (4 mins) = 15 mins.
So if we do it once every week per shift, that is 45 mins per week (three 15-min runs), or about 39 hrs = ~5 person-days in ONE year (if we use the XKCD calcs, then ~25 person-days, as they count over 5 years).
So if we could automate the above with an input of 8-12 person hrs, the effort pays for itself in roughly 11-16 weeks at 45 mins per week, and we are saving time from the 3rd-4th month onward.
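The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, the three runs per week are inferred from the 45 mins per week figure, and the 8-hour person-day and 52-week year are assumptions.

```python
# Step durations (minutes) from the manual workaround described above.
STEP_MINUTES = [2, 1, 5, 1, 2, 4]

RUNS_PER_WEEK = 3          # assumption: 45 mins per week / 15 mins per run
HOURS_PER_PERSON_DAY = 8   # assumption: one person-day is 8 working hours
WEEKS_PER_YEAR = 52        # assumption: the task runs every week of the year


def person_days_per_year(step_minutes, runs_per_week):
    """Yearly cost of doing the task by hand, in person-days."""
    minutes_per_year = sum(step_minutes) * runs_per_week * WEEKS_PER_YEAR
    return minutes_per_year / 60 / HOURS_PER_PERSON_DAY


def break_even_weeks(automation_hours, step_minutes, runs_per_week):
    """Weeks until the hours spent automating equal the hours saved."""
    hours_saved_per_week = sum(step_minutes) * runs_per_week / 60
    return automation_hours / hours_saved_per_week


if __name__ == "__main__":
    yearly = person_days_per_year(STEP_MINUTES, RUNS_PER_WEEK)
    print(f"Manual cost, 1 year:  {yearly:.1f} person-days")
    print(f"Manual cost, 5 years: {yearly * 5:.1f} person-days")
    for hours in (8, 12):
        weeks = break_even_weeks(hours, STEP_MINUTES, RUNS_PER_WEEK)
        print(f"{hours} hrs of automation breaks even after {weeks:.1f} weeks")
```

Plugging in your own step timings and run frequency gives a quick yes/no on whether the automation effort is likely to pay back within the year.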
Now, look at cr@p alerts. Even if we know they are of no use and we just have to close them, the mere action of selecting and closing those cr@p alerts wastes ~45-60 mins per day, per region. So just getting rid of that noise (the alerts that can be closed blindly) will save ~1 person MONTH per year, per region!
Then there are the 2K+ “work” emails we get every day, of which maybe 100 are relevant. Spending a couple of days creating better mail filters (or leaving mail groups that are no longer useful to you) will again save a person month for each of us.
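As a rough check on the alert figure, here is a minimal sketch. The function name is hypothetical, and the 250 working days per year and 8-hour person-day are assumptions, not numbers from any real rota.

```python
def alert_noise_person_days(minutes_per_day,
                            working_days_per_year=250,  # assumption
                            hours_per_person_day=8):    # assumption
    """Person-days per year spent closing alerts that need no action."""
    hours_per_year = minutes_per_day * working_days_per_year / 60
    return hours_per_year / hours_per_person_day


if __name__ == "__main__":
    for minutes in (45, 60):
        days = alert_noise_person_days(minutes)
        print(f"{minutes} mins/day -> {days:.1f} person-days per year, per region")
```

Under these assumptions, 45-60 mins per day comes to roughly 23-31 person-days per year for each region, which is about one person month of working days.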