Automate smarter, not harder
Credit: https://unsplash.com/photos/zoCDWPuiRuA

It’s easy to think that automating everything makes problems go away and makes things run smoother.

I’ve been automating infrastructure and software and many technologies in between for more than 10 years, and it’s still painful. We simply traded boring checklists for complex scripts with varying code quality, provider breaking changes, and conflicting version dependencies. Sound familiar? You’re not alone.

I was showing another engineer how my automation works and I found myself apologizing that it wasn’t as automated as it could be. We got talking about how it was automated enough for the problem we were solving and it didn’t need to be perfect. We have all of the manual steps thoroughly documented so it just takes a little extra time after running the scripts, no big deal.

Automating things is hard and it takes a long time to write a script, much less a scalable and sustainable one across multiple disciplines or services.

Let's do the math

I love cocktail napkin math, so let’s do some quick calculations on how you can automate smarter, not harder.

Let’s assume that you have written a checklist of 100 things that need to be repeated every month. Done manually, it takes you 10 hours each time (6 minutes per task). In an ideal world, you would want to automate all of it.

Now consider that you are automating 5 different services and systems, each with 20 of those tasks, and one of them is very difficult to write scripts for. Is it worth it? Probably not, so let’s leave those 20 tasks manual (2 hours).

Let’s also assume that a few steps for the other 4 systems are difficult to automate, so you can automate 18 of the 20 tasks for each but are left with 2 manual steps each. That’s 72 automated steps and 8 manual steps, on top of the 20 you decided not to automate.

If you were taking a test in school, a 72/100 is a C-. That’s not a very good grade and certainly a disappointment to the students that work hard to get straight A’s.

The important thing here is value. Does 72% provide more value than 0%? Absolutely! You still saved 7.2 hours per month.
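
The napkin math above can be sketched in a few lines of Python. This is only an illustration: the even split of 20 tasks per service is an assumption implied by the numbers, and the function and parameter names are made up for this example.

```python
def napkin_math(services=5, tasks_per_service=20, hard_services=1,
                manual_per_easy_service=2, minutes_per_task=6):
    """Split a checklist into automated and manual tasks and total the hours."""
    easy_services = services - hard_services
    # Easy services: automate everything except a couple of awkward steps.
    automated = easy_services * (tasks_per_service - manual_per_easy_service)
    # Hard services stay fully manual, plus the leftovers from the easy ones.
    manual = (hard_services * tasks_per_service
              + easy_services * manual_per_easy_service)
    return {
        "automated_tasks": automated,
        "manual_tasks": manual,
        "hours_saved_per_run": automated * minutes_per_task / 60,
        "hours_still_manual_per_run": manual * minutes_per_task / 60,
    }


result = napkin_math()
print(result["automated_tasks"], result["manual_tasks"])  # 72 28
```

Changing `hard_services` or `manual_per_easy_service` is a quick way to see how much value is still left when the automation is far from complete.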

Does it provide less than 100%? On your cocktail napkin, yes. But ask yourself, what is the return on investment and opportunity cost? Time is money, and the cost of engineering scales exponentially with difficulty, which you have to consider.

Calculating the return on investment

If we assume that each task takes 30 minutes to automate and 6 minutes to run manually, then you need to run the script 5 times to break even. At one run per month, the automation pays for itself in 5 months. That payback comes a lot sooner if multiple people are doing the same work.
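
As a rough sketch, the break-even calculation can be written as a small Python helper. The function names and the `runs_per_month` parameter are assumptions for illustration, and the sketch ignores the time the script itself takes to run.

```python
import math


def break_even_runs(automation_minutes, manual_minutes_per_run):
    """Runs needed before the time spent automating has been paid back."""
    return math.ceil(automation_minutes / manual_minutes_per_run)


def payback_months(automation_minutes, manual_minutes_per_run, runs_per_month=1):
    """Months until break-even, given how often the task is run each month."""
    runs = break_even_runs(automation_minutes, manual_minutes_per_run)
    return runs / runs_per_month


# One person running the task once a month.
print(break_even_runs(30, 6))                   # 5
print(payback_months(30, 6))                    # 5.0
# Five people each running the same task once a month.
print(payback_months(30, 6, runs_per_month=5))  # 1.0
```

The same helper works for harder tasks: pass the research and debugging time in minutes as `automation_minutes`.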

For those tasks that are harder to automate, let’s assume each one takes 10 hours of research, development, and debugging. This assumes things go as planned and you don’t end up down a rabbit hole for 3 days (or 3 weeks...).

That one task takes as long to automate as running the entire task list by hand, which is certainly tough to justify. Not only would you need to run the script 100 times (10 hours / 6 minutes) to break even (more than 8 years at one run per month), you would need to spend 280 hours (28 tasks x 10 hours) to build that automation now. There aren't many people with 7 weeks of uninterrupted, un-prioritized time to automate these hard tasks in a script, myself included. You have to pick and choose your battles.

Now that we’ve gotten past the reality that it’s hard or nearly impossible to automate 100%, let’s evaluate the 72% that we could automate in 36 hours.

Do we need to automate it all at once? What if we wrote a script for each service? Spending 9 hours to automate each service sounds reasonable. Do you need to integrate the scripts together? Or can you simply run each one after the other on your command line? You can chain them with `&&` and create an alias in ZSH if it bugs you that much.
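
If the alias outgrows a one-liner, the same "run each script, stop at the first failure" behavior as `&&` can be sketched in Python. The script file names below are hypothetical placeholders, not real files from this article.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command in turn and stop at the first failure, like `a && b && c`."""
    for command in commands:
        result = subprocess.run(command)
        if result.returncode != 0:
            # Mirror `&&`: later commands are skipped once one fails.
            return result.returncode
    return 0


# Hypothetical per-service scripts, one for each service you automated.
EXAMPLE_STEPS = [
    [sys.executable, "automate_service_a.py"],
    [sys.executable, "automate_service_b.py"],
    [sys.executable, "automate_service_c.py"],
]
```

Calling `run_in_order(EXAMPLE_STEPS)` returns 0 when every script succeeds, or the exit code of the first one that fails. The scripts stay independent, so each one can still be run on its own.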

That’s great, but what’s the point?

It’s okay to automate in smaller chunks. You don’t have to automate everything at once. You don't have to automate the things that you traditionally think are the things to automate. And you don’t have to automate it all.

I love working at GitLab where we live our values of efficiency (for the right audience), boring solutions, iteration, and results.

I personally look at automation in one-week chunks. If I can’t automate it in less than a week or two, we need to break it into a smaller, bite-sized scope.

Looking back at what I’ve automated in my time at GitLab, all of my sustained automations are actually the ones I wrote with the least amount of time available (<5 hours usually). The Friday afternoon bug fixes, the incident response retrospective runbooks, the chained alias I use with Kubernetes to perform garbage collection on stale namespaces from GitLab CI jobs when the cluster has performance problems.

They are all boring solutions, and none of them are pretty per se. However, the goal isn’t to have them be elegant and perfect; the goal is for you to be able to use them to get stuff done efficiently. Make it work in version 1, then make it pretty and scalable in version 2+. You get bonus points if you can write good documentation for your team members to follow (hint: fewer on-call notifications and better vacation coverage).

Infrastructure Example

At GitLab, we do a lot of infrastructure-as-code automation. Each team is responsible for their own environment so you end up automating what matters to you, not necessarily everything that could be automated.

For our demo and training environments, I have a template for how we deploy the infrastructure. I've automated our Terraform deployments with GitLab CI. We keep our Ansible and Helm charts in source control but run them locally.

We could certainly automate this further, but I don't do it enough to have a justifiable ROI so I wrote an extensive installation runbook with all of the manual steps we haven't automated. The runbook takes approximately 3 hours to perform the steps, and that's good enough for our use case when we need to run it two or three times a year on average.

Back Office Tools Example

At GitLab, our IT team handles a lot of provisioning across various applications and often needs data about existing provisioned users and entities. We created a collection of CLI scripts with Laravel that let us get the data in our terminal using API calls.

We have underlying API response formatting helpers that make it easy to extract this data into any format we need (JSON, CSV, Google Sheets, etc.). It might look simple, but we were shocked by how many people struggle with a spreadsheet VLOOKUP when running reports, so we use our API calls and do the cross-table data manipulation in our script.
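
To illustrate the idea of doing the VLOOKUP in a script, here is a minimal Python sketch. It is not the team's Laravel code: the function names, the field names, and the sample records are all hypothetical, and the rows stand in for data that would come back from two different API calls.

```python
import csv
import io


def join_on_key(left_rows, right_rows, key):
    """Attach matching right-hand fields to each left-hand row (a VLOOKUP in code)."""
    lookup = {row[key]: row for row in right_rows}
    return [{**row, **lookup.get(row[key], {})} for row in left_rows]


def to_csv(rows):
    """Render a list of dicts as CSV text, leaving missing fields blank."""
    fieldnames = []
    for row in rows:
        for name in row:
            if name not in fieldnames:
                fieldnames.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records from two different systems.
licenses = [{"email": "a@example.com", "license": "pro"},
            {"email": "b@example.com", "license": "basic"}]
directory = [{"email": "a@example.com", "department": "Sales"}]

report = join_on_key(licenses, directory, "email")
print(to_csv(report))
```

Rows without a match are kept with the extra columns left empty, which is usually what you want in a report.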

This may not be traditional automation, but it is still a quality of life improvement that we use daily.

API Abstraction Layer Example

When you make your first API call, it can be a "kid in a candy store" reaction with how much potential you see for future automation.

After you've spent a long time making API calls, you will see a lot of frustration with varying vendor standards of API response formats, pagination, error handling, etc. This creates a quality of life challenge for engineers, and API integration becomes a very specialized skill set that can create a bottleneck for building upstream or downstream automations.

In GitLab IT, we continue to open source the SDKs for every vendor that we integrate with for the benefit of the Laravel and PHP community.

We're opinionated and have created a standardized API response format that we want to use universally across every vendor that we integrate with.

This allows us to know that the JSON and object keys always sit at the same level of the API response, without worrying about nested arrays or other weird keys that we need to loop through.
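
The actual response format is not reproduced here, so the following Python sketch only illustrates the general idea of a standardized wrapper. Every name in it (`ApiResponse`, `normalize`, `records_path`, and the sample payloads) is a hypothetical stand-in, not the format used by the GitLab IT SDKs.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ApiResponse:
    """One predictable shape for every vendor: records always live in .data."""
    ok: bool
    status_code: int
    data: list = field(default_factory=list)
    errors: list = field(default_factory=list)


def normalize(status_code: int, payload: Any, records_path=()) -> ApiResponse:
    """Wrap a raw vendor payload, digging records out of any vendor-specific nesting."""
    ok = 200 <= status_code < 300
    if not ok:
        return ApiResponse(ok=False, status_code=status_code, errors=[str(payload)])
    records = payload
    for key in records_path:
        records = records.get(key, {}) if isinstance(records, dict) else {}
    if not isinstance(records, list):
        # A single object (or nothing) becomes a list so callers never branch.
        records = [records] if records else []
    return ApiResponse(ok=True, status_code=status_code, data=records)


# Two hypothetical vendors with different nesting end up looking the same.
vendor_a = normalize(200, {"result": {"items": [{"id": 1}]}}, ("result", "items"))
vendor_b = normalize(200, [{"id": 2}])
print(vendor_a.data, vendor_b.data)  # [{'id': 1}] [{'id': 2}]
```

Code that consumes the responses can then loop over `.data` and check `.ok` without knowing which vendor produced them.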

What if I don't want to automate it?

Most runbooks at GitLab are Markdown files in various team repositories, some with nothing more than a numbered list without a heading. It also helps to become comfortable with unpolished internal automation and documentation. Accuracy and effectiveness are the important parts.

It doesn’t need to be complicated. Just write down the steps or record a video of you doing it. Your future self will thank you.

Idealistic vs realistic automation

I was looking at our backlog and more than half of the issues are related to quality of life automations based on problems we’ve had in the past. The interesting data point is that many have been open for 6+ months and the issue hasn’t been a recurring problem. It’s idealistic to automate them, but it’s not realistic. I’m going to add a label for wishlist automation and close the issue. It’s one button to open the issue later, and it clears the headspace to focus on current problems.

I suggest automating your pain points and the things that make you pour a drink every time you have to do it, then automate the things that you find yourself doing a lot. Don’t worry about the rest, just write some good docs so your short term memory can thank your past self for documenting how you solved it last time.

Half of my personal runbooks and task lists are comments on issues that I added screenshots or terminal output to, and I ensure it has a few keywords so I can search my Gmail inbox for the issue notification months or years later. That’s better than any enterprise search platform will give you. This also allows you to link your docs or notes when someone asks how to do something for the 5th time and you’re tired of doing it yourself or showing them how.

At the end of the day, automation is hard, but task lists and runbooks are relatively easy. If you don’t automate it, at least “code in English” to write the steps down that aren’t automated.

Many of my scripts are procedural in nature instead of object oriented, since I can copy my task list into code, make the tasks comments, and write the code snippet underneath each one with the few lines or functions needed to implement the automation. After 18 months we just iterated on some of my old code and finally replaced 25 lines of procedural script with 600 lines of object oriented functions, not to mention unit tests.
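
Here is a minimal Python sketch of that "task list as comments" style. The task itself (cleaning up old CSV reports) and every name in it are hypothetical, chosen only to show the shape of a procedural script where each checklist step becomes a comment with a few lines underneath.

```python
import time
from pathlib import Path


def clean_old_reports(directory, max_age_days=30, now=None):
    """Procedural cleanup script: each checklist step is a comment with code below."""
    now = time.time() if now is None else now

    # Step 1: List the report files in the directory.
    reports = sorted(Path(directory).glob("*.csv"))

    # Step 2: Work out which reports are older than the cutoff.
    cutoff = now - max_age_days * 86400
    stale = [path for path in reports if path.stat().st_mtime < cutoff]

    # Step 3: Delete the stale reports and remember their names.
    removed = []
    for path in stale:
        path.unlink()
        removed.append(path.name)

    # Step 4 (still manual): tell the team which reports were removed.
    print(f"Removed {len(removed)} report(s); remember to post the list to the team.")
    return removed
```

Steps that are not worth automating yet can stay in the script as a comment and a reminder, which keeps the runbook and the code in one place.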

Make it work, use it, then make it better when the time comes.

Remember, any automation is better than no automation, but it’s a nice-to-have and not a need-to-have. You don’t have to automate 72% or 100%.

Start with automating 10% and take it from there with a minimum viable change that adds value and saves you time. It’s okay to stop at 30-50%. You’ll know when it’s time to invest more in your automation when you start to feel pain points.

You may be asking when you outgrow this thinking. I’m currently leading a project that automates 500+ hours of manual work each month for our team members. It is estimated to take 1,250 to 1,500 hours to complete the project, so that’s 2.5-3 months for ROI, plus the efficiency gains from the opportunity cost for dozens of team members that perform these manual tasks. We're in the process of challenging our thinking and trying to reduce the scope by integrating more vendor solutions, and it's too early to say whether that will be the right answer or not.

Regardless, the approach is the same with phased iterations; there is just more thought going into the foundational architecture of the codebase to ensure the automation is sustainable and has rock solid security and audit logging.

The difference with this level of investment is that there is an opportunity cost and you have to consider whether buying a solution will be a better investment than building one. If you buy it, you get 60-70% of what you need. If you build it, you can get 90%+ and have the source code to engineer your way out of any problems. You can also build the remaining 20-30% plumbing but there are a lot of trade offs that need to be considered.

It’s always okay to ask whether to build vs buy

It’s always a good rule of thumb to let the engineers make technical decisions and let the leaders and managers focus on time, people, and money problems.

There are many good reasons to use vendors, since they solve the problem you have for a living; just don’t make the decision without extensive technical validation by technical team members.

If there is a top-down decision to choose a vendor “because all the other big companies use them”, your organization will make a tragic mistake if you haven’t found a solution that meets your organization’s specific needs.

The latest generation of software vendors provide APIs and no-code integrations with many providers, and it looks great on the surface. The challenge is that nobody can do everything, so you have to thoroughly analyze whether the platform you choose allows you to build those last-mile integrations, and whether it’s an experience that your developers love.

This is a personal choice and preference, but be careful about picking the “big household name” products, since they are usually over-engineered and/or frankensteined together from product acquisitions and customer feature requests that convolute the elegance of the underlying architecture, usually to the detriment of end user experience or back office workflow efficiency.

You can usually get part of the way with vendor automation. Just realize that adopting another company’s product always introduces more work and cost, and that you have to live with its failings, which affect employee morale more than you might think.

It’s important to build a relationship with any vendors that you’re going to be deeply integrating with since you’re embarking on a journey together for many years to come. Study their features, deep dive on their architecture, get behind the curtain to see how it really works, and understand their roadmap and feature development velocity.

Remember that startups are focused on product development and big companies are focused on profit development. If the big company doesn’t have 90%+ of the features you expect them to have to meet your 70-80% purchasing criteria, they likely never will or it will be 2-5 years from now when it’s too late. A startup might be able to have the feature available in 2-6 months.

Startups are focused on building great products and enterprises focus on building great profits.

Find the best tool for the job, whether you build it or buy it. Just don’t make the mistake of buying a big logo just for the logo and industry recognition as a "leader". Buy a right-sized solution that has the right technical capabilities. Many times the right choice is the earlier stage company that you will grow with since they can accommodate your needs better, and there's a good chance you'll have a better relationship and have more fun in the process.

Automation is hard, but there's no reason it can't be fun to work with others to help automate your business and improve your quality of life.

James Sandlin

Infrastructure Automation Architect

2 years ago

"We simply traded boring checklists for complex scripts with varying code quality" - I thought this would be fixed by us moving away from PERL?!?
