Infrastructure: A Customer First Approach

TL;DR: DevOps is now integral to any Engineering team that wants to optimize the time it spends "fighting the plumbing". Here I talk about how a customer-centric approach is critical to DevOps success inside an Engineering team. Your engineers are your customers, and infrastructure teams ignore them at their peril.

I feel the need, the need for speed! 

The Unexpected Duct Tape

As an engineering leader, one measure of hygiene is how much time your team spends “fighting the plumbing”. I want my engineers focused on solving customer problems and constantly delivering value to the organization. One aspect of delivery is the design, coding and quality work; the challenges there are around having the right environment set up to develop and test the feature, and test data tends to be a particular problem in this phase. The second aspect is the actual deployment and "production readiness" of the feature.

Production readiness gets us into:

  • Scoping and understanding performance and scalability requirements
  • Ensuring the product runs on the right hardware spec
  • Embedding the right monitors to ensure that we have eyes on the system when things go south
  • Ensuring that the quality team is able to test in a “near-production-like” environment so that risk is mitigated earlier in the cycle


Here are a few themes which you might have heard:

  • It’s taking me a long time to set up my environment to develop this feature
  • This feature requires a lot of configuration, so just test it against my machine instead
  • I can’t test the feature till Engineer X finishes her testing on our shared environment 
  • We missed the bug because the feature environment was not updated

…and so on. 

How software gets tested and gets to production is critical, and optimizing this path to be drama-free matters. Often these steps are held together by duct-tape solutions. Until you have a dedicated infrastructure team, or are committing enough cycles to this process, the steps remain a bit clunky. Attempts to improve them become ambitious projects that often overlook adoption and team biases, and that will cost you time, effort and morale!

Adoption or Death by Combat!

These are good problems to solve, though. You want to go fast? Get these roadblocks out of the way first. There are so many ways to crack this problem, and this is where I’ve seen teams get lost and ultimately embark on Siberian death marches. You’ve got to watch for this fork in the road. Talk to your engineering team and don’t assume that everyone is super motivated to embrace DevOps and run that twenty-step process each time they need to spin up a new environment.

Figure out honestly where your team stands and what the level of adoption might be. If you have a small, focused team that has been exposed to a DevOps culture, then adoption is going to be much easier. In most teams, however, there is probably only a fleeting sense of what a DevOps culture really looks like. You have to live it to know it; reading about it only helps up to a point.

Pick the path of least resistance: make it super simple for your team to embrace the infrastructure changes that are planned. What tends to happen instead is that the infrastructure team ideates and gets things working, then publishes large documents and wiki articles, and follows them up with talks showing off what was built.

Furthermore, every developer must now set up the tools and environment needed to use the “new” way. The rollout is not thought through, and when people start using the “new” way, things break; fixes are non-trivial and hold things up. Meanwhile, there are deadlines to meet, and soon people find ways around the fancy process you set up. Ultimately, your team is going to give up and circumvent the changes. The infrastructure team feels cheated because they think the engineers are ruining their big moment, and it’s a lose/lose for everyone. If this sounds familiar, then you didn’t think about your team before you rolled something out.

You forgot to make your infrastructure “production ready” and to treat your team as your customers. It’s also educational, because it’s a great case study in rolling out ill-thought-out product changes that we assume our real-world customers will use without question.

Never be presumptuous about your customers; that has costly side effects.

Infrastructure As Electricity

In one of our discussions about infrastructure, someone in the room said, “Infrastructure should be like electricity; when you turn a switch, you never doubt that power will flow through.” That characterization of infrastructure has stuck with me. It sums up exactly how we need to approach the adoption of infrastructure changes in our teams. It’s got to be as easy as flicking a switch.

The Heroku model is very nicely done, simplicity itself: push your code = deploy your code. Now, that’s a model to aspire to. I’ve had good success with a Chat-Ops approach. We built a Slackbot using Hubot and got the whole team to use a dedicated Slack room to manage our deployments. On the back-end, we started with a v1 that we could stand up quickly, using some nginx magic to let one large box power all our deployments. The system worked out great. Our ultimate goal was to docker-ize the setup, but all of that would be transparent to the end customer: our engineers.
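
To make that flow concrete, here is a minimal sketch of the command handling. Our actual bot was a Hubot script; this Python version, the deploy.sh hand-off, and the environment names are illustrative assumptions, not the real implementation.

    import re
    import subprocess

    # Chat command our bot understood: "deploy <branch> to <environment>".
    DEPLOY_PATTERN = re.compile(r"^deploy\s+(?P<branch>\S+)\s+to\s+(?P<env>\S+)$")

    KNOWN_ENVS = {"dev", "qa", "staging"}  # assumed environment names

    def handle_message(text: str) -> str:
        """Parse a chat message and kick off a deployment if it matches."""
        match = DEPLOY_PATTERN.match(text.strip())
        if not match:
            return "Usage: deploy <branch> to <environment>"
        branch, env = match.group("branch"), match.group("env")
        if env not in KNOWN_ENVS:
            return f"Unknown environment '{env}'. Known: {sorted(KNOWN_ENVS)}"
        # Hand off to a hypothetical deploy script on the shared box; the
        # script itself (checkout, build, nginx reload) is whatever your v1 does.
        result = subprocess.run(
            ["./deploy.sh", branch, env], capture_output=True, text=True
        )
        if result.returncode != 0:
            return f"Deploy of {branch} to {env} failed:\n{result.stderr}"
        return f"Deployed {branch} to {env}."

    if __name__ == "__main__":
        print(handle_message("deploy feature/login to qa"))

A nice side effect of running everything through one Slack room is that the whole team can see who deployed what, where, and whether it worked.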

Different environments were created to mimic various degrees of production separation. We started out sharing databases and caches, but those could be hived off easily using a per-environment configuration management store. The environments were hosted on a powerful shared server, and idle processes didn’t take up resources forever. Separately, we scripted the environment with Ansible so that we could stand it up easily. Once this type of setup is operative, any issue is a P0, because your team is going to be blocked.
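
As a sketch of what “hived off easily” can look like: each environment resolves its settings from shared defaults plus per-environment overrides, so isolating one environment is a one-entry change. The store layout and values below are assumptions for illustration, not our actual configuration.

    # Minimal sketch of a per-environment configuration store.
    # Environments share defaults (one database, one cache) until a team
    # needs isolation; then only that environment's entry changes.
    SHARED_DEFAULTS = {
        "database_url": "postgres://shared-db:5432/app",
        "cache_url": "redis://shared-cache:6379/0",
    }

    # Per-environment overrides; hypothetical values.
    OVERRIDES = {
        "qa": {},  # fully shared
        "perf": {  # hived off onto its own database for load tests
            "database_url": "postgres://perf-db:5432/app",
        },
    }

    def config_for(env: str) -> dict:
        """Resolve an environment's config: shared defaults plus overrides."""
        if env not in OVERRIDES:
            raise KeyError(f"Unknown environment: {env}")
        return {**SHARED_DEFAULTS, **OVERRIDES[env]}

    print(config_for("perf")["database_url"])  # perf gets its own database
    print(config_for("qa")["database_url"])    # qa still uses the shared one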

Adoption was a breeze, because there was nothing to install or learn. It was as simple as sending a chat message: “deploy this branch to this environment.” We hit all our goals: making deployments into environments faster, making the process like “electricity”, and cutting the AWS costs of setting aside machines to host test environments. More importantly, the infrastructure team was free to evolve the back-end at its own pace without materially affecting the team. We’d just add more options to the chat-bot, and the team would be none the wiser. If anybody was keen on figuring out how things worked, they did it out of their own interest rather than being forced to understand the nitty-gritty just to use the system.

Flick that Switch

You’ve got some great tech to pick from. Docker is awesome and can really speed things up, but you’ve got to know what you’re doing; a Docker adoption requires a good thinking-through. It’s better to get your process down first and then migrate to Docker, in my opinion. Ultimately, regardless of the technology you pick, be it Vagrant images, base AMIs driven by a CM system, or Docker, the pattern is pretty similar.
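
To illustrate why the pattern is similar: whatever powers the environment, the interface your engineers touch can stay fixed while the backend swaps underneath. A minimal sketch, with hypothetical backends:

    from typing import Protocol

    class EnvironmentBackend(Protocol):
        """The pattern is the same whatever powers it: build an image,
        then stand an environment up from it."""
        def build_image(self, branch: str) -> str: ...
        def start(self, image_id: str, env: str) -> None: ...

    class DockerBackend:
        def build_image(self, branch: str) -> str:
            # e.g. a `docker build` against the branch; illustrative only
            return f"docker-image:{branch}"
        def start(self, image_id: str, env: str) -> None:
            print(f"docker run {image_id} for {env}")

    class AmiBackend:
        def build_image(self, branch: str) -> str:
            # e.g. bake a base AMI with your CM system; illustrative only
            return f"ami-{branch}"
        def start(self, image_id: str, env: str) -> None:
            print(f"launch instance from {image_id} for {env}")

    def deploy(backend: EnvironmentBackend, branch: str, env: str) -> None:
        """The interface engineers see never changes; only the backend does."""
        backend.start(backend.build_image(branch), env)

    deploy(DockerBackend(), "feature/login", "qa")
    deploy(AmiBackend(), "feature/login", "qa")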

The meta point is to always think about how it’ll be used first, and then work your way back to the technology that will power the experience. Infrastructure done right is like a drug: once your team is used to it, you’ll wonder how you ever got anything done without it! So always think about your customer first and then work your way back to the technology bits. You can always iterate and improve on the engine that nobody sees; get the interfaces right first!

Abhijit Muthiyan

Smart City Innovator | Empowering Urban Management Through Integrated Infrastructure Solutions

8y

Well said! Often, internal QA and engineering infrastructure is second-tier. Most of the energy is spent on upgrading and maintaining customer-facing infrastructure. Without a proper test environment, we let customers find bugs.

