DevOps to NoOps: Embrace Algorithmic IT Operations in 2017

DevOps has changed how teams provision infrastructure and manage application build and release processes. At many companies, however, it is still largely confined to configuration management and automated deployments, while a large share of day-to-day operational problems remains a sore point for engineers. Simply put, for all the chatter about AI and ML, cloud engineers still face operational fatigue from an overload of data, ad hoc scripts, toolchains, and alerts.

So, here is a thought: “What if humans could solve new, complex problems while we let machines resolve known, repetitive, and identifiable problems?”

The rise of cloud, distributed architectures, containers, and microservices has further increased the data overload, because different systems are required to monitor and manage these new-age applications. The ever-growing volume of alerts, tools, and automation scripts is adding fatigue to engineers’ work. In an ideal world, every DevOps engineer would be focusing on Apps instead of Ops.

As a member of the DevOps community, I want to share how we can deliver Algorithmic IT Operations (AIOps) in our companies to reduce stress and fatiguing workload by eliminating noisy alerts and repetitive events, improve business agility through intelligent management layers, and respond to production incidents 10X faster.

  • Adopt a NoOps Philosophy: It’s very important for engineering teams to adopt a culture of NoOps, which essentially means saying NO to manual operations. It’s important to nurture a belief that “machines should solve known problems and engineers can focus on solving new problems.”
  • Deploy Automated Actions for Known Events: Anybody who has managed production infrastructure, business services, and applications, or architected systems, knows that most problems are caused by known events or identifiable patterns. Engineers already have an idea of what to do when certain events or symptoms occur in their application or production infrastructure. You should encourage them to deploy automated actions (response mechanisms) for known events, with business logic embedded, so the team can sleep peacefully and never sweat again.
  • Create Diagnostics for Operational Issues: When events or alerts are triggered, most current tools just provide a text description of what happened instead of the context of what is happening or why it is happening. So, as DevOps engineers, it’s important for you to create diagnostic scripts or programs that give you that context: Why did the CPU spike? Why did the application go down? Why did API latency increase? Essentially, the goal is to get to the root cause faster.
  • Use Code as a Weapon for Cloud Operations: The only magic wand for solving operational problems is code. You can create everything from automated actions to diagnostics. As a team and as a DevOps engineer, you need to focus on using CODE as the mechanism for resolving problems. It’s important to encourage engineers to start applying algorithms to IT operational problems. If you are building CI/CD today, you should certainly deploy a trigger as part of your pipeline that monitors health metrics after a deployment and invokes a rollback if it detects issues. Simple remedies like this can save hours after every deployment and handle failures gracefully!
  • Adopt Intelligent DevOps Tooling: The era of static tooling for deployments, provisioning, packaging, monitoring, APM, and log management is over. With the adoption of Docker, microservices, cloud, and API-driven approaches, deploying applications at scale while ensuring high reliability requires a different take. So it’s important to use intelligent tools for cloud management instead of trying to reinvent the wheel every time. I believe that with the rise of ML and AI, we will see more DevOps tooling vendors incorporating intelligence into their offerings to further simplify the work of engineers.
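To make the "automated actions for known events" and "rollback trigger" ideas above a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the event types (`disk_full`, `deploy_unhealthy`), the function names, and the 5% error-rate limit are hypothetical, and the actions only return a description of what they would do rather than calling any real cleanup or deployment tool.

```python
from typing import Callable, Dict, Sequence

Event = Dict[str, object]

# Hypothetical registry: known event type -> automated action.
ACTIONS: Dict[str, Callable[[Event], str]] = {}


def known_event(event_type: str):
    """Decorator that registers an automated action for a known event type."""
    def register(action: Callable[[Event], str]) -> Callable[[Event], str]:
        ACTIONS[event_type] = action
        return action
    return register


@known_event("disk_full")
def free_disk_space(event: Event) -> str:
    # A real action would call your own cleanup tooling here.
    return f"cleaned temp files on {event['host']}"


@known_event("deploy_unhealthy")
def roll_back(event: Event) -> str:
    # A real action would call your deployment tool's rollback command.
    return f"rolled back {event['service']} to {event['previous_version']}"


def handle(event: Event) -> str:
    """Run the automated action for a known event; otherwise escalate to a human."""
    action = ACTIONS.get(str(event["type"]))
    if action is None:
        return f"escalated unknown event '{event['type']}' to on-call engineer"
    return action(event)


def check_deployment(service: str, previous_version: str,
                     error_rates: Sequence[float],
                     max_error_rate: float = 0.05) -> str:
    """Post-deploy trigger: roll back if any sampled error rate exceeds the limit."""
    if all(rate <= max_error_rate for rate in error_rates):
        return "deployment healthy"
    return handle({
        "type": "deploy_unhealthy",
        "service": service,
        "previous_version": previous_version,
    })
```

In a pipeline, a step like `check_deployment("api", "v1", sampled_error_rates)` would run after each release, with `sampled_error_rates` coming from whatever monitoring system you already use. Unknown events fall through to a human, which matches the idea that machines handle known problems and engineers handle new ones.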

Say a monitoring tool used a dynamic-threshold approach, raising alerts based on the history of observations instead of expecting users to configure thresholds. Wouldn’t that be awesome for engineers? I wish more vendors would incorporate intelligence into their offerings.
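One simple way such a dynamic threshold could work is sketched below. This is an assumption for illustration, not a description of how any particular vendor does it: the threshold is the mean of recent observations plus `k` sample standard deviations, and the function names and the default `k = 3.0` are hypothetical choices.

```python
from statistics import mean, stdev
from typing import Sequence


def dynamic_threshold(history: Sequence[float], k: float = 3.0) -> float:
    """Derive an alert threshold from past observations: mean + k * stdev."""
    if len(history) < 2:
        raise ValueError("need at least two observations to estimate spread")
    return mean(history) + k * stdev(history)


def should_alert(history: Sequence[float], latest: float, k: float = 3.0) -> bool:
    """Alert only when the latest value exceeds the history-based threshold."""
    return latest > dynamic_threshold(history, k)
```

For example, with recent CPU readings of `[40, 42, 38, 41, 39]` percent, a new reading of 44 stays under the derived threshold while 46 exceeds it, without anyone having configured a fixed limit. Real systems would also need to account for seasonality and trends, which this sketch ignores.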

At Botmetric, we are excited to be building an intelligent, event-driven platform for managing incidents and operations in the cloud. We are building Botmetric as a platform that can handle most operational problems for engineers using application discovery, alert data, cloud configuration, historical patterns, and known events. We believe it will be a platform that helps our customers move from a DevOps to a NoOps philosophy by bringing Algorithmic IT Operations to incident management in the cloud.

We are also looking for passionate engineers to join us on this journey. Drop a note to careers@botmetric.com if you are interested!

Awaneendra T.

AWS and GCP Certified Solutions Architect | Cloud Migration and Modernisation | Enterprise and Cloud-Native Architecture (AWS, GCP, Azure) | DevSecOps, Continuous Delivery & SRE

8 years ago

Nice Article. Thanks for sharing. NoOps kind of approach is definitely required in order to compete in the digital economy and make the day to day operations effortless.

Srikrishna Parthasarathy

Purpose-Driven Global Digital Transformation Leader | Business Value Creator | Enterprise Agility Coach | Cloud & DevSecOps Architect | MLOps | AI Enthusiast | Passionate Learner

8 years ago

Nice article & vision for your company.

Nice article and nice summary. I believe most of the big players would have already built the infra which you have talked about. I am a little skeptical about dynamic threshold configuration though. There are a few scenarios where dynamic configuration looks reasonable, like anomaly detection in service usage. Good luck Sir!


More articles by Vijay Rayapati
