Leveraging Artificial Intelligence and Machine Learning for IT Operations

Those of us who have seen JARVIS, Tony Stark’s personal assistant in Iron Man, and his flying suit achieve incredible feats together get quite a kick out of the idea of an AI entity or bot that can assist in daily life and make it more comfortable. I personally find the concept and potential of such an AI assistant intellectually stimulating.

A few quarters ago, my colleagues and I started exploring how Artificial Intelligence (AI), and specifically Machine Learning (ML), capabilities could be embedded into running IT operations. IT operations in a modern enterprise is generally associated with keeping services up and healthy for end users, and always available to application developers for bringing creative ideas to life. The impact of degraded IT services or unplanned outages ranges from dissatisfied users to significant business and reputational damage.

Some of the use cases we wanted to explore are ones our enterprise clients struggle with on a daily basis. These problems converge around keeping up with increasing complexity, maintaining up-to-date and accurate knowledge of what is changing, and keeping support professionals and operating procedures continuously current with a changing environment. Existing approaches to getting the right information to the right support teams fall short on scalability, speed and accuracy. By incrementally embedding machine learning insights into IT operations, we have come to appreciate the potential, power and limitations of ML in bridging this knowledge gap, and we have been humbled.

The potential of machine learning has already been proven in finding patterns in real time with the aid of supervised and unsupervised learning algorithms. Armed with the ability to find patterns, preferences and anomalies invisible to the naked eye and to static tools, several extremely successful businesses (Netflix, Amazon) have emerged whose core competency is based on learning from data whose business value was previously unknown. Some obvious examples are: what movie I might like (personalization and sentiment prediction), what product to buy (prediction), what alternative products or movies I should consider (recommendation), which ad to display (personalization), what others are buying or watching (recommendation), and what the price should be (maximizing profit). The list of use cases where machine learning is making a difference goes on and on.

The point I am trying to make is that there is a method to this madness of recommendation, prediction and personalization. When combined with automated orchestration, these techniques have the potential to create a platform for autonomous IT operations. Easier said than done.

The cardinal rule of IT operations, which still holds true, is that every incident is the result of a change. Changes can be planned or unplanned, authorized or unauthorized. With increased adoption of Agile development and DevOps, developers and administrators today work much more closely with the business, resulting in a world where change is the only constant. The sooner we find what changed and determine the nature of the change, the sooner we can either accept the change as the new normal or revert it autonomously to get back to business as usual. Either way, we can significantly improve our diagnostics and incident response capability.
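To make "finding what changed" concrete, here is a minimal sketch in Python that compares two configuration snapshots and reports what was added, removed or modified. It is plain rule-based comparison, not machine learning, and the snapshot keys and values (heap_size_mb, tls_version and so on) are hypothetical placeholders rather than anything from our engagements.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare two flat configuration snapshots and report what changed."""
    # Keys present only in the newer snapshot
    added = {k: after[k] for k in after.keys() - before.keys()}
    # Keys present only in the older snapshot
    removed = {k: before[k] for k in before.keys() - after.keys()}
    # Keys present in both snapshots whose values differ: (old, new)
    modified = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical snapshots of one server's settings, taken before and after an incident
baseline = {"heap_size_mb": 2048, "tls_version": "1.2", "replicas": 3}
current = {"heap_size_mb": 1024, "tls_version": "1.2", "log_level": "debug"}

changes = diff_snapshots(baseline, current)
for kind, items in changes.items():
    print(kind, items)
```

A report like this is only the raw input. Deciding whether a given change is the new normal or should be reverted is where learned models and orchestration would come in.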

Some of the early successes we have had from embedding machine learning in our operational engagements are:

  1. Detecting and predicting the impact of a change, based on historical data, before the change is actually carried out. This works somewhat like a warning or recommendation to the person carrying out the change, with details of the impact and the probability of success or failure.
  2. Detecting anomalies and predicting failures, service degradation and capacity needs by analyzing and correlating millions of events across heterogeneous sources in real time.
  3. Detecting the nature of a change: what changed, who changed it, and what triggered it. This essentially cuts the time needed to diagnose the problem and eventually restore the services.
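As an illustration of the anomaly detection idea in item 2, the sketch below flags values in a metric series that deviate sharply from their recent history. It uses a simple rolling z-score, which is a statistical stand-in and not the trained models or the event correlation described above. The function name, window size, threshold and latency figures are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=10, threshold=3.0):
    """Return indices of values that lie more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    history = deque(maxlen=window)  # trailing window of recent observations
    anomalies = []
    for i, value in enumerate(values):
        if len(history) == window:  # only score once the window is full
            mu = mean(history)
            sigma = stdev(history)
            # A flat window (sigma == 0) gives no scale to measure against, so skip it
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append(i)
        history.append(value)  # the new value joins the window, anomalous or not
    return anomalies


# Hypothetical response-time samples in milliseconds, with one obvious spike
latency_ms = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98, 101, 250, 99, 100]
spikes = detect_anomalies(latency_ms)
print(spikes)  # [11]
```

Note that a flagged value still enters the window, which temporarily widens the tolerance for the samples that follow. A production system would have to handle this, along with seasonality and correlation across sources, which is where learned models earn their keep.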

While these insights are promising, it has become clear to us that without strong foundational capabilities in IT process orchestration that are broadly adopted within the enterprise, the recommendations and insights provided by ML are not going to be of much operational value. Essentially, we need to find a way to build the bridge between robotic process automation (RPA), IT process automation (ITPA) and AI/ML capabilities. The next few years will be very exciting for professionals who are willing to dive into the world of data, experiment with embedding machine learning algorithms, and commit to building these bridges between humans and intelligent machine learning algorithms.

Enterprise solutions with embedded machine learning for IT operations are starting to emerge and are almost ready to be put to the test in today’s enterprise. It’s easy to get overwhelmed by the complexities of the operational aspects and the classic scope creep of ML implementations; staying focused on the use cases and outcomes seems to always work. I’d encourage the exploration nevertheless, with the belief that this is going to be a classic marathon and not a sprint.


Anil Sharma

Technology Leader | Cloud, IT Infra, IAM, Cybersecurity | Director@Capgemini

7 years ago

Interesting article Santosh Dubey!

Akshaya Bhatia

Regional Mentor of Change at Atal Innovation Mission (AIM), Niti Aayog.. Certified Gem of Mentor India 2022, 2023 by Niti Aayog (Government Of India) ...

7 years ago

Nice info, good innovation, nicely articulated, thanks

Santosh, great article! AI is a way to go when most of operations (provisioning, deployment, recovery, resource allocation etc.) is automated. To make automation manageable and reliable you need checks and balances provided by ITOA. Considering that change is a main object of automation, change analytics is a key component of ITOA. Speaking about existing enterprise solutions, we in Evolven already deliver ML based change analytics technology that is successfully used by largest IT organizations in the world. I would argue that there are numerous ML based ITOA solutions in the market that are ready for enterprise implementations.

John Gonsalves

Global Sales & Solutions Leader | Digital Transformation | Board Advisor | CXO Mentor | Angel Investor

7 years ago

Nice article Santosh Dubey! Love the articulation and application of machine learning and AI to automate IT operations by detecting anomalies or service degradation, predicting failure(s) for proactive actions, and capacity optimization... all towards higher CSAT and customer delight while decreasing operating costs.
