AIOps – Driving Digital Transformation in IT Operations
Source: https://blogs.gartner.com/andrew-lerner/2017/08/09/aiops-platforms/

In recent years, artificial intelligence (AI) for IT operations, termed AIOps by Gartner in 2017, has come into focus for companies: 30% of large corporations are projected to use AIOps tools exclusively to monitor applications and infrastructure by 2023, up from just 5% in 2018.

AIOps enhances IT operations by applying AI and machine learning (ML) to big data collected from various IT operations tools and devices. According to MarketWatch, the global AIOps market will reach USD 11.1 billion by 2025. The market was valued at around USD 0.8 billion in 2016 and is anticipated to grow at a compound annual rate of more than 34% over the 2017-2025 forecast period.
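
As a quick sanity check on these figures, the compound annual growth rate (CAGR) implied by growing from USD 0.8 billion in 2016 to USD 11.1 billion in 2025 can be computed directly. This is a minimal sketch using only the numbers quoted above; treating 2016 to 2025 as nine compounding years is an assumption about how the forecast was framed.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Market size figures quoted from MarketWatch, in USD billions.
start_2016 = 0.8
end_2025 = 11.1

rate = cagr(start_2016, end_2025, years=2025 - 2016)
print(f"Implied CAGR 2016-2025: {rate:.1%}")
```

With these inputs the implied rate is about 33.9% per year, broadly in line with the quoted figure of more than 34%; the exact value depends on which base year and value the forecast actually uses.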

This growth is driven in part by digital transformation: organizations are adopting cloud, IoT devices, SaaS integrations, and mobile applications while demanding end-to-end business application assurance and uptime. Traditional incident-management approaches cannot handle the complexity of this dynamic environment, and AIOps is needed to accelerate successful digital transformation in IT operations.

As organizations journey toward AIOps and establish advanced analytics teams staffed with people skilled in ML frameworks and techniques, most of them struggle to make their AI projects truly impactful, that is, to get the projects into production and integrated with existing applications and processes. In recent years I have been focusing on IT Ops transformation using AI, and based on that experience I see five steps for successfully adopting and scaling AIOps.

Five steps to success

The framework for executing AIOps use cases includes the following five steps.

1. Problem Definition and Scoping

IT leaders are actively looking for opportunities to apply AI to IT operations, and the first step, scoping and prioritizing the AI use case, is critical to get right.

This phase starts with a problem statement, discusses the use case in detail, and assesses the feasibility of putting it into production. It is important that the team aligns on business KPIs in this phase, because the success of a project is measured not by the performance of the model but by its impact on the business, which is captured through business KPIs. The data science team then ties the model's performance to those KPIs.

In this phase, the team translates the business problem into a data science problem and defines the target variable, evaluation criteria, success criteria, and any constraints on executing the use case. Agreeing on KPIs at the start ensures the use case is prioritized correctly and makes it possible to correlate business impact with model performance.
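
To make the translation from business problem to data science problem concrete, the scoping outcome can be captured as a small, explicit specification that the team signs off on. The sketch below is hypothetical: the `UseCaseSpec` class, the example use case (predicting disk saturation before it causes an outage), and every KPI name and threshold are illustrative assumptions, not part of any standard framework.

```python
from dataclasses import dataclass, field


@dataclass
class UseCaseSpec:
    """Hypothetical record of what the team agreed on during scoping."""
    problem_statement: str
    business_kpi: str        # what the business measures
    target_variable: str     # what the model predicts
    evaluation_metric: str   # how candidate models are compared
    success_criteria: dict[str, float] = field(default_factory=dict)  # metric -> minimum value
    constraints: list[str] = field(default_factory=list)

    def is_successful(self, observed: dict[str, float]) -> bool:
        """True only if every agreed success criterion is met by the observed metrics."""
        return all(observed.get(name, float("-inf")) >= minimum
                   for name, minimum in self.success_criteria.items())


# Illustrative values only; real KPIs and thresholds come from the business owner.
spec = UseCaseSpec(
    problem_statement="Reduce outages caused by undetected disk saturation",
    business_kpi="monthly outage minutes",
    target_variable="disk_full_within_24h (binary)",
    evaluation_metric="precision and recall on held-out history",
    success_criteria={"recall": 0.80, "precision": 0.60},
    constraints=["predictions refreshed every 15 minutes", "no new agents on hosts"],
)

print(spec.is_successful({"recall": 0.85, "precision": 0.65}))  # True
print(spec.is_successful({"recall": 0.85}))                     # False: precision missing
```

Treating a missing metric as a failure is a deliberate choice: a criterion the team agreed on but never measured should not count as met.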

2. Data Assessment

Without the right data, no AI project can succeed, and data is at the heart of AIOps use cases. It is important to understand the data sources and the collection pipeline in order to know whether the data the use case requires is actually being collected. AIOps needs a robust data collection pipeline from sources such as agents, devices, network components, and applications to know the health of the IT environment in real time. Given these complexities, the data steward plays a very important role: the data steward interprets the data in the different sources (metadata, lineage) and ensures adherence to data governance standards.
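
Part of this assessment can be automated: for each source the use case depends on, check that the fields the model needs are actually being collected and that the data is fresh enough for near-real-time health monitoring. The sketch below is a hypothetical illustration; the `SourceSnapshot` class, the source and field names, and the 5-minute freshness limit are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SourceSnapshot:
    """Hypothetical summary of what one collection source currently delivers."""
    name: str
    fields: set[str]
    last_event: datetime


def assess_sources(sources, required_fields, max_lag, now):
    """Return human-readable gaps; an empty list means the data looks usable."""
    gaps = []
    for src in sources:
        missing = required_fields.get(src.name, set()) - src.fields
        if missing:
            gaps.append(f"{src.name}: missing fields {sorted(missing)}")
        lag = now - src.last_event
        if lag > max_lag:
            gaps.append(f"{src.name}: stale by {lag}")
    # Sources the use case needs but that are not delivering anything at all.
    seen = {src.name for src in sources}
    for name in sorted(required_fields.keys() - seen):
        gaps.append(f"{name}: no data collected")
    return gaps


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sources = [
    SourceSnapshot("host_agent", {"host", "cpu_pct", "disk_pct"}, now - timedelta(minutes=1)),
    SourceSnapshot("network", {"device", "packet_loss"}, now - timedelta(minutes=30)),
]
required = {
    "host_agent": {"host", "cpu_pct", "disk_pct"},
    "network": {"device", "packet_loss", "latency_ms"},
    "app_logs": {"service", "error_count"},
}

gaps = assess_sources(sources, required, max_lag=timedelta(minutes=5), now=now)
for gap in gaps:
    print(gap)
```

Here the network source is flagged as both incomplete and stale, and the application logs are flagged as missing entirely; these are exactly the questions a data steward would then investigate using metadata and lineage.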

3. Model Development

This is the phase where data scientists spend most of their time exploring and preparing the data to understand patterns, engineering features, and training models. Through iterative model creation, visualization of results, and identification of the actions needed to deliver improvements in the organization, this phase transforms raw data into insights. Its objective is to demonstrate and quantify the value of the model and to select the model to deploy. The model developed here is a proof of concept and will require additional work before it can be deployed to production environments; that work happens in the next phase.
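
The iterate, evaluate, and select loop of this phase can be illustrated with a deliberately simple model family. The sketch below assumes a hypothetical labeled history of a single metric (CPU utilization with known incident points), fits z-score anomaly detectors with different thresholds, scores each with F1, and keeps the best one as the candidate to pilot. Real AIOps models are usually far richer; the data, thresholds, and choice of F1 as the evaluation metric are illustrative assumptions.

```python
from statistics import mean, stdev


def fit_zscore(train):
    """'Training' here just learns the baseline mean and standard deviation."""
    return mean(train), stdev(train)


def predict(model, values, threshold):
    """Flag a value as anomalous when it is more than `threshold` std devs from the mean."""
    mu, sigma = model
    return [abs(v - mu) / sigma > threshold for v in values]


def f1(predicted, actual):
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical history: normal CPU readings for training, then a labeled validation window.
train = [40, 42, 38, 41, 39, 43, 40, 37, 42, 41]
valid = [41, 39, 47, 40, 90, 42, 95, 38]
labels = [False, False, False, False, True, False, True, False]  # True = real incident

model = fit_zscore(train)
scores = {t: f1(predict(model, valid, t), labels) for t in (2.0, 3.0, 4.0)}
best_threshold = max(scores, key=scores.get)
print(scores, "-> selected threshold:", best_threshold)
```

In this toy data the lower thresholds also flag a harmless spike to 47%, so the 4.0 threshold wins. The selected model is still only a proof of concept until it has been validated in a pilot.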

4. Pilot – Deploy and Run

Once the model is developed and the team has selected the best model against the evaluation criteria, the team runs it in a pilot environment to evaluate its real-world performance and to decide whether it is ready for full production and will deliver the required impact. In this phase, the data scientist works with the business owner and the IT owner to create a pilot environment that runs the model as it will run in full production. The pilot environment should closely resemble production and provide a framework to deploy and monitor the model. The team also implements a feedback loop to collect feedback from users and data and to refine the process or model accordingly. This phase delivers the changes required to capture the performance gains and value identified by the model, and the pilot establishes the new business processes required for consumption, adoption, and support of the use case.
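
A pilot feedback loop can be as simple as recording, for every alert the model raises, whether the operator who handled it found it useful, and then turning those verdicts into both a running metric and new labeled examples for retraining. The `FeedbackLog` class and its fields below are hypothetical; in practice the verdicts would typically come from whatever ticketing or incident-management tool the team already uses.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackLog:
    """Hypothetical store of operator verdicts on alerts raised during the pilot."""
    verdicts: dict[str, bool] = field(default_factory=dict)  # alert_id -> was it useful?

    def record(self, alert_id: str, useful: bool) -> None:
        self.verdicts[alert_id] = useful  # a later verdict overrides an earlier one

    def precision(self) -> float:
        """Share of reviewed alerts that operators marked as useful."""
        if not self.verdicts:
            return 0.0
        return sum(self.verdicts.values()) / len(self.verdicts)

    def training_labels(self) -> list[tuple[str, int]]:
        """Turn verdicts into (alert_id, label) pairs for the next training run."""
        return [(alert_id, int(useful)) for alert_id, useful in self.verdicts.items()]


log = FeedbackLog()
log.record("A-101", True)
log.record("A-102", False)
log.record("A-103", True)
log.record("A-104", False)
log.record("A-104", True)  # operator revised the verdict after investigation
print(f"pilot precision: {log.precision():.0%}")
print(log.training_labels())
```

Keeping only the latest verdict per alert means corrections made after an investigation flow straight into both the metric and the retraining labels.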

Based on the model's performance in the pilot, the team makes a go/no-go decision for production. If the decision is go, the project moves to the next phase.
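
Making the go/no-go decision explicit helps keep it tied to the KPIs agreed during scoping rather than to impressions from the pilot. The sketch below compares hypothetical pilot results against hypothetical agreed minimums and reports every criterion that failed; the `go_no_go` function, the metric names, and the numbers are illustrative assumptions.

```python
def go_no_go(pilot_results: dict[str, float],
             minimums: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (decision, reasons). Every agreed criterion must be met for a 'go'."""
    failures = []
    for metric, minimum in minimums.items():
        value = pilot_results.get(metric)
        if value is None:
            failures.append(f"{metric}: not measured in pilot")
        elif value < minimum:
            failures.append(f"{metric}: {value} < required {minimum}")
    return not failures, failures


# Illustrative thresholds agreed with the business owner during scoping.
minimums = {"precision": 0.70, "recall": 0.80, "outage_minutes_reduction_pct": 15.0}
pilot = {"precision": 0.75, "recall": 0.78, "outage_minutes_reduction_pct": 18.0}

decision, reasons = go_no_go(pilot, minimums)
print("GO" if decision else "NO GO", reasons)
```

With these numbers the gate returns no-go because recall misses its minimum, and the reason list tells the team exactly what to improve. All criteria here are "higher is better"; a metric such as false-alert rate would need the comparison reversed.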

5. Production Deployment and Management

In this phase, the team deploys a productionized version of the solution that conforms to the organization's IT ecosystem and operating practices. No matter how well the model performs at the time of deployment, its performance will degrade over time as new patterns emerge, both from interventions triggered by the model itself and from other changes in the environment. For these reasons, the team puts in place a feedback loop that tracks model performance and triggers retraining when performance crosses an agreed threshold.
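
The performance-tracking feedback loop described above can be sketched as a rolling monitor: keep the most recent outcomes, compute the metric over that window, and flag retraining when it falls below the agreed threshold. The `RetrainTrigger` class, the window size, the 0.8 threshold, and the use of rolling accuracy as the tracked metric are illustrative assumptions; the real metric and threshold are whatever the team agreed on with the business owner.

```python
from collections import deque


class RetrainTrigger:
    """Tracks rolling accuracy of production predictions and flags when to retrain."""

    def __init__(self, window: int = 50, threshold: float = 0.8):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold

    def observe(self, predicted: bool, actual: bool) -> None:
        self.outcomes.append(predicted == actual)

    def rolling_accuracy(self) -> float:
        # With no observations yet there is no evidence of degradation.
        return sum(self.outcomes) / len(self.outcomes) if self.outcomes else 1.0

    def should_retrain(self) -> bool:
        # Only decide once the window is full, so a few early errors don't trigger retraining.
        window_full = len(self.outcomes) == self.outcomes.maxlen
        return window_full and self.rolling_accuracy() < self.threshold


trigger = RetrainTrigger(window=10, threshold=0.8)
# First 10 predictions are all correct; then the environment changes and errors creep in.
history = [(True, True)] * 10 + [(True, False), (False, True), (True, False)]
retrain_step = None
for step, (pred, actual) in enumerate(history, start=1):
    trigger.observe(pred, actual)
    if trigger.should_retrain():
        retrain_step = step
        print(f"step {step}: rolling accuracy {trigger.rolling_accuracy():.0%}, retrain")
        break
```

Retraining is flagged only at the third consecutive error, when accuracy over the last ten predictions drops to 70%; a longer window makes the trigger more stable but slower to react.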

As more use cases are deployed in AIOps, business users become concerned about model explainability: they want to know how the model decides on different actions and how changing particular features would affect its decisions. That is why it is important to keep the business owner involved throughout the journey, from scoping to deployment, and to make joint decisions on KPI tracking, threshold values, and so on.
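
For a model that scores risk as a weighted sum of features, explaining a decision is straightforward: each feature's contribution is its weight times its value, and changing one feature shows directly how the decision moves. The weights, feature names, and the 1.2 alert threshold below are hypothetical; more complex models generally need dedicated attribution methods (for example SHAP values), which this sketch does not implement.

```python
# Hypothetical linear incident-risk model: score = sum(weight * feature value).
WEIGHTS = {"cpu_pct": 0.01, "error_rate": 2.0, "recent_deploy": 0.5}
ALERT_THRESHOLD = 1.2


def explain(features: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total score and each feature's contribution to it."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    return sum(contributions.values()), contributions


def what_if(features: dict[str, float], name: str, new_value: float) -> float:
    """Score change if a single feature took a different value."""
    changed = {**features, name: new_value}
    return explain(changed)[0] - explain(features)[0]


host = {"cpu_pct": 60.0, "error_rate": 0.2, "recent_deploy": 1.0}
score, parts = explain(host)
print(f"score={score:.2f} alert={score > ALERT_THRESHOLD}")
for name, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {value:+.2f}")
print(f"without the recent deploy: {what_if(host, 'recent_deploy', 0.0):+.2f}")
```

Here the alert fires with a score of 1.5, and the what-if query shows that removing the recent deploy lowers the score by 0.5, enough to drop it below the threshold. This is the kind of answer a business owner can use when agreeing on threshold values.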

AIOps has the potential to improve numerous aspects of IT operations, from increased productivity to a better customer experience through business application assurance and uptime. The necessary technology and frameworks are now readily available, and in many organizations AIOps is now a key requirement for digital transformation. AIOps is helping transform raw data into proactive decision-making that drives business value, but it requires companies to think beyond the technology. It calls for new capabilities, including teams to run AIOps use cases, a framework to put them into production, and sustained management focus to drive impact from AIOps.

Gaurav Sood

Data Engineer @ SCB / Content Writer with DAMA Norway

4 years ago

Sanchit Tiwari, good summary. But I think this is in parallel with how RPA is helping in monitoring operations and keeping an eye on infrastructure issues. RPA, AIOps and maybe cloud infra are all part of the same automation led revolution.


More articles by Sanchit Tiwari

  • Understanding the vanishing gradient problem(VGP) and solutions

    In this article, I am trying to put together an understanding of the vanishing gradient problem(VGP) in a simplistic…

  • Deep Learning - Different Frameworks

    Many research areas are getting impacted and transformed with the increase of new computing resources/ techniques and…

  • Feedback loop in Machine Learning – Labeling data

    In real life application supervised machine learning depends on labeled datasets and quality of data labels have huge…

  • Inferential statistics in nutshell – With Python

    As a research scholar, I need to use inferential statistics in my research work to make inferences about the population…

  • Data Leakage in Machine Learning – avoiding the trap

    Data leakage is one of the most frequent mistake happens during our machine learning model building and it can happen…

  • Math for ML - Using LaTeX & Python

    Before you start learning or implementing any machine learning algorithms, Mathematics is basic requirement…

  • Forecasting time series: choosing the algorithm to model

    We all know that predicting time series data is difficult and complex task due to uncertainty related with time and…

  • Fleet Management with Machine Learning

    The word Fleet in simple terms can be understood as “a group of vehicles”. Fleet management is a system designed for…

  • Deep Learning - Time to Deep Dive

    Last week attended the deep learning summit in Singapore with an objective to learn more about the application of deep…

  • Know the Value of your Customer

    In today’s world we are using Data Science to solve different problem for different types of business and helping them…