Under the Hood of vuRCABot 2.0: A look at the Machine Learning models that make VuNet’s RCA Assistant tick
Our AIOps series focuses on how VuNet’s flagship product, vuSmartMaps™, provides the platform for a rich set of ML models to be built upon it, and enables enterprises to graduate to complete automation.
In the second blog of the series, we examine how vuSmartMaps™ provides the data architecture on which VuNet’s AI/ML infrastructure, vuCoreMLOps, is built. vuCoreMLOps’ pre-built ML models for anomaly detection and domain intelligence for event correlation make RCA faster, explainable, and more accurate.
Within two months of buying a new car, Sara started facing issues. At times the car would not start, or the engine would stall, lag or surge in the middle of a drive. Out of the blue, the dashboard would black out and she would have to pull over and restart the engine for it to come back on. After countless visits to umpteen mechanics, in which every single aspect of the car, including the carburetor, fuel injector, electrical wiring and the indicator light for dashboard warnings, was checked and rechecked, the root cause was finally attributed to a faulty ECU (Engine Control Unit) by a mechanic with more than 20 years’ experience. The ECU had to be replaced, and after that, Sara’s car was as good as new!
A modern car is a complex machine. An article by IEEE indicates that a premium-class automobile “contains close to 100 million lines of software code.” The software executes on 70 to 100 microprocessor-based ECUs networked throughout the body of the car. These computing units control engine functions, regulate braking behaviour and monitor the air conditioning system. To put this in perspective, a Boeing 787 Dreamliner only has 14 million lines of code! Clearly, diagnosing the origin of the issue and narrowing it down to the ECU was something only someone well-versed in the inner workings of the car could do.
The point of Sara’s story is to drive home (pun intended!) the fact that problems do invariably occur, and when they do, they can be solved only if the root cause of the issue is detected accurately and in a timely manner.
We see something similar in software systems every day. Digital-first enterprises have complex and heterogeneous deployment environments, distributed across a hybrid software stack, with a single transaction traversing multiple touchpoints, internal as well as external APIs and microservices. In such a situation, when things go wrong, IT Operations Management (ITOM) and Site Reliability Engineering (SRE) teams are hard-pressed to answer two important questions.
Peeking Under the Hood of vuRCABot 2.0
In our previous blog, “We Have a Bot for That”, we went over how vuRCABot 2.0, VuNet’s AI/ML-powered automatic RCA (Root Cause Analysis) assistant, reduces both MTTD and MTTR.
Some of the use cases that vuRCABot supports include:
We explore the engine which powers vuRCABot 2.0 and delivers these capabilities in the next section.
vuCoreMLOps – The Brain Behind vuRCABot 2.0
The ML team at VuNet, which houses PhDs and some of the best brains in statistical modeling and ML Engineering, has been working in deep collaboration with top-notch professors from premier educational institutions, as well as industry leaders with years of experience in AI implementations across various domains. Acting as our technology, strategic and ML advisors, they have guided us on the data quality and integrity required to solve the problem at hand, the approaches needed, and the testing required to fine-tune the algorithms so that they work in production environments at scale.
In our blog entitled “Sowing the Seeds of MLOps in the Soil of vuSmartMaps™,” we spoke about VuNet’s vuCoreMLOps product, which is built on the information architecture of vuSmartMaps™, our Business Journey Observability platform. vuSmartMaps™ provides the necessary infrastructure for AI algorithms and ML models to be built upon it, as can be seen in the following diagram:
How vuRCABot 2.0 Works
A lot of what we have said so far might seem too good to be true: an AI-powered assistant which delivers ready-made RCA? But the fact remains that vuRCABot is one of the few tools today that is capable of event correlation, root cause analysis, automated remediation, and closure of feedback loops. What works in our favour is that our correlations are based on journey metrics: we use our expertise in Business Journey Observability and our proprietary vuSmartMaps™ platform as the foundation for the data that vuRCABot works with. Any AI/ML model is only as good as the data that is fed to it. Since our data is enriched and contextualized via a 5C process, or “observability pipeline”, detailed in an earlier blog, what we get is data tailor-made for insights generated on journey metrics, with correlations made in the context of a user journey, instead of disjointed logs and metric data sets from disparate services and monitoring tools.
The essential inputs needed for effective RCA are a business journey observability view and historical data from failure events, known information on correlations between parts of the system, and a journey’s user experience lead indicators and golden signals, which give us essential information about the health of various components in the system. vuRCABot has an inbuilt MLOps layer, with the ability to store large volumes of metrics, logs, and traces data in the context of a journey at scale, to run pre-packaged and custom ML models.
The following flow diagram shows the methodology behind the magic, and why vuRCABot can do what it does, and do it well!
vuRCABot runs in near real time as a micro-batch job and handles thousands of signals to arrive at an RCA. In addition, it can handle a plethora of operational metrics, each with high cardinality. It also ingests data from vuSmartMaps™, existing monitoring tools, data stores and data lakes.
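To make the micro-batch idea concrete, here is a minimal sketch in Python. It is not vuRCABot's actual algorithm, which this blog does not disclose. It assumes a simple z-score check of each signal against its own recent baseline, and it ranks candidate root causes by which signal went anomalous first. The function names, the threshold and the sample signals are all hypothetical and purely illustrative.

```python
from statistics import mean, stdev


def zscore_anomalies(history, batch, threshold=3.0):
    """Return the offsets in `batch` that deviate strongly from `history`.

    Assumption: a plain z-score against the baseline mean and sample
    standard deviation stands in for a real anomaly detection model.
    """
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is treated as anomalous.
        return [i for i, v in enumerate(batch) if v != mu]
    return [i for i, v in enumerate(batch) if abs(v - mu) / sigma > threshold]


def rank_root_causes(signals, history_len, threshold=3.0):
    """Rank signals by how early they turned anomalous in the micro-batch.

    `signals` maps a signal name to a list of values; the first
    `history_len` values are the baseline and the rest is the micro-batch.
    Assumption: the earliest anomaly is the most likely root cause.
    """
    candidates = []
    for name, values in signals.items():
        history, batch = values[:history_len], values[history_len:]
        hits = zscore_anomalies(history, batch, threshold)
        if hits:
            candidates.append((name, hits[0]))
    return sorted(candidates, key=lambda c: c[1])


# Illustrative, made-up signals: 8 baseline points plus a 4-point micro-batch.
signals = {
    "db_latency_ms": [10, 11, 9, 10, 10, 11, 9, 10, 10, 50, 55, 60],
    "api_errors": [1, 0, 1, 1, 0, 1, 0, 1, 1, 1, 9, 12],
    "cpu_pct": [40, 42, 41, 39, 40, 41, 40, 42, 41, 40, 42, 41],
}
ranked = rank_root_causes(signals, history_len=8)
print(ranked)
```

In this toy data the database latency spikes one interval before the API errors, so it is ranked first, while the CPU signal never leaves its normal range and is not reported. A production system would also need the journey context and known dependencies between components, which this sketch leaves out.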
Initial tests in production environments, where vuRCABot was deployed alongside conventional monitoring tools, have thrown up some extremely encouraging results, with vuRCABot reducing MTTD and MTTR by as much as a whopping 75%! This is empirical evidence of the fact that the chosen approaches work.
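For readers who want to see how such a reduction is computed, the sketch below shows one common way to derive MTTD and MTTR from incident timestamps. It assumes MTTD is measured from the start of an incident to its detection, and MTTR from the start of an incident to its resolution; definitions vary between teams. The incident data is invented for illustration and is not VuNet's test data.

```python
from datetime import datetime, timedelta


def mean_minutes(deltas):
    """Average a list of timedeltas and return the result in minutes."""
    return sum(d.total_seconds() for d in deltas) / len(deltas) / 60


def mttd_mttr(incidents):
    """Compute (MTTD, MTTR) in minutes.

    `incidents` is a list of (started, detected, resolved) datetimes.
    Assumption: both metrics are measured from the start of the incident.
    """
    mttd = mean_minutes([detected - started for started, detected, _ in incidents])
    mttr = mean_minutes([resolved - started for started, _, resolved in incidents])
    return mttd, mttr


def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return (before - after) / before * 100


def incident(start, detect_min, resolve_min):
    """Build a hypothetical incident from minute offsets after `start`."""
    return (start,
            start + timedelta(minutes=detect_min),
            start + timedelta(minutes=resolve_min))


t0 = datetime(2024, 1, 1)  # arbitrary reference time for the made-up data
before = [incident(t0, 40, 120), incident(t0, 20, 80)]
after = [incident(t0, 10, 30), incident(t0, 5, 20)]

mttd_before, mttr_before = mttd_mttr(before)
mttd_after, mttr_after = mttd_mttr(after)
mttd_cut = percent_reduction(mttd_before, mttd_after)
mttr_cut = percent_reduction(mttr_before, mttr_after)
print(mttd_cut, mttr_cut)
```

With these invented numbers, MTTD falls from 30 to 7.5 minutes and MTTR from 100 to 25 minutes, which is a 75% reduction in each.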
Conclusion
With the proliferation of data sources to be monitored, the complexity and heterogeneity of deployment environments, the number of internal and external touchpoints a business journey traverses, and the need for enterprises to be up and running 24x7, conventional manual methods of performing root cause analysis must be replaced by AIOps. With its emphasis on tracking anomalies on business journey indicators to optimize user experience, and its unique approach to correlating signals to arrive at root cause recommendations, vuRCABot 2.0 is an enterprise’s best bet to perform this transition seamlessly. To know more about how vuRCABot 2.0 can transform RCA in your organization, contact [email protected] .