Understanding AIOps and its linkage to High Availability and Observability

Deepa Naik

Technologist. Educator. Explorer

Published: September 15, 2022

Technology never sleeps.

Businesses need high availability, especially for mission-critical applications. Uptime targets get expensive quickly: five nines (99.999%) is a costly affair, and every additional 9 you write into your SLA raises the investment you must be willing to make.
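To make the cost of each extra 9 concrete, here is a minimal sketch that converts an uptime target into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration.

```python
# Yearly downtime budget for a given availability target.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by an uptime target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

Five nines leaves a little over five minutes of downtime per year, which is why it usually calls for automated failover rather than manual intervention.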

Technology leadership aims to define SLAs for the different aspects of availability: targets for redundancy, failover, rollback and scaling, and the degree of automation for each of these. There is also the angle of composite availability. In the cloud-native era of distributed systems, systems do not work in silos; they depend on upstream systems or integrate through external interfaces, so the availability of each dependency affects the whole.
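Composite availability can be estimated with two standard formulas, sketched below under the simplifying assumption that components fail independently. The function names and the availability figures are hypothetical.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant_availability(availability, replicas):
    """Availability of independent replicas when one healthy replica is enough."""
    return 1 - (1 - availability) ** replicas


# Hypothetical figures: an application that depends on two upstream systems.
chain = serial_availability([0.9999, 0.999, 0.9995])
print(f"Serial chain: {chain:.5f}")

# Hypothetical figure: two replicas of a component that is 99% available.
pair = redundant_availability(0.99, 2)
print(f"Redundant pair: {pair:.5f}")
```

The chain ends up less available than its weakest link, while redundancy pushes availability above that of a single replica. Real failures are often correlated, so treat these numbers as an upper bound.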

Closely linked to resilient, high-uptime systems is “observability”: the ability to learn what is happening in your system and avoid extended outages.

The three pillars of observability are metrics, logs, and traces.

The scope of observability, to a large extent, is about helping you identify a problem as soon as it happens, and sometimes even before an incident happens. This is where AIOps fits in: it tries to provide proactive alerts and responses based on the event and telemetry data captured in the IT environment.

AIOps can be seen as a part of observability. Assuming you have complete observability data, viz. M.E.L.T. (metrics, events, logs and traces), you can leverage AIOps and the power of AI/ML to correlate events, identify problems and the causes of incidents, and suggest what can be done to fix them.

AIOps, its benefits and what it involves

The term “AIOps” stands for “Artificial Intelligence for IT operations.” Originally coined by Gartner, it refers to the way data and information from an application environment are managed by an IT team, in this case using AI.

The definition of AIOps by Gartner says “AIOps platforms utilise big data, modern machine learning and other advanced analytics technologies to directly and indirectly enhance IT operations (monitoring, automation and service desk) functions with proactive, personal and dynamic insight.”

AIOps is a system that relies heavily on data and learning to provide proactive predictions, alerts and decisions, which should become more accurate over time.

The benefits AIOps can boast of are:

improved root cause analysis
intelligent incident management
anomaly and threat detection
and finally, automatic resolution of issues

This is of enormous value to IT operations teams and in turn helps achieve high availability and adherence to uptime metrics.

Typical AIOps use cases include decreasing MTTR (mean time to repair) and its associated cost, proactive performance monitoring, and driving faster and better decision-making for the team. The broad categories where we can see an impact are incident and problem management, IT operations analytics, infrastructure management and capacity management.
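MTTR is simple to compute once incidents are recorded consistently. The sketch below assumes each incident is stored as a hypothetical (detected, resolved) pair of timestamps; teams differ on whether the clock starts at failure or at detection, so the definition here is an assumption.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Average time from detection to resolution across incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Start the sum from an empty timedelta so durations add up correctly.
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical incident records: (detected, resolved).
incidents = [
    (datetime(2022, 9, 1, 10, 0), datetime(2022, 9, 1, 10, 30)),
    (datetime(2022, 9, 3, 14, 0), datetime(2022, 9, 3, 15, 30)),
]

print(mean_time_to_repair(incidents))
```

Tracking this figure before and after introducing AIOps is one way to check whether the investment is paying off.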

The Road to AIOps

At the heart of AIOps are machine learning and the telemetry data collected. Data ingestion is critical: clean, usable telemetry data that you can depend on is a prerequisite. The other important thing to do at the start of the AIOps journey is to understand what your enterprise needs to collect and measure, and to focus on those aspects when building out your plan. Explainability is one of the key aspects of this initiative, because AIOps augments the decision-making capability of the IT Ops team and therefore needs to earn the trust of the team using it. Finally, the IT environment and an intelligent infrastructure are crucial in order to capture the right telemetry and event data.

The five main functions of AIOps, as described by Gartner, are:

Data Ingestion: this function ingests, indexes and normalises events from across devices and vendors, gathering data and telemetry such as syslog, config changes, SNMP, NetFlow and others
Topology: this function holds the list of devices and the relationships between them, to understand the context between the end user and the resource they are trying to access
Correlation: this function correlates the telemetry data between devices
Recognition: this is where issues are detected or predicted based on the machine learning model. This is the stage to identify the use cases and decide your focus and what you want to achieve with AIOps
Remediation: this is the function where a recommendation is made based on the situation, or a response is automated against the external system
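The five functions above can be sketched as a toy pipeline. Everything here is a hypothetical illustration: the event schema, the TOPOLOGY inventory and the function names are assumptions, and a simple z-score check stands in for the machine learning model a real platform would use.

```python
from collections import defaultdict
from statistics import mean, stdev

# Topology: a hypothetical inventory mapping each device to the service it supports.
TOPOLOGY = {"sw-1": "checkout", "db-1": "checkout", "web-9": "search"}


def normalise(raw):
    """Data ingestion: map vendor-specific fields onto one common schema."""
    return {
        "device": raw.get("host") or raw.get("device"),
        "metric": raw["metric"],
        "value": float(raw["value"]),
    }


def correlate(events):
    """Correlation: group events by the service they affect, using the topology."""
    by_service = defaultdict(list)
    for event in events:
        by_service[TOPOLOGY.get(event["device"], "unknown")].append(event)
    return dict(by_service)


def is_anomalous(history, value, threshold=3.0):
    """Recognition: flag a value far from its history (z-score stand-in for ML)."""
    if len(history) < 2:
        return False  # not enough history to judge
    sigma = stdev(history)
    if sigma == 0:
        return value != history[0]
    return abs(value - mean(history)) / sigma > threshold


def run_pipeline(raw_events, history):
    """Remediation: return one recommendation per service with anomalies."""
    events = [normalise(raw) for raw in raw_events]
    recommendations = []
    for service, group in correlate(events).items():
        anomalies = [
            e for e in group
            if is_anomalous(history.get((e["device"], e["metric"]), []), e["value"])
        ]
        if anomalies:
            devices = sorted({e["device"] for e in anomalies})
            recommendations.append(f"{service}: investigate {', '.join(devices)}")
    return recommendations


# Hypothetical history and incoming events from two vendors with different field names.
history = {
    ("db-1", "latency_ms"): [10, 11, 9, 10, 12, 10],
    ("sw-1", "latency_ms"): [1.0, 1.1, 0.9, 1.0],
}
raw_events = [
    {"host": "db-1", "metric": "latency_ms", "value": "95"},
    {"device": "sw-1", "metric": "latency_ms", "value": 1.05},
]
print(run_pipeline(raw_events, history))
```

Here the database latency is flagged and the switch reading is not, so the operator receives a single recommendation for the checkout service instead of two unrelated alerts.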

Expect a lot of false positives in the beginning, as this is a system that relies heavily on learning and improves over time with a supervised learning model.
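One way to watch false positives decline is to track alert precision from operator feedback. This is a minimal sketch; the feedback lists are invented for illustration, with True meaning an operator confirmed the alert as a real incident.

```python
def alert_precision(feedback):
    """Share of raised alerts that operators confirmed as real incidents."""
    if not feedback:
        return None  # no alerts were raised, so precision is undefined
    confirmed = sum(1 for is_real in feedback if is_real)
    return confirmed / len(feedback)


# Hypothetical operator feedback, early and later in the rollout.
week_1 = [True, False, False, False]
week_8 = [True, True, True, False]

print(f"Week 1 precision: {alert_precision(week_1):.2f}")
print(f"Week 8 precision: {alert_precision(week_8):.2f}")
```

The same confirmed and rejected labels can also serve as training data for the supervised model.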

AIOps - Artificial Intelligence for IT Operations

AIOps Platform – the right time to get on one

The AIOps platform market is relatively new, and most vendors are in the process of introducing more use cases to their machine learning models. The features provided by most platforms include:

Identifying meaningful patterns that provide insights (Pattern Discovery)
Providing Automated Insights - using correlation, asset relationships and dependencies between events associated with an incident (Root Cause Analysis and Predictive Analysis)
Probable Adaptive Remediation - As the technology matures, users will be able to leverage prescriptive advice from the platform, enabling the action stage

The road map towards an AIOps platform could start with incremental goals for observability, such as setting up a metrics program with practical outcomes, and then move to an all-inclusive AIOps platform.

Many companies are also trying a do-it-yourself (DIY) architectural approach to AIOps, using strategies and tools such as data lakes, a transport layer, a data pipeline (e.g., using Kafka), analytics and visualisation.

Conclusion

To summarise, meeting the high availability requirements of mission-critical applications depends heavily on observability. Given the amount of data captured by modern IT environments and infrastructure, it is of great value to the IT Ops team to automate, and to plan the adoption of AIOps in an incremental manner.

The best way to start on the AIOps journey is to identify an initial use case, ideally a small, focused area where you want to ensure high availability and failover. This can be built upon over time, and at the right point an AIOps platform can be introduced, with the ROI justified by improved MTTR (mean time to repair) and improved customer experience.

“The journey of a thousand miles begins with a single step” and it's never too late to take the first step towards AIOps.
