AIOPS Design Principles
Along with the business-level guiding principles, the responsible team should collaborate to develop a set of design principles for each functional area into which AIOPS will be integrated. The following are some examples of design principles:
GENERAL
FAULT/EVENT MANAGEMENT
Unsupervised event and log clustering should learn patterns across tools and domains; systems that use different clustering models will assign different meanings to their clusters.
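As a rough illustration of why a shared model matters, the sketch below groups events from two tools with one common templating function, so a cluster key means the same thing whatever the source. Template masking is used here as a simple stand-in for a learned clustering model; the masking rules, the function names and the sample events are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical masking rules; a real deployment would learn or tune these.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template_of(message: str) -> str:
    """Reduce a raw message to a template by masking variable tokens."""
    for pattern, token in _MASKS:
        message = pattern.sub(token, message)
    return message.lower().strip()


def cluster_events(events):
    """Group (source_tool, message) pairs by their shared template.

    The same function is applied to every tool's events, so a cluster
    key has one meaning regardless of where the event came from.
    """
    clusters = defaultdict(list)
    for source, message in events:
        clusters[template_of(message)].append(source)
    return dict(clusters)


# Made-up sample events from two hypothetical tools.
events = [
    ("syslog", "Connection to 10.0.0.5 timed out after 30 s"),
    ("apm", "connection to 10.0.0.9 timed out after 45 s"),
    ("syslog", "Disk /dev/sda1 at 91 percent"),
]
clusters = cluster_events(events)
for template, sources in clusters.items():
    print(template, sources)
```

The two timeout events land in one cluster even though they come from different tools. Had each tool run its own clustering, the two groups would carry unrelated identifiers and could not be correlated directly.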
PERFORMANCE MANAGEMENT
Unsupervised anomaly detection should learn patterns across tools and domains; systems that use different detection models will assign different meanings to their results.
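One way to picture this principle is a single detector applied uniformly to metrics from different tools, so that "anomalous" is defined the same way everywhere. The sketch below uses the modified z-score (median and median absolute deviation) as a simple unsupervised technique; the function name, the metric names and the sample values are assumptions, and 3.5 is only a commonly used default threshold.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return indices whose modified z-score exceeds the threshold.

    The score is 0.6745 * |x - median| / MAD, where MAD is the median
    absolute deviation. It needs no labels and tolerates outliers.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread at all: nothing can be scored, so flag nothing.
        return []
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up metrics from two hypothetical monitoring tools.
metrics = {
    "infra.cpu_millicores": [100, 102, 99, 101, 100, 250, 98, 101],
    "apm.latency_ms": [20, 21, 19, 20, 22, 20, 21, 20],
}
anomalies = {name: mad_anomalies(series) for name, series in metrics.items()}
print(anomalies)
```

Because both series pass through the same scoring rule, a flag raised on one metric is comparable with a flag raised on the other.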
CONFIGURATION MANAGEMENT
Learning algorithms should be able to take advantage of the special topological relationships between objects in order to maximize performance and deliver root cause inferences.
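To make the role of topology concrete, here is a minimal sketch of one possible heuristic, not a learning algorithm: among the objects currently alerting, keep only those with no alerting dependency, direct or transitive, and treat them as root cause candidates. The dependency map, the object names and the function name are hypothetical.

```python
def root_cause_candidates(depends_on, alerting):
    """Return alerting nodes that have no alerting dependency.

    depends_on maps each node to the nodes it relies on. A node whose
    dependencies (direct or transitive) are also alerting is assumed
    to be a symptom rather than a cause.
    """
    alerting = set(alerting)

    def has_alerting_upstream(node, seen):
        for dep in depends_on.get(node, ()):
            if dep in seen:
                continue  # already visited; also guards against cycles
            seen.add(dep)
            if dep in alerting or has_alerting_upstream(dep, seen):
                return True
        return False

    return sorted(n for n in alerting if not has_alerting_upstream(n, {n}))


# Hypothetical topology: web -> app -> (db -> storage, cache)
depends_on = {
    "web": ["app"],
    "app": ["db", "cache"],
    "db": ["storage"],
}
candidates = root_cause_candidates(depends_on, ["web", "app", "db"])
print(candidates)
```

With web, app and db all alerting, only db survives the filter, since the other two sit downstream of it. Without the topology, the three alerts would look equally likely to be the cause.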
INCIDENT MANAGEMENT
Align major incidents to services and to the right resources at the right time.
Enable consistent and reliable incident data for ongoing ML training.
Provide context from event monitors, anomalies and metadata to reduce MTTR.
CHANGE MANAGEMENT
Accurately evaluate risk based on historical context.
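As a simple illustration of risk based on historical context, the sketch below scores a proposed change by the smoothed failure rate of past changes of the same type on the same service. The record fields, the smoothing prior and the sample history are assumptions for the example; a production model would draw on far more context.

```python
def change_risk(history, service, change_type,
                prior_failures=1, prior_total=2):
    """Estimate failure probability from matching past changes.

    The prior (1 failure in 2 changes by default) keeps the estimate
    at 0.5 when there is no history and damps small samples.
    """
    matches = [
        c for c in history
        if c["service"] == service and c["type"] == change_type
    ]
    failures = sum(1 for c in matches if c["failed"])
    return (failures + prior_failures) / (len(matches) + prior_total)


# Made-up change records.
history = [
    {"service": "billing-db", "type": "schema", "failed": True},
    {"service": "billing-db", "type": "schema", "failed": False},
    {"service": "billing-db", "type": "schema", "failed": True},
    {"service": "web", "type": "config", "failed": False},
]
risk = change_risk(history, "billing-db", "schema")
print(round(risk, 2))
```

Two failures in three matching changes give (2 + 1) / (3 + 2) = 0.6, while a change with no matching history falls back to the prior of 0.5.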
PROBLEM MANAGEMENT
Reliably provide root cause details for accurate problem resolution.
Enable and enforce a feedback loop from known errors and problem resolutions to prevent future incidents.
OPERATIONAL KNOWLEDGE MANAGEMENT
Leverage ALL classified learning (automated and manual) for ML training.
RUNBOOK AUTOMATION
Automate preventative tasks for pre-incidents.
Allow for scripting for resolution, not just service restoration.
AN EXAMPLE OF A LOGICAL ARCHITECTURE MIGHT LOOK LIKE THE VISUALIZATION BELOW:
Learn more about Grok’s design principles using AIOPS by starting a free trial here.