AIOps - Explainability using pertinent positives

Arun Ayachitula, Rohit Khandekar & Upendra Sharma

Classifier explainability is a broad AI practice for explaining classification decisions and establishing trust in AI. Indeed, explainability has become one of the important factors for evaluating machine learning models and gaining the trust of the IT user community for adoption at scale.

What is Classifier Explainability?

Classifier Explainability refers to the ability to provide insights into how the classifier decision process works. Broadly, there are two types of classifier Explainability:

Global Explainability deals with insights into how a classifier is working at a corpus level. This includes, for example, determining

a. whether the classifier is biased with respect to some classes,

b. whether the classifier is confused between any pairs of classes,

c. which feature sets are most important for given classes.

Global explainability uses the confusion matrix and the coefficient matrix, and surfaces pertinent positives and pertinent negatives at the classifier model level.

Local Explainability deals with insights into how a classifier is working at an instance level. This includes, for example, determining

a. how the classifier reached its decision for a given input,

b. which features were most important for this decision,

c. which missing features, had they been present, would have changed the classifier's decision.

Good explanations are important for helping end-users develop trust in the classifier, for developing new insights about the underlying domain, or for improving the classifier itself. One explanation may not suit everyone, however. Indeed, different users may require different types of explanations based on their needs and levels of sophistication.

Explainability with less data

Here we focus on local explainability for small IT texts. Small texts such as ticket abstracts, ticket descriptions, and event summaries are prevalent in the IT domain and provide valuable insights into the overall health of the IT infrastructure. Furthermore, small texts are easier to deal with than longer texts, since they carry a smaller context to understand. Below, we briefly describe how we compute explanations while classifying such small IT texts.


To explain why the classifier made one decision rather than another, we compute explanations. Our local explanations:

  • identify key input features responsible for the resulting classification – such features are called “pertinent positives”,
  • are sparse, i.e., contain only a small number of features, and
  • are easy for humans to interpret.

Computing sparse pertinent positive features

In an industry-scale text classification task, the dimensionality of the full feature space can easily be in the millions. Fortunately, the curse of dimensionality does not arise when dealing with a small text, since we only need to handle the small number of features present in that text.

To simplify the exposition, let us assume that we are using a linear classifier, e.g., an SVM or a Passive-Aggressive classifier, with unigrams as features and without any TFIDF transformation or class-probability calibration. Thus, the classifier uses a k-by-n coefficient matrix C, where k and n denote the number of classes and features respectively, and, given a binary input feature vector x, outputs the class

j = argmax_i C_i x.
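As an illustrative sketch of this decision rule, the snippet below applies a coefficient matrix to a binary unigram vector; the matrix and vector here are random placeholders standing in for a trained model, not actual learned coefficients.

```python
import numpy as np

k, n = 3, 8                              # e.g., 3 classes, 8 vocabulary terms (illustrative)
rng = np.random.default_rng(0)
C = rng.normal(size=(k, n))              # stands in for learned SVM / Passive-Aggressive coefficients
x = np.array([1, 0, 1, 0, 0, 1, 0, 0])   # binary unigram presence vector for one small text

scores = C @ x                           # C_i x for each class i
j = int(np.argmax(scores))               # predicted class j = argmax_i C_i x
print(f"predicted class: {j}, scores: {scores}")
```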

The problem of computing a pertinent positive explanation can be formulated as computing a binary vector x’ such that:

1. x’ is dominated by x, i.e., it contains only a subset of the features of x,

2. x’ is classified into the same class j,

3. x’ is sparse, i.e., it has as few features as possible, and

4. x’ has as large a “distance function gap” as possible, i.e., it minimizes max_i (C_i x’ – C_j x’), where the maximum is taken over all i not equal to j.

Since the maximum of linear functions is a convex function, the above problem can be cast as a convex optimization problem. We impose L1-regularization to get sparsity and L2-regularization to reduce the solution magnitude even further. Such a problem can be solved by using standard techniques from convex optimization, including gradient-descent and shrinkage-thresholding algorithms.
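A minimal sketch of this optimization, assuming the binary constraint on x’ is relaxed to the box [0, x] and the problem is solved with proximal (ISTA-style) subgradient steps; the regularization weights, step size, and iteration count below are illustrative assumptions rather than the exact settings used in our system.

```python
import numpy as np

def pertinent_positive(C, x, j, lam1=0.1, lam2=0.01, step=0.05, iters=500):
    """Proximal subgradient iterations for the relaxed pertinent-positive problem."""
    k, n = C.shape
    xp = x.astype(float).copy()                   # start from the full input x
    others = [i for i in range(k) if i != j]
    for _ in range(iters):
        margins = (C[others] - C[j]) @ xp         # C_i x' - C_j x' for all i != j
        i_star = others[int(np.argmax(margins))]  # class attaining the maximum
        grad = (C[i_star] - C[j]) + 2.0 * lam2 * xp        # subgradient of max term + L2 term
        xp = xp - step * grad
        xp = np.sign(xp) * np.maximum(np.abs(xp) - step * lam1, 0.0)  # L1 soft-thresholding
        xp = np.clip(xp, 0.0, x.astype(float))             # keep x' dominated by x
    return xp
```

The returned vector is a relaxed (fractional) solution; its nonzero coordinates indicate the candidate pertinent positive features.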

Iterative Shrinkage/Thresholding Algorithms (ISTA) and their application to computing pertinent positive features in classification

The Iterative Shrinkage/Thresholding Algorithms (ISTA) are used to compute sparse solutions to inverse linear problems. A typical example of an inverse linear problem is linear regression.
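To make the shrinkage/thresholding step concrete, here is a minimal ISTA sketch for a generic sparse inverse linear problem (a lasso-style least-squares problem); the regularization weight and iteration count are illustrative.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrinkage operator: componentwise sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam=0.1, iters=200):
    """Minimize ||A w - b||^2 + lam * ||w||_1 by iterative shrinkage/thresholding."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    w = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2.0 * A.T @ (A @ w - b)       # gradient of the least-squares term
        w = soft_threshold(w - grad / L, lam / L)
    return w
```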

Consider a classification problem, e.g., the text classification problem. We use a Passive-Aggressive algorithm for such a text classification problem in the IT ticket management domain. Consider a ticket T that gets classified into a class C (e.g., "disk-handler") by this algorithm. It is often important to show "evidence" of the inner workings of the classifier and "explain" why the ticket T got classified into class C. The pertinent positives in a ticket like T are a small subset of features of T that are responsible for its classification into class C. Such a set of features provides a good explanation of the inner workings of the classifier.
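A hedged sketch of such a setup, using scikit-learn's PassiveAggressiveClassifier with binary unigram features; the tickets, labels, and class names below are made-up placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier

# Made-up placeholder tickets and labels
tickets = [
    "guest file system running out of disk space",
    "cpu utilization high on virtual machine",
    "disk partition utilization above threshold",
    "memory usage alert on host",
]
labels = ["disk-handler", "cpu-handler", "disk-handler", "memory-handler"]

vec = CountVectorizer(binary=True)            # binary unigram features, no TFIDF
X = vec.fit_transform(tickets)
clf = PassiveAggressiveClassifier(max_iter=1000, random_state=0).fit(X, labels)

T = "virtual machine guest file system is running out of disk space"
print(clf.predict(vec.transform([T])))        # e.g., ['disk-handler']
# clf.coef_ plays the role of the k-by-n coefficient matrix C used above
```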

We formulate the problem of finding pertinent positive features for the text classification problem as a sparse inverse linear problem. We customize and simplify the ISTA algorithm to make it very efficient for this use case. This customization is non-trivial and cannot be easily derived from a general ISTA algorithm. It has to take into account the specific problem formulation that the Passive-Aggressive classifier (PAC) uses internally and exploit its structural properties to implement the iterative thresholding step efficiently.

Specific unique contributions of this work:

1. Formulation of the problem of computing pertinent positives for the IT ticket classification problem, identifying tokens in the ticket description that explain the classifier's behavior

2. Formulation of the problem of computing pertinent positives for the linear (Passive-Aggressive) classifier as an L1-regularization problem

3. Use of the ISTA algorithm to solve the above-mentioned L1-regularization problem

  • Choice of algorithm parameters such as the step size and the thresholds for the shrinkage step

4. Steps to convert the output of the ISTA algorithm into pertinent positive features (tokens) by using a magnitude significance threshold (see the sketch after this list)
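As a sketch of step 4, assuming the relaxed solution produced by the ISTA-style snippet earlier, the pertinent positive tokens can be read off with a magnitude significance threshold; the threshold value and helper name are illustrative.

```python
import numpy as np

def pertinent_tokens(xp, feature_names, threshold=0.1):
    """Keep features whose solution magnitude clears the significance threshold."""
    idx = np.where(np.abs(xp) > threshold)[0]
    return [feature_names[i] for i in idx]
```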

AIOps Visualization – a view

Explainability: Local explainability is computed by identifying the pertinent positive features of a given ticket using AI/ML natural language processing techniques. The pertinent positives from a ticket are highlighted in the ticket description below.

Example Ticket Summary: N1VL-PA-APB169_Guest File System:/var|Partition Utilization_VirtualMachine ae81283e-dbac-4bc4-b780-bd37b07d3446/One or more virtual machine guest file systems are running out of disk space

Pertinent positive features identified by explainability:

disk, space
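Tying the hypothetical snippets above together (vec, clf, pertinent_positive, pertinent_tokens), an end-to-end run on a ticket like this one might look as follows; the exact tokens returned depend on the trained model, so the expected output is indicative only.

```python
# End-to-end sketch reusing the hypothetical objects from the earlier snippets
summary = ("Guest File System /var Partition Utilization VirtualMachine: "
           "one or more virtual machine guest file systems are running out of disk space")

x = vec.transform([summary]).toarray()[0]          # binary unigram vector for the ticket
label = clf.predict(vec.transform([summary]))[0]   # predicted class, e.g. 'disk-handler'
j = list(clf.classes_).index(label)                # row index of that class in clf.coef_
xp = pertinent_positive(clf.coef_, x, j)           # sparse relaxed explanation vector
print(pertinent_tokens(xp, vec.get_feature_names_out()))   # e.g. tokens such as 'disk', 'space'
```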

Integrated AIOps UI/View
