Evaluation metrics for ML binary classification models
Evaluating an ML model is not an easy task. A lot of data, with an appropriate distribution and proper annotation, needs to be prepared for testing, and human error in the annotation process is not negligible. We use this data whenever we test the model, during or after training. In a running system, we may also observe the model's performance over time by collecting and annotating inputs and checking the inference results. When we call a model good or bad, we talk with numbers, and these numbers mean different things for different types of models. For example, in regression we look at the error the model produces for a given input, while in binary classification we need many inputs to find the ratio of good and bad results. In this article I will share some metrics commonly used for binary classification models, all derived from the confusion matrix.
Confusion matrix
A confusion matrix is a table layout used to visualize the performance of a model. Each row of the matrix represents the instances of an actual class and each column represents the instances of a predicted class, or vice versa. In binary classification, every prediction is either True or False, so the matrix contains four values that together describe the performance.
The following table shows a possible structure of a confusion matrix.

|                 | Predicted Positive  | Predicted Negative  |
|-----------------|---------------------|---------------------|
| Actual Positive | True Positive (TP)  | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN)  |
An example can look as follows.
So, if all of the model's results fall into case 1 and case 3 (the correct predictions, TP and TN), we are in a perfect world. In reality, the model will also produce results in case 2 and case 4 (the false positives and false negatives), due to the probabilistic nature of ML.
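To make the four counts concrete, here is a minimal Python sketch that tallies TP, FP, TN and FN from lists of actual and predicted labels. The two label lists are made up for illustration; they are chosen so that TP = 6 and TN = 3 as in the example above, while the split of the remaining three errors into FP and FN is an assumption.

```python
# Minimal sketch: tally TP, FP, TN, FN from actual vs. predicted labels.
# The lists are made-up illustrative data (TP = 6, TN = 3 as in the example;
# FP = 2 and FN = 1 are assumed, since only TP and TN are given above).

y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1]  # actual classes (1 = positive, 0 = negative)
y_pred = [1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0]  # model predictions

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives

print(tp, fp, tn, fn)  # 6 2 3 1
```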
Metrics
Here we are. What is the accuracy of the model? Simple: the fraction of accurate answers over the total population. The total population is the sum of the instances of all four cases, and the accurate answers are those in case 1 and case 3. So the first metric is Accuracy, calculated as (TP+TN)/(TP+FP+TN+FN). In our example it is (6+3)/12 = 0.75.

Now we have another question: how accurate are the positive predictions? That is, of the instances the model declared positive, how many are actually positive? This metric is called Precision, calculated as TP/(TP + FP). A higher value means fewer false positives.

Next question: of all the actual positive instances, how many did the model recall as positive? This metric is called Recall, calculated as TP/(TP + FN). A higher value means fewer false negatives.

When the data is imbalanced, or the cost of a false positive differs from the cost of a false negative, we need to judge the model with both Precision and Recall, so we want a single value that balances the two. The appropriate way to average two ratios is the harmonic mean, which for Precision and Recall is 2/(1/Precision + 1/Recall). This is known as the F-1 Score; a high value means the model has both high precision and high recall.
So, using the confusion matrix, we have the following metrics:
Accuracy = (TP+TN)/(TP+FP+TN+FN)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
F-1 Score = 2/(1/Precision+1/Recall) = 2 x Precision x Recall / (Precision + Recall)
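These formulas translate directly into code. The sketch below, assuming the same counts as before (TP = 6 and TN = 3 from the example, with the assumed split FP = 2, FN = 1), reproduces the accuracy of 0.75.

```python
# Compute the four metrics from confusion-matrix counts.
def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(prec, rec):
    # Harmonic mean of precision and recall.
    return 2 * prec * rec / (prec + rec)

# TP = 6 and TN = 3 as in the article's example; FP = 2 and FN = 1 are assumed.
tp, fp, tn, fn = 6, 2, 3, 1
prec, rec = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, fp, tn, fn))  # (6 + 3) / 12 = 0.75
print(prec)                      # 6 / 8 = 0.75
print(rec)                       # 6 / 7 ≈ 0.857
print(f1_score(prec, rec))       # 0.8
```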
Each of these metrics has a meaning and gives an idea about the performance of the model, but the results depend on the data. Accuracy is simple to calculate and intuitive for binary classification, but it can be misleading for imbalanced data. For imbalanced data, we may choose Precision, Recall or the F-1 Score depending on our target (minimizing false positives or minimizing false negatives).
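As a quick hypothetical illustration of that last point: with 95 negative and 5 positive instances, a useless model that predicts negative for everything still scores 95% accuracy, while Recall immediately exposes it.

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives,
# and a degenerate model that predicts "negative" for every instance.
tp, fp, tn, fn = 0, 0, 95, 5

accuracy = (tp + tn) / (tp + fp + tn + fn)  # 95 / 100 = 0.95, looks impressive
recall = tp / (tp + fn)                     # 0 / 5 = 0.0, not a single positive found
# Precision is undefined here (TP + FP = 0), another sign the model is useless.
print(accuracy, recall)
```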