Important Metrics for classification problems in ML

Learning from data is virtually universally useful. Master it and you will be welcomed anywhere.

When evaluating machine learning models, there is a long list of possible metrics to assess performance: the confusion matrix, accuracy, precision and recall, the ROC curve, and so on. All of them can be useful, but they can also be misleading or fail to answer the question at hand. Below are some of the significant metrics that can be used in any classification problem.

Confusion matrix

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one.

Terms associated with Confusion matrix:

True Positives (TP): True positives are the cases when the actual class of the data point was True and the predicted class is also True.

True Negatives (TN): True negatives are the cases when the actual class of the data point was False and the predicted class is also False.

False Positives (FP): False positives are the cases when the actual class of the data point was False and the predicted class is True.

False Negatives (FN): False negatives are the cases when the actual class of the data point was True and the predicted class is False.
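
Below is a minimal sketch using scikit-learn's confusion_matrix; the y_true and y_pred arrays are made-up labels purely for illustration.

```python
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted classes

# Rows are actual classes, columns are predicted classes.
# With labels=[0, 1] the layout is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp}, TN={tn}, FP={fp}, FN={fn}")
```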

Receiver operating characteristic

A receiver operating characteristic curve, or ROC curve, is a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The method was developed for operators of military radar receivers, which is why it is so named. The ROC curve is the plot of sensitivity (the True Positive Rate) against 1 - specificity (the False Positive Rate), with one point per threshold.
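
A minimal sketch with scikit-learn's roc_curve and roc_auc_score; the labels and scores below are illustrative placeholders, not real model output.

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                       # actual classes
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5]      # predicted probabilities of class 1

# fpr = 1 - specificity (x-axis), tpr = sensitivity (y-axis), one point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
```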

Gain and Lift charts

Gain and Lift charts are mainly concerned with checking the rank ordering of the predicted probabilities. Here are the steps to build a Lift/Gain chart:

Step 1: Calculate the predicted probability for each observation.

Step 2: Rank these probabilities in decreasing order.

Step 3: Build deciles, with each group containing roughly 10% of the observations.

Step 4: Calculate the response rate in each decile for Good (Responders), Bad (Non-responders) and Total.

The Gain chart is the plot of Cumulative %Right (the share of responders captured) against Cumulative %Population.
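
A minimal sketch of steps 1 to 4 with pandas; the probabilities and responder labels below are synthetic, and in practice the probabilities would come from your model's predict_proba on a validation set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
y_prob = rng.random(1000)                                   # Step 1: probability per observation
responder = (rng.random(1000) < y_prob).astype(int)         # synthetic responder labels

df = pd.DataFrame({"prob": y_prob, "responder": responder})
df = df.sort_values("prob", ascending=False).reset_index(drop=True)  # Step 2: rank decreasing
df["decile"] = df.index * 10 // len(df)                     # Step 3: 10 groups of ~10% each

# Step 4: response rate per decile, plus the cumulative %Right vs %Population for the gain chart
summary = df.groupby("decile")["responder"].agg(responders="sum", total="count")
summary["response_rate"] = summary["responders"] / summary["total"]
summary["cum_pct_right"] = summary["responders"].cumsum() / summary["responders"].sum()
summary["cum_pct_population"] = summary["total"].cumsum() / summary["total"].sum()
print(summary)   # plotting cum_pct_right against cum_pct_population gives the gain chart
```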

Lift Charts

The basic idea of lift analysis is as follows:

To illustrate the idea, we will consider a simple model: we want to predict whether a customer of an OTT app service will cancel their subscription. This is a binary classification problem: the user either cancels the subscription (churn = 1) or keeps it (churn = 0).

  1. Group data based on the predicted churn probability (a value between 0.0 and 1.0). Typically, you look at deciles, so you would have 10 groups: 0.0 - 0.1, 0.1 - 0.2, ..., 0.9 - 1.0.
  2. Calculate the true churn rate per group. That is, you count how many people in each group churned and divide this by the total number of customers per group.

Lift is simply the ratio of these values: target response divided by average response.

Lift Score = (target rate / average rate)
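
A minimal, self-contained sketch of the lift arithmetic; the churn rates below are made-up numbers chosen only to show the formula, not output from a real model.

```python
average_rate = 0.05                              # overall churn rate across all customers
decile_rates = [0.20, 0.12, 0.08, 0.05, 0.03]    # true churn rate in the top five predicted deciles
lifts = [rate / average_rate for rate in decile_rates]
print(lifts)   # [4.0, 2.4, 1.6, 1.0, 0.6] -- lift > 1 means better than the average response
```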

A targeting model is doing a good job if the response within the target is much better than the average for the population.

The purpose of our model is to estimate how likely it is that a customer will cancel their subscription. This means the predicted churn probability should increase with the true churn probability, i.e. a high predicted score should correspond to a high actual churn rate.

You would then target, for example, all users with a predicted score between 0.8 and 1.0, because that is the range where the churn rate is higher than the average churn rate. You do not want to pour money down the drain on customers who have a below-average churn probability.

There might be cases where this does not matter, e.g. when your main goal is to reach everyone who churns and, depending on the business scenario and budget availability, it is acceptable to also target some people who won't churn.

Lift/Gain charts are widely used in campaign targeting problems. They tell us up to which decile we can target customers for a specific campaign, and how much response to expect from the new target base.

Accuracy:

Accuracy in classification problems is the number of correct predictions made by the model divided by the total number of predictions made. Accuracy is a good measure when the target variable classes in the data are nearly balanced.
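
A minimal numeric sketch; the confusion matrix counts below are illustrative.

```python
tp, tn, fp, fn = 40, 45, 5, 10
accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct predictions over all predictions
print(accuracy)                              # 0.85
```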

Precision:

Precision tells us what proportion of the customers we identified as probable churn actually churned. The predicted positives are the True Positives plus the False Positives (all customers predicted as probable churn), and the ones among them who actually churned are the True Positives, so Precision = TP / (TP + FP).
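
A minimal numeric sketch; the counts are illustrative, not from real data.

```python
tp, fp = 40, 5               # customers correctly / wrongly flagged as probable churn
precision = tp / (tp + fp)   # share of flagged customers who actually churned
print(precision)             # 0.888...
```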

Recall or Sensitivity:

Recall tells us what proportion of the customers who actually churned were identified by the algorithm as probable churn. The actual positives are the True Positives plus the False Negatives (all customers who churned), and the ones the model flagged as probable churn are the True Positives, so Recall = TP / (TP + FN).
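
A minimal numeric sketch reusing the same illustrative counts.

```python
tp, fn = 40, 10            # churned customers the model caught / missed
recall = tp / (tp + fn)    # share of actual churners the model identified
print(recall)              # 0.8
```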

When to use Precision and When to use Recall?

Precision is about being precise: even if we manage to capture only one case and we capture it correctly, we are 100% precise. Think of domains like medical science or fraudulent transaction detection.

Recall is not so much about capturing cases correctly as about capturing all the cases that might churn, by labelling them as probable churn. If we simply label every case as probable churn, we get 100% recall. So if we want to focus on minimizing False Negatives, we want Recall to be as close to 100% as possible without precision becoming too bad; if we want to focus on minimizing False Positives, our focus should be to get Precision as close to 100% as possible.

Specificity:

Specificity tells us what proportion of patients who did NOT have cancer were predicted by the model as non-cancerous. The actual negatives are the True Negatives plus the False Positives (all people who do not have cancer), and the ones we correctly identified as not having cancer are the True Negatives, so Specificity = TN / (TN + FP). Specificity is the mirror of Recall, computed on the negative class.
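
A minimal numeric sketch with the same illustrative counts.

```python
tn, fp = 45, 5
specificity = tn / (tn + fp)   # share of actual negatives correctly identified
print(specificity)             # 0.9
```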

F1 Score:

In statistical analysis of binary classification, the F1 score (also F-score or F-measure) is a measure of a test's accuracy. It is the harmonic mean of Precision and Recall:

F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
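
A minimal sketch of the harmonic mean; the precision and recall values below are the illustrative ones from the earlier sketches.

```python
precision, recall = 0.888, 0.8
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of precision and recall
print(f1)                                            # ~0.84
```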

Happy machine learning to all data science champs.