Here are some of the most common performance metrics for machine learning:
- Confusion matrix: A confusion matrix is a table that summarizes the performance of a machine learning model by counting the four possible outcomes of a binary prediction:
  - True positive (TP): the model correctly predicts that the input is positive.
  - False positive (FP): the model incorrectly predicts that the input is positive.
  - True negative (TN): the model correctly predicts that the input is negative.
  - False negative (FN): the model incorrectly predicts that the input is negative.
- Accuracy: Accuracy is the fraction of predictions that the model gets correct. It is calculated by dividing the number of correct predictions by the total number of predictions.
- Recall: Recall is the fraction of positive instances that the model correctly predicts. It is calculated by dividing the number of true positives by the sum of the true positives and false negatives.
- Precision: Precision is the fraction of predicted positive instances that are actually positive. It is calculated by dividing the number of true positives by the sum of the true positives and false positives.
- F1 score: The F1 score combines precision and recall into a single number. It is calculated as the harmonic mean of the two, so it is high only when both precision and recall are high.
- ROC curve and AUC: The ROC curve is a graphical plot of the true positive rate (TPR) against the false positive rate (FPR) as the decision threshold varies. The TPR is the ratio of true positives to the sum of true positives and false negatives. The FPR is the ratio of false positives to the sum of false positives and true negatives. The AUC is the area under the ROC curve; it summarizes performance across all thresholds in a single number. All of these metrics can be computed in a few lines, as shown in the sketch after this list.
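As a concrete illustration, here is a minimal sketch, assuming scikit-learn is installed, that computes each of the metrics above for one hypothetical set of predictions. The arrays `y_true`, `y_pred`, and `y_score` are made-up illustration data, not output from a real model.

```python
from sklearn.metrics import (
    confusion_matrix, accuracy_score, precision_score,
    recall_score, f1_score, roc_auc_score,
)

# Hypothetical ground-truth labels and one model's outputs.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # hard 0/1 predictions
y_score = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]  # predicted probabilities

# Confusion matrix: for binary 0/1 labels, .ravel() yields tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")

print("accuracy :", accuracy_score(y_true, y_pred))    # (TP + TN) / total
print("precision:", precision_score(y_true, y_pred))   # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred))      # TP / (TP + FN)
print("f1       :", f1_score(y_true, y_pred))          # harmonic mean of the two
print("roc auc  :", roc_auc_score(y_true, y_score))    # needs scores, not hard labels
```

Note that the AUC is computed from the scores rather than the thresholded predictions, since the ROC curve sweeps over all possible thresholds.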
Which performance metric is most appropriate depends on the problem the model is being used to solve. For example, if the model classifies images into discrete categories, then accuracy, precision, and recall are all relevant. If the model instead outputs a probability, such as the probability that a customer will churn, then the ROC curve and AUC are more relevant, because they evaluate the model across all possible decision thresholds.
Here are some examples of how these metrics can be used:
- A confusion matrix can be used to identify the types of errors that a machine learning model is making. For example, if the model is misclassifying a large number of positive instances as negative, then this may indicate that the model needs to be tuned to be more sensitive to positive instances.
- The accuracy metric can be used to get a general sense of how well a machine learning model is performing. However, accuracy can be misleading when the classes are imbalanced. For example, if a model classifies images of cats and dogs, and there are 10 times as many images of cats as dogs, then the model can achieve roughly 91% accuracy simply by predicting that every image is a cat (see the first sketch after this list).
- The precision and recall metrics give a more detailed picture of how well a machine learning model performs on each class. For example, if a model predicts whether a customer will churn, precision matters more than recall when the company wants to minimize false positives, such as retention offers sent to customers who were never going to leave.
- The F1 score, the harmonic mean of precision and recall, is often used as a single metric to summarize the performance of a machine learning model.
- The ROC curve and AUC can be used to compare the performance of different machine learning models. The ROC curve can also be used to select a decision threshold that balances the costs of false positives and false negatives (see the threshold-selection sketch after this list).
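To make the imbalance pitfall from the cats-and-dogs example concrete, here is a minimal sketch, again assuming scikit-learn, of a degenerate "model" that always predicts the majority class. Accuracy looks strong while recall on the minority class is zero.

```python
from sklearn.metrics import accuracy_score, recall_score

# 100 cats (label 0) and 10 dogs (label 1): a 10-to-1 imbalance.
y_true = [0] * 100 + [1] * 10
y_pred = [0] * 110                # always predict "cat"

print("accuracy:", accuracy_score(y_true, y_pred))  # ~0.91, looks good
print("recall  :", recall_score(y_true, y_pred))    # 0.0: every dog is missed
```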
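Finally, here is a minimal sketch of selecting a threshold from the ROC curve, assuming scikit-learn. Maximizing Youden's J statistic (TPR minus FPR) is one common heuristic chosen here for illustration; a real cost-sensitive choice would weight false positives and false negatives by their actual costs.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Illustrative labels and scores, not from a real model.
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9, 0.6, 0.3])

# roc_curve returns one (FPR, TPR) point per candidate threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)

best = np.argmax(tpr - fpr)       # Youden's J: maximize TPR - FPR
print("best threshold:", thresholds[best])
print("TPR:", tpr[best], "FPR:", fpr[best])
```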