What Is Your Model Hiding? A Tutorial on Evaluating ML Models
Emeli Dral
Co-founder and CTO of Evidently AI | Machine Learning Instructor with 100K+ students
Imagine you trained a machine learning model. Maybe even a couple of candidates to choose from.
You ran them on the test set and got some quality estimates. Overall, they perform as well as they can, given the limited data at hand.
Now it is time to decide if any of them is good enough for production use. How do you evaluate and compare your models beyond the standard performance checks?
In this tutorial, we will walk through an example of how to assess your model in more detail.
Case in point: predicting employee attrition
We will be working with a fictional dataset from a Kaggle competition. The goal is to identify which employees are likely to leave the company soon.
Let's assume we ran our fair share of experiments. We tried out different models, tuned hyperparameters, and estimated performance intervals with cross-validation.
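For illustration, here is a minimal sketch of such an interval check with scikit-learn's cross_val_score. The synthetic dataset, model settings, and fold count are placeholders standing in for the real attrition data and the actual experiment setup.

```python
# Sketch of a cross-validated interval assessment (illustrative setup;
# synthetic data stands in for the real employee attrition dataset).
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(
    n_samples=1500, n_features=20, weights=[0.85, 0.15], random_state=42
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

for name, model in candidates.items():
    # Cross-validated ROC AUC gives a mean score plus a spread across folds,
    # which tells us more than a single number.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: ROC AUC {scores.mean():.3f} +/- {scores.std():.3f}")
```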
We ended up with two technically sound models that look equally good.
Next, we checked their performance on the test set. Here is what we got:
- A Random Forest model with a ROC AUC score of 0.795
- A Gradient Boosting model with a ROC AUC score of 0.803
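As a point of reference, a single-point test set check could look like the snippet below, which continues the sketch above. The split parameters and variable names are again illustrative, not the original pipeline.

```python
# Continuing the sketch above: a held-out split and a single-point
# ROC AUC check for each candidate model.
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

for name, model in candidates.items():
    model.fit(X_train, y_train)
    # ROC AUC is computed on the predicted probability of the positive class
    probs = model.predict_proba(X_test)[:, 1]
    print(f"{name}: test ROC AUC {roc_auc_score(y_test, probs):.3f}")
```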
Both our models seem fine. Much better than random guessing, so we definitely have some signal in the data.
The ROC AUC scores are close. Given that these are just single-point estimates, we can assume the performance is about the same.
Which of the two should we pick?
Same quality, different qualities
In the complete tutorial, we look at the models in more detail and visualize their performance using the Evidently library.
For example, we discover that our first model makes only a few very confident predictions. The second gives us more room to adjust the decision threshold and take advantage of the precision-recall trade-off.
Depending on the use case, one can work better than the other.
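To make the idea concrete without reproducing the full tutorial, here is a plain scikit-learn stand-in for that threshold analysis, continuing the sketch above. The probability band and the example thresholds are arbitrary illustrative values; the tutorial itself relies on Evidently's visual reports rather than this code.

```python
# Plain scikit-learn stand-in for the threshold analysis; continues the
# sketch above (the candidate models are already fitted).
import numpy as np
from sklearn.metrics import precision_score, recall_score

for name, model in candidates.items():
    probs = model.predict_proba(X_test)[:, 1]
    # Share of "uncertain" predictions: a model that pushes almost everything
    # towards 0 or 1 leaves little room to move the decision threshold.
    uncertain = np.mean((probs > 0.25) & (probs < 0.75))
    print(f"{name}: {uncertain:.0%} of predictions fall between 0.25 and 0.75")
    # The precision-recall trade-off at a few example thresholds.
    for threshold in (0.3, 0.5, 0.7):
        preds = (probs >= threshold).astype(int)
        precision = precision_score(y_test, preds, zero_division=0)
        recall = recall_score(y_test, preds)
        print(f"  threshold {threshold}: precision={precision:.2f}, recall={recall:.2f}")
```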
Read on for the full tutorial with code: https://evidentlyai.com/blog/tutorial-2-model-evaluation-hr-attrition