Whitebox-ifying ML models #1: Permutation Importance
Niren Sirohi, MBA, PhD
Chief Operating Officer, MassDOT RMV | Public Service, Non-Profit, and Analytics Leadership | Data Science, AI, Digital, Technology, Innovator | Passionate about the environment, climate change, bird conservation
Machine learning is all the rage today, and universities are churning out data science graduates who can build black box models using packages like scikit-learn and others. However, most of these models are not very useful if they can't be easily interpreted, for several reasons:
- Humans inherently don’t trust black box models; our natural curiosity makes us want to understand why something works the way it does
- In some industries (e.g., insurance), one is required to explain to regulatory bodies why a model works the way it does, or to demonstrate that certain factors are not in play (e.g., that gender is not affecting the model, or that the model is not biased by race or age)
- Being able to interpret models increases their value because it enables conversations about the model and its value with a larger, non-technical audience
- Model interpretability gives the data scientist more confidence that what they are building actually makes sense
In the following series of articles, I will share three simple ways to whitebox-ify your black box models. Let us start by understanding the types of questions people typically ask when they are trying to interpret models:
A) Which features/variables have the biggest impact or are most important for prediction?
B) How does a feature impact predictions? E.g., what is the impact on the prediction of various values of feature A, holding everything else constant?
C) How does the model work for an individual prediction? E.g., if I have a model that predicts whether I should make a loan to an individual, what factors are driving my prediction for this individual, and by how much?
A) can be answered by calculating “Permutation Importance”. The logic behind this approach is relatively straightforward. Suppose we want to determine whether feature A is more important than feature B. To apply this approach, we use our final model and the validation dataset. We know the performance (e.g., accuracy) of our model on the validation dataset, which we will use as the benchmark to compare against. Now conduct the following two exercises on the validation dataset:
- Randomly shuffle the values of feature A, apply the model, calculate the performance metric, and calculate the deterioration relative to the benchmark
- Randomly shuffle the values of feature B, apply the model, calculate the performance metric, and calculate the deterioration relative to the benchmark
If feature A is more important than feature B, then the deterioration in performance for A will be greater than for B. The amount of deterioration can be used to calculate the relative importance of each variable. A handy library to do this is eli5; see the sketch below. Give it a try!
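To make the idea concrete, here is a minimal sketch of the shuffle-and-measure logic described above. It assumes a hypothetical fitted scikit-learn classifier `model` and a validation set `X_val` (a pandas DataFrame) with labels `y_val`; the function name and the choice of accuracy as the metric are illustrative, not part of any library.

```python
import numpy as np
from sklearn.metrics import accuracy_score

def permutation_importance_sketch(model, X_val, y_val, metric=accuracy_score, random_state=0):
    """Deterioration in a performance metric when each feature is shuffled in turn."""
    rng = np.random.RandomState(random_state)
    # Benchmark performance on the untouched validation data
    baseline = metric(y_val, model.predict(X_val))
    importances = {}
    for col in X_val.columns:
        X_shuffled = X_val.copy()
        # Randomly shuffle just this one feature, holding all other columns fixed
        X_shuffled[col] = rng.permutation(X_shuffled[col].values)
        permuted_score = metric(y_val, model.predict(X_shuffled))
        # A larger drop from the benchmark means a more important feature
        importances[col] = baseline - permuted_score
    return importances
```

The eli5 library wraps the same idea in its `PermutationImportance` class; a typical usage with a scikit-learn estimator looks roughly like this (verify the exact API against your installed eli5 version):

```python
import eli5
from eli5.sklearn import PermutationImportance

perm = PermutationImportance(model, random_state=1).fit(X_val, y_val)
eli5.show_weights(perm, feature_names=X_val.columns.tolist())
```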
The next article will talk about how to address B). Enjoy!!