Whitebox-ifying ML models #2: Partial Dependence Plots
Niren Sirohi, MBA, PhD
Chief Operating Officer, MassDOT RMV | Public Service, Non-Profit, and Analytics Leadership | Data Science, AI, Digital, Technology, Innovator | Passionate about the environment, climate change, bird conservation
In the last post on this topic, we identified three questions people typically ask when trying to interpret machine learning models. In this article I will focus on answering the second question:
A) Which features/variables have the biggest impact or are most important for prediction? (answered in a previous article)
B) How does a feature impact predictions? E.g., what is the impact on the prediction of various values of feature A, holding everything else constant?
C) How does the model work for an individual prediction? E.g., if I have a model that predicts whether I should make a loan to an individual or not, what factors are driving my prediction for this individual, and by how much?
As one thinks about B), I am sure many of you are thinking that model coefficients can be used to answer this. That is correct, but coefficient interpretation can be pretty gnarly even in the simplest of models. I will highlight another approach that provides clarity, namely "Partial Dependence Plots (PDPs)". Let us say you have a model with three features: A, B, and C. You want to understand the impact each of the levels of A (there are 10 of them, L1-L10) has on the prediction. As before, in order to apply this approach, we will use our final model and the validation dataset. To understand how this approach works, take a look at one observation in the validation dataset, which has level L1 of feature A and some other levels for features B and C. The following steps can then be used to build our partial dependence plot:
- For this single observation, vary the levels of feature A, leaving the other features at their current values, and use the model to make a prediction at each level. Plot the levels of feature A on the x-axis and the predictions on the y-axis. This is the PDP for that one observation.
- To define the PDP across all observations, repeat the same exercise for each observation and plot the average prediction across observations for each level on the y-axis.
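The steps above can be sketched in plain Python. Everything here is a hypothetical stand-in: `model_predict`, the toy validation data, and the level names are illustrative assumptions, not the actual model.

```python
# Sketch of the PDP steps above, using a hypothetical toy model and
# dataset (model_predict and the observations are stand-ins).

levels_A = [f"L{i}" for i in range(1, 11)]  # the 10 levels of feature A

def model_predict(obs):
    # Hypothetical fitted model: the prediction grows with the level
    # index of A and also depends on features B and C.
    return 0.1 * levels_A.index(obs["A"]) + 0.5 * obs["B"] + obs["C"]

# Stand-in for the validation dataset.
validation = [
    {"A": "L1", "B": 1.0, "C": 0.2},
    {"A": "L4", "B": 0.0, "C": 0.7},
    {"A": "L9", "B": 2.0, "C": 0.1},
]

def partial_dependence_A(dataset):
    """For each level of A, overwrite A in every observation (leaving
    B and C at their current values), predict, and average."""
    pdp = {}
    for level in levels_A:
        preds = [model_predict({**obs, "A": level}) for obs in dataset]
        pdp[level] = sum(preds) / len(preds)
    return pdp

pdp = partial_dependence_A(validation)
# Plotting pdp.keys() on the x-axis against pdp.values() on the
# y-axis gives the partial dependence plot for feature A.
```

Because B and C stay fixed while only A is varied, the shape of the resulting curve isolates the effect of A on the prediction.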
Earlier you determined the relative importance of A vs. B vs. C. The PDP additionally tells you how the prediction varies across each level of A, and so on. A handy library for this is PDPbox. Give it a try!
As a note, one can also draw 2D PDPs, which give us insight into interactions between variables (at least between two of them).
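The same averaging idea extends to two features at once. A plain-Python sketch, again with a hypothetical model and data (this time containing an interaction term):

```python
# Hypothetical model with an interaction between x1 and x2, plus a
# third feature x3 that varies across observations.
def model_predict(obs):
    return obs["x1"] * obs["x2"] + obs["x3"]

# Stand-in validation data (only x3 differs across observations here).
validation = [{"x3": 0.1}, {"x3": 0.2}, {"x3": 0.3}]

grid = [0.0, 0.5, 1.0]

# For every (x1, x2) grid point, set both features in each observation,
# predict, and average: the resulting surface exposes the interaction.
pdp_2d = {
    (v1, v2): sum(model_predict({**obs, "x1": v1, "x2": v2})
                  for obs in validation) / len(validation)
    for v1 in grid for v2 in grid
}
# Plot pdp_2d as a heatmap or contour plot over the (x1, x2) grid:
# if the surface is not additive in x1 and x2, the features interact.
```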
The next article will talk about how to address C). Enjoy!!