SHAP is not all you need (or why you should always use permutation feature importance)

Repost from Christoph Molnar

A most annoying misconception in the world of machine learning interpretability

This post is

  • 30% rant
  • 50% comparison of SHAP and permutation feature importance
  • 20% good news (announcement of the release date of the conformal prediction book)

I just got a paper rejection.

The paper itself fills a theoretical and conceptual gap: While ML interpretation techniques such as partial dependence plots and permutation feature importance primarily describe the model, many (data) scientists use them to study the underlying data and phenomenon. Our paper discusses what’s needed to actually achieve the jump from model to data.

But that’s not what’s important today. Maybe I’ll explain the paper in another post.

Today I want to talk about a part of the criticism we received for the paper. Here are two quotes:

  • “SHAP graphs also contain all the information that PDPs contain”
  • “PFI is less informative than SHAP”

The reviewer draws the conclusion that ‘I do not see much value in the analysis of somewhat "inferior" feature-analysis methods [like PDP and PFI]’.

If you take this statement to its full conclusion, everyone would have to stop working on PDP and PFI. And while we are at it, why not drop ALE plots, ICE plots, and counterfactual explanations and write yet another SHAP extension paper?

The reviewer’s critique is wrong on at least two levels:

  • With this attitude, academia would be condemned to always study the hyped and shiny. It discourages thoroughness and diminishes the chance that “bets” on other lines of research are tested out.
  • In the case of SHAP, the reviewer is plain wrong. PDP and PFI are not a subset of SHAP. They are different techniques with different goals. And while, for example, PFI and SHAP can both produce importance plots, they are not the same.

If this were the first time someone said that SHAP is all you need, it wouldn’t be worth a post. But especially in “peer” review, the critique “You should be working on Shapley values / SHAP / LIME” was surprisingly common. And elsewhere, too, I have often seen people with the attitude of “SHAP is all you need”.

It’s wrong and I’ll show why.

Short primer on SHAP and PFI

If you are already familiar with SHAP and PFI, just skip this section.

Let’s start with permutation feature importance (PFI), because it’s one of the simplest interpretability methods to explain. It’s a model interpretation technique that assigns an importance value to each feature. The importance is computed as how much the model’s performance would drop if we shuffled that feature. The more the performance drops (aka the loss increases), the more important the feature was for correct predictions.

Compute loss. Permute feature. Compute loss again. Compute difference. Simple.
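
To make that recipe concrete, here is a minimal sketch of the loop in Python. It is not the implementation of any particular library; the model is assumed to be an already fitted scikit-learn-style regressor, and mean squared error stands in for whatever loss you actually use.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def permutation_importance_sketch(model, X, y, n_repeats=10, seed=0):
    """Importance of each feature = average increase in loss after shuffling it."""
    rng = np.random.default_rng(seed)
    baseline = mean_squared_error(y, model.predict(X))                    # compute loss
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])                  # permute feature j
            permuted_loss = mean_squared_error(y, model.predict(X_perm))  # compute loss again
            increases.append(permuted_loss - baseline)                    # compute difference
        importances[j] = np.mean(increases)                               # average over repeats
    return importances
```

In practice you would normally reach for scikit-learn’s sklearn.inspection.permutation_importance, which implements essentially this procedure.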

SHAP is a method to compute Shapley values for machine learning predictions. It’s a so-called attribution method that fairly attributes the predicted value among the features. The computation is more complicated than for PFI and also the interpretation is somewhere between difficult and unclear.

SHAP produces many types of interpretation outputs: SHAP can be used to explain individual predictions (aka attributions). But if you compute Shapley values for all the instances in your data, you can also aggregate them. Then you get good-looking plots that show you some notion of feature dependence, some notion of feature importance, and some notion of feature interactions. All these notions are of course tied to the not-so-easy interpretation of Shapley values. For an overview of the plots, you can check out my SHAP Plots For Tabular Data Cheat Sheet.
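
For tree-based models, a minimal sketch of that workflow with the shap package might look like the following; the xgboost model and the simulated data are placeholders for illustration, not the data from any example below.

```python
import numpy as np
import shap
import xgboost

# Placeholder data and model, purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)
model = xgboost.XGBRegressor(n_estimators=100).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)    # one attribution per instance and feature

print(shap_values[0])                     # local: attributions for a single prediction
print(np.abs(shap_values).mean(axis=0))   # global: "SHAP importance" per feature

shap.summary_plot(shap_values, X)         # aggregated plot over all instances
```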

SHAP Is Not All You Need

Believing that SHAP is all you need is a typical pitfall: assuming that one method is the best for all interpretation contexts.

Let’s walk through my favorite example for showing how SHAP importance can be inadequate.

An xgboost regression model was trained on simulated data. But all 20 features were simulated to have no relation to the target. In other words, any relationship that the model picks up is the result of overfitting. And for this experiment, we overfit the model on purpose, because in this case PFI and SHAP diverge quite drastically.
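
I am not reproducing the paper’s exact code here, but a rough sketch of that setup could look like this; the sample sizes, the hyperparameters, and the choice to evaluate PFI on freshly drawn data are my assumptions.

```python
import numpy as np
import shap
import xgboost
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(42)

# 20 features and a target that are all independent noise: nothing real to learn.
X_train, y_train = rng.normal(size=(200, 20)), rng.normal(size=200)
X_test, y_test = rng.normal(size=(200, 20)), rng.normal(size=200)

# Deep trees and many boosting rounds to deliberately overfit the noise.
model = xgboost.XGBRegressor(n_estimators=500, max_depth=8, learning_rate=0.3)
model.fit(X_train, y_train)

# PFI on unseen data: permuting any feature barely changes the (already bad) loss.
pfi = permutation_importance(model, X_test, y_test,
                             scoring="neg_mean_squared_error",
                             n_repeats=10, random_state=0)
print(pfi.importances_mean)

# SHAP importance: the overfit model does attribute its predictions to some
# features, so their mean |SHAP value| is clearly non-zero.
shap_values = shap.TreeExplainer(model).shap_values(X_train)
print(np.abs(shap_values).mean(axis=0))
```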

The example is from our paper on ML interpretability pitfalls:

[Bar plot from the paper: PFI vs. SHAP importance for the 20 simulated features]

Clearly, SHAP and PFI deviate in the bar plot above. PFI more or less shows that all 20 features are unimportant. But SHAP importance clearly shows that some of the features are important.

Which interpretation is the correct one?

Given the simulation setup where none of the features has a relation to the target, one could say that PFI results are correct and SHAP is wrong. But this answer is too simplistic. The choice of interpretation method really depends on what you use the importance values for. What is the question that you want to answer?

Because Shapley values are “correct” in the sense that they do what they are supposed to do: Attribute the prediction to the features. And in this case, changing the “important” features truly changes the model prediction. So if your goal tends towards understanding how the model “behaves”, SHAP might be the right choice.

But if you want to find out how relevant a feature was for the CORRECT prediction, SHAP is not a good option. Here PFI is the better choice since it links importance to model performance.

In a way, it boils down to the question of audit versus insight: SHAP importance is more about auditing how the model behaves. As in the simulated example, it’s useful to see how model predictions are affected by features X4, X6, and so on. For that SHAP importance is meaningful. But if your goal was to study the underlying data, then it’s completely misleading. Here PFI gives you a better idea of what’s really going on. Also, both importance plots work on different scales: SHAP may be interpreted on the scale of the prediction because SHAP importance is the average absolute change in prediction that was attributed to a feature. PFI is the average increase in loss when the feature information is destroyed (aka feature is permuted). Therefore PFI importance is on the scale of the loss.
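
Written out (the notation below is mine, not the paper’s): let phi_j^(i) be the SHAP value of feature j for instance i, L the loss, f-hat the model, and x-tilde_j^(i) instance i with feature j permuted. Then the two importance scores are:

```latex
% SHAP importance: average absolute attribution, on the scale of the prediction.
I_j^{\text{SHAP}} = \frac{1}{n} \sum_{i=1}^{n} \bigl| \phi_j^{(i)} \bigr|

% PFI: average increase in loss when feature j is permuted, on the scale of the loss.
I_j^{\text{PFI}} = \frac{1}{n} \sum_{i=1}^{n} L\bigl(y^{(i)}, \hat{f}(\tilde{x}_j^{(i)})\bigr)
                 - \frac{1}{n} \sum_{i=1}^{n} L\bigl(y^{(i)}, \hat{f}(x^{(i)})\bigr)
```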

A fallacy of the reviewer was to equate these different ideas of feature importance.

Unfortunately, this points towards a much larger issue in research on interpretability. The field is more method-driven than question-driven. We first develop methods, and then ask “what question do the methods really answer?”.

For SHAP, it’s not so easy to answer how the Shapley values are supposed to be interpreted.

Shapley values are also expensive to compute, especially if your model is not tree-based.

So there are many reasons not to use SHAP but instead an “inferior” (as the reviewer put it) interpretation method.

For another critique of Shapley values I recommend this post by Giles Hooker.

Prof. Dr. Holger von Jouanne-Diedrich

Follow me, I am a professor (private account/all views my own)

1 yr
David Young

CEO @ Goal Aligned Media | Digital Marketing, Advanced Analytics

1 yr

Alastair, I liked your short primer on Shapley values and PFI. I've read about them, but have not used either one in practice. I had prior colleagues who liked Shapley values in the context of Marketing Mix Models, but at that time I chose not to use them in MMM. Both Shapley values and PFI, as you've described them, seem like good techniques to describe the importance of variables, as defined by changes in the prediction. On the other hand, a good MMM estimate of media impacts should consider many statistical concepts, not just the data fit. In your example you bring up "overfitting" and show how it either makes the Shapley values wrong, or at least limits how they should be interpreted. Overfitting is a broad term that could represent several different underlying problems, and to add other reasons why a good-fitting model could still be representing untrue relationships, we could throw in endogeneity, serial correlation, or differences in predictors' granularity, to name a few. IMO, Shapley values offer some insights, but rely on the fidelity of the prediction to assign the importance values. Therefore any underlying problems the model had will impact the Shapley values also.

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization | Causal Inference

1 yr

https://mindfulmodeler.substack.com/p/shap-is-not-all-you-need. Sauce. Substack was down when I reposted

Alastair Muir, PhD, BSc, BEd, MBB

Data Science Consultant | @alastairmuir.bsky.social | Risk Analysis and Optimization | Causal Inference

1 yr

Christoph Molnar has published extensively on this and much more on ML and analytic techniques.

Andreas Wagenmann

Freelance Senior Data Scientist & ML / Data / Search Engineer | Speaker | Coach

1 yr

Why not reference the author's article directly rather than creating a complete repost as a separate article, which lists the one who copied it as the author without any contribution?
