Machine Learning Black Box Model Explainability
Deepak Kumar
Data Science & Product Lead | Merchandising Analytics | Demand Planning & Inventory Optimisation | Promotion & Pricing Analytics | Digital Transformation | Data Products | MLOps
Most people say Machine Learning models are like a black box: they give higher accuracy in prediction, but understanding these models is a complex task. As a result, many models die at the proof-of-concept stage. Management and stakeholders are strategy people who are more interested in understanding the logic behind the predictions than in the accuracy of the Machine Learning model. The inability to explain a complex model in layman's terms to management and decision makers leads to a loss of trust, and subsequently the model is not adopted.
As Data Scientists, our job is to provide insights from the model that can be used by decision makers. Answering certain questions helps create trustworthiness among decision makers so that they adopt the model. We will look at three approaches to explain any complex model: 1. Permutation Importance, 2. Partial Dependence Plots, 3. SHAP (SHapley Additive exPlanations). Python code snippets are attached for reference.
Permutation Importance for finding important features - Permutation Importance is calculated after the model has been fitted. Randomly re-ordering a single variable should produce less accurate predictions, since the resulting data no longer corresponds to anything observed in the real world. Model accuracy suffers especially if we shuffle a variable that the model relied on heavily for predictions. For example:
Dataset Description - Boston Housing Data Variables
- CRIM - per capita crime rate by town
- ZN - proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS - proportion of non-retail business acres per town
- CHAS - Charles River dummy variable (1 if tract bounds river; 0 otherwise)
- NOX - nitric oxides concentration (parts per 10 million)
- RM - average number of rooms per dwelling
- AGE - proportion of owner-occupied units built prior to 1940
- DIS - weighted distances to five Boston employment centres
- RAD - index of accessibility to radial highways
- TAX - full-value property-tax rate per $10,000
- PTRATIO - pupil-teacher ratio by town
- B - 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT - % lower status of the population
- MEDV - Median value of owner-occupied homes in $1000's
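A minimal sketch of how such a permutation importance table can be produced, here using scikit-learn's permutation_importance (the original snippet may have used a different library such as eli5); the DataFrame name `boston_df`, the train/test split, and the random forest model are assumptions:

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumes the Boston data has been loaded into a DataFrame `boston_df`
# with the columns described above, including the target MEDV.
X = boston_df.drop(columns=["MEDV"])
y = boston_df["MEDV"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature several times and record the drop in test-set score.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

importances = pd.DataFrame(
    {"feature": X.columns,
     "importance_mean": result.importances_mean,
     "importance_std": result.importances_std}
).sort_values("importance_mean", ascending=False)
print(importances)

Reading the resulting table: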
- The values towards the top are the most important features, and those towards the bottom matter least.
- The first number in each row shows how much model performance decreased when that feature was randomly shuffled (in this case, using "accuracy" as the performance metric).
- Like most things in data science, there is some randomness in the exact performance change from shuffling a column. We measure the amount of randomness in our permutation importance calculation by repeating the process with multiple shuffles. The number after the ± measures how performance varied from one reshuffling to the next.
- You'll occasionally see negative values for permutation importances. In those cases, the predictions on the shuffled (or noisy) data happened to be more accurate than those on the real data. This happens when the feature didn't matter (it should have had an importance close to 0), but random chance caused the predictions on the shuffled data to be more accurate. This is more common with small datasets, like the one in this example, because there is more room for luck/chance.
Partial Dependence Plot - A Partial Dependence Plot (PDP) shows how features affect predictions. A PDP can be interpreted similarly to the coefficients of linear or logistic regression models; however, a PDP on a sophisticated model can capture more complex patterns than coefficients from simple models.
It shows how the prediction changes locally when the feature 'rm' is varied while keeping the other features constant. We can also draw insights from this: when RM (average number of rooms per dwelling) is below 6 it has little or no impact on the price, but it starts to drive the price up once the average number of rooms exceeds 6.
2D PDP plot for feature interactions in the model
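A minimal sketch of how both the one-dimensional and two-dimensional PDPs can be generated with scikit-learn's PartialDependenceDisplay (the original snippet may have used the pdpbox library); the `model` and `X_test` objects come from the permutation importance sketch above and are assumptions:

import matplotlib.pyplot as plt
from sklearn.inspection import PartialDependenceDisplay

# 1D PDP: how the predicted price changes as RM varies,
# averaging over the other features.
PartialDependenceDisplay.from_estimator(model, X_test, features=["RM"])
plt.show()

# 2D PDP: joint effect of RM and LSTAT on the prediction.
PartialDependenceDisplay.from_estimator(model, X_test, features=[("RM", "LSTAT")])
plt.show()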
SHAP - It gives feature importance at the instance level, which means that rather than looking only at global (overall) feature importance, we can understand feature importance at each instance/data point. SHAP connects game theory with local explanations, uniting several previous methods and representing the only possible consistent and locally accurate additive feature attribution method based on expectations. It is based on Shapley values, a technique used in game theory to determine how much each player in a collaborative game has contributed to its success. In our case, each SHAP value measures how much each feature in our model contributes, either positively or negatively.
The Shapley value has a nice interpretation in terms of expected marginal contribution.
- Suppose that there are two players and v({1}) = 10, v({2}) = 12 and v({1,2}) = 23.
- There are two possible orderings: player 1 arrives first and then player 2, or player 2 arrives first and then player 1. Each player's Shapley value is their marginal contribution averaged over both orderings, as worked out in the sketch below.
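A small sketch that works out the Shapley values for this two-player game by averaging marginal contributions over both arrival orders:

from itertools import permutations

# Characteristic function from the example above:
# value created by each coalition of players.
v = {frozenset(): 0, frozenset({1}): 10, frozenset({2}): 12, frozenset({1, 2}): 23}

def shapley_values(players, v):
    """Average each player's marginal contribution over all arrival orders."""
    contributions = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            # Marginal contribution of p when joining the current coalition.
            contributions[p] += v[coalition | {p}] - v[coalition]
            coalition = coalition | {p}
    return {p: total / len(orders) for p, total in contributions.items()}

print(shapley_values([1, 2], v))
# Ordering (1, 2): player 1 adds 10, player 2 adds 23 - 10 = 13
# Ordering (2, 1): player 2 adds 12, player 1 adds 23 - 12 = 11
# Shapley values: player 1 -> (10 + 11) / 2 = 10.5, player 2 -> (13 + 12) / 2 = 12.5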
The above explanation shows how each feature contributes to pushing the model output from the base value (the average model output over the test dataset we passed) to the final model output. Features pushing the prediction higher are shown in red; those pushing the prediction lower are shown in blue.
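A minimal sketch of how such SHAP plots can be generated with the shap library, assuming the tree-based `model` and `X_test` from the earlier sketches:

import shap

# TreeExplainer is suited to tree-based models such as the random forest above.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global view: summary of per-instance SHAP values across all features.
shap.summary_plot(shap_values, X_test)

# Local view: force plot for a single prediction (here, the first test row).
shap.initjs()
shap.force_plot(explainer.expected_value, shap_values[0, :], X_test.iloc[0, :])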
SHAP Values for Deep Learning Models
Predictions for two input images are explained in the plot above. Red pixels represent positive SHAP values that increase the probability of the class, while blue pixels represent negative SHAP values that reduce the probability of the class. By using ranked_outputs=2 we explain only the two most likely classes for each input.
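A minimal sketch of how image explanations like these can be produced with shap's DeepExplainer; the trained Keras classifier `model` and the image arrays `x_train` / `x_test` are assumptions:

import numpy as np
import shap

# Assumes a trained Keras image classifier `model` and image arrays
# `x_train` / `x_test` (e.g. MNIST digits scaled to [0, 1]).
background = x_train[np.random.choice(x_train.shape[0], 100, replace=False)]

explainer = shap.DeepExplainer(model, background)

# ranked_outputs=2 returns SHAP values for the two most likely classes per image,
# along with the indexes of those classes.
shap_values, indexes = explainer.shap_values(x_test[:2], ranked_outputs=2)

# Red pixels push the class probability up, blue pixels push it down.
shap.image_plot(shap_values, -x_test[:2])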