Model Explainability via LIME
What will speed up the adoption of Machine Learning in Business? Why is Machine Learning Interpretability Important?
Think of interpretability as the bridge between humans and machines. It's not just about understanding how a model works; it's about building trust and accountability. Interpretability empowers us to ask questions like "Why did the model make this prediction?" or "Which features were most influential?"
By demystifying AI models, interpretability fosters trust among users, regulators, and stakeholders.
In linear and logistic regression, the weights or coefficients play an important role: they indicate how much each variable contributes to the predictive model.
Say we are predicting an employee's salary, relying on two key features: years of experience and a previous performance rating. In this scenario, our model might look something like this:
Salary = w1*Experience + w2*Rating
These coefficients serve as indicators, shedding light on whether the rating holds more weight in determining an employee's salary or whether it is the experience that primarily influences the outcome. In essence, these weights offer insights into the relative importance of each feature, helping us understand the dynamics between variables and their impact on the predicted outcome.
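To make this concrete, here is a minimal sketch of how the fitted weights of a linear model expose the relative importance of experience and rating. The numbers below are made up purely for illustration and are not part of this article's dataset.

import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical training data: [years_of_experience, performance_rating]
X = np.array([[2, 3.5], [5, 4.0], [8, 4.5], [10, 3.0], [12, 4.8]])
y = np.array([40000, 65000, 90000, 95000, 130000])  # illustrative salaries

lin_reg = LinearRegression().fit(X, y)

# the fitted weights w1 and w2 show how strongly each feature drives the prediction
for name, weight in zip(["Experience", "Rating"], lin_reg.coef_):
    print(f"{name}: {weight:,.2f}")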
But for Random Forest, XGBoost, or other complex ML models, interpreting the model is not so easy.
In this article, I have used the LIME framework to interpret a Random Forest regression model.
Before getting into how the LIME framework can be used to interpret a regression model, let's talk about a few things:
Surrogate Model: A surrogate model acts as a simplified approximation of a black-box model, offering a glimpse into its decision-making process. While black-box models, such as deep neural networks or ensemble methods, may deliver superior predictive performance, their inner workings often remain opaque, leaving users in the dark about how and why specific predictions are made. Surrogate models bridge this gap by providing a more interpretable alternative.
Global Explainability: Global explainability provides an overarching understanding of how a model works across the entire dataset.
In the realm of AI, global explainability helps us understand the overall behavior of a model. It answers questions like: "What features are most important for making predictions?" or "How does the model generalize across different subsets of data?" Think of it as the big picture view that guides our trust in the model's decision-making process. A surrogate model involves training a more interpretable model, such as a decision tree or linear regression, on the predictions or intermediate representations generated by the black-box model. By mapping the inputs to the outputs of the black-box model, the surrogate model encapsulates its underlying logic in a more digestible form.
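As a small illustration of the surrogate idea described above, a shallow decision tree can be fitted to the predictions of a random forest. This is only a sketch on synthetic data, not the dataset used later in this article.

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text
from sklearn.metrics import r2_score

# synthetic data standing in for any tabular problem
X, y = make_regression(n_samples=1000, n_features=5, noise=10, random_state=0)

# the "black box" whose behaviour we want to approximate
black_box = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
black_box_preds = black_box.predict(X)

# the surrogate is trained on the black-box predictions, not on the true labels
surrogate = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, black_box_preds)

# fidelity: how closely the simple tree mimics the black box
print("Fidelity (R^2 vs black-box predictions):", r2_score(black_box_preds, surrogate.predict(X)))
print(export_text(surrogate, feature_names=[f"f{i}" for i in range(5)]))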
Local Explainability: Local Explainability focuses on explaining the model's prediction for a specific instance or observation. So, instead of looking at the entire dataset, we're analyzing how the model arrived at a particular decision for a particular input.
This is crucial for understanding why a model made a specific prediction for an individual case. It helps us answer questions like: "What factors influenced the model to deny a loan for this particular applicant?"
Model Agnostic Interpretability: Model-agnostic techniques allow the use of more complex models without losing all interpretability power. They can be applied to most machine learning models, regardless of type or complexity, and provide insights into how models make decisions without relying on the inner workings of a specific algorithm.
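One concrete example of a model-agnostic technique is permutation importance, which only ever calls the model's predict function. The sketch below uses synthetic data and is not part of the original walkthrough; it is included just to show the model-agnostic idea in code.

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=4, random_state=1)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)

# any estimator with fit/predict works here - the technique never looks inside it
model = GradientBoostingRegressor(random_state=1).fit(X_train, y_train)

# shuffle one feature at a time and measure how much the validation score drops
result = permutation_importance(model, X_valid, y_valid, n_repeats=10, random_state=1)
for i, score in enumerate(result.importances_mean):
    print(f"feature_{i}: importance {score:.3f}")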
LIME (Local Interpretable Model-agnostic Explanations) is a robust framework widely embraced in the industry for providing human-friendly explanations for tabular, text, and image data. Its versatility across data modalities creates trust and confidence in black-box machine learning models. Operating on the principle of local interpretability, LIME offers granular insights at the instance level, making complex model behaviour accessible and actionable for all stakeholders.
Below is the implementation in Python using LIME for a Random Forest Regression model:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn import tree
from sklearn.metrics import mean_squared_error
# Read csv file
df = pd.read_csv('data.csv')
# preview of the sample data
df.head()
# Checking Missing Values
df.isnull().sum().sort_values()
# Missing Value Treatment - Item_Weight by median and Outlet_Size with mode
df['Item_Weight'] = df['Item_Weight'].fillna(df['Item_Weight'].median())
df['Outlet_Size'] = df['Outlet_Size'].fillna(df['Outlet_Size'].mode()[0])
# Reducing the Cardinality of 'Item_Type_Combined'
df['Item_Type_Combined'] = df['Item_Identifier'].apply(lambda x: x[0:2])
df['Item_Type_Combined'] = df['Item_Type_Combined'].map({'FD':'Food', 'NC':'Non-Consumable', 'DR':'Drinks'})
df['Item_Type_Combined'].value_counts()
# No of Years of the existence of stores
df['Existence_Years'] = 2013 - df['Outlet_Establishment_Year']
# Updating the values of Item_Fat_Content
df['Item_Fat_Content'] = df['Item_Fat_Content'].replace({'LF':'Low Fat', 'reg':'Regular', 'low fat':'Low Fat'})
df['Item_Fat_Content'].value_counts()
# label encoding of ordinal variables
lbl_enco = LabelEncoder()
df['Outlet'] = lbl_enco.fit_transform(df['Outlet_Identifier'])
cat =['Item_Fat_Content','Outlet_Location_Type','Outlet_Size','Item_Type_Combined','Outlet_Type','Outlet']
lbl_enco = LabelEncoder()
for i in cat:
    df[i] = lbl_enco.fit_transform(df[i])
# dropping the ID variables and variables that have been used to extract new variables
df.drop(['Item_Identifier', 'Outlet_Identifier','Item_Type','Outlet_Establishment_Year'],axis=1,inplace=True)
# separating the dependent and independent variables
X = df.drop('Item_Outlet_Sales',axis =1)
y = df['Item_Outlet_Sales']
# creating the training and validation set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.25, random_state=42)
# installing lime library
!pip install lime
# training the Random Forest model
model = RandomForestRegressor(n_estimators=200,max_depth=5, min_samples_leaf=100,n_jobs=-1, random_state=10)
model.fit(X_train, y_train)
# model - RMSE on validation Set
np.sqrt(mean_squared_error(y_test,model.predict(X_test)))
# creating the explainer function
from lime.lime_tabular import LimeTabularExplainer
explainer = LimeTabularExplainer(X_train.values, mode="regression", feature_names=X_train.columns)
# storing an observation from the validation set
j = 12
X_obs = X_test.iloc[[j], :]
X_obs
# Explaining the Random Forest model
expl = explainer.explain_instance(X_obs.values[0], model.predict)
expl.show_in_notebook(show_table=True, show_all=False)
print(expl.score)
The leftmost visualization shows the model's predicted value for this observation, along with a range of possible values depicting the best and worst case.
The middle visualization depicts which variables influence the prediction toward the higher or the lower side.
The most important variables identified are shown in the rightmost visualization, in descending order of importance.
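If the same information is needed programmatically, for example for logging or reporting, the Explanation object returned by explain_instance also exposes the feature contributions as a list. A small sketch continuing from the code above:

# feature contributions behind the plots, as (feature, weight) pairs
for feature, weight in expl.as_list():
    print(f"{feature}: {weight:.2f}")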