Beyond "AI" hype, what really matters
A common theme you will read in almost any AI news article or magazine piece is that "AI has now surpassed human cognitive performance on many fronts", with examples such as Tesla's self-driving cars or DeepMind's AlphaGo beating Lee Sedol.
But when, as an AI practitioner, I look under the hood, I see a few areas that really matter before AI can be embedded everywhere. I plan to pen down my understanding and thoughts on each of these areas as a series.
1. Explainable AI
2. Training in AI
3. Scaling AI models
4. Automated ML
5. Moving AI models to production
As the first part in this series, let me start with Explainable AI (XAI).
Explainable AI
One of the common criticisms that AI teams face from the business is that it is all "black-box" or "mumbo-jumbo" that gives you decisions or results without telling you how it arrived at them. I would not blame you if you get a "déjà vu" feeling.
A. What is XAI?
According to the Google AI team, "Explainable AI" (XAI) is a set of tools and frameworks to help you develop interpretable and inclusive AI models and deploy them with confidence. These should aid in detecting and resolving bias, drift and other gaps in data and models.
B. Why do we need XAI?
But if you ask why XAI is important, I'm sure different stakeholders will give you different answers. So let us look at it from each stakeholder's perspective and understand why it matters to them (albeit in different ways):
1. Domain experts who leverage the model:
i. Need to trust the model so that it can be leveraged (a radiologist or doctor depending on an AI model to analyse scan/X-ray results and identify health conditions should be able to trust the model).
ii. Help to uncover new insights (AlphaGo developed unconventional strategies and created new moves in its matches against the world Go champions Lee Sedol and Ke Jie).
2. End users affected by the model's decision: Understand the decision and analyse whether it is a fair one. New privacy rules are moving in the direction of making the "right to explanation" (read: "meaningful information about the logic involved" in automated decisions) mandatory. E.g., why did my loan application get rejected?
3. AI Modelling/DevOps Team:
How can I improve the model's accuracy, efficiency and other key metrics? How can I identify potential adversarial attacks on my model? How can I identify and fix bias in the model and the model-building process? Help me debug errors in my predictions. Is my model understanding what it is supposed to understand, or is it relying on inconsequential features (like the famous example in the LIME paper, where the model classified a husky as a wolf because of the surrounding snow in the image! The model apparently learned from its training images that wolves are always seen on snow and dogs on grass, instead of learning what a dog or a wolf actually looks like)?
4. Executive board/Management Team: Ensure regulatory compliance and that brand reputation is not tarnished (Amazon reportedly scrapped its AI-based recruitment model for bias against women). Ensure that the model will lead to better revenue, profits or cost-optimisation opportunities.
C. How do we bring in XAI?
In the current world, most of the explanation comes after the model is built, like an afterthought. Come to think of it, this is not far from human reasoning: although we all like to think of ourselves as rational thinkers, most of the time I feel we decide on impulse along with some factual considerations, and then figure out reasons for why we took a certain decision.
In the AI world, explainability usually comes post facto (after modelling), by trying to find the relationship between output and input, or by building another black-box model that analyses the inputs and outputs to explain our original model (my head is spinning!).
Ideally, "explainability" should be part of all the modelling phases (the data analysis/understanding phase, the modelling phase and the post-modelling phase), as pointed out in this article.
1. Data Analysis/Understanding Phase:
i. Form hypotheses that you would like to explore or validate via modelling, rather than building models and then seeing how they work. E.g., adding a new convolution layer so the network can extract higher-level features, rather than throwing a random number of layers at the model.
ii. Ensure that explainable features are considered, rather than going for mathematically complicated features which do not support explainability but seem to work on the training/test data (and might not work later). E.g., taking polynomials of some numeric feature when it doesn't make sense as a feature but seems to work on the training dataset.
iii. Ensure the features do not include ones that could bring in unnecessary bias; a historical training dataset is an easy way to carry human bias over into an AI model (a minimal screening sketch follows this list).
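To make point iii concrete, here is a minimal sketch, in Python with pandas, of how one could screen candidate features for strong correlation with a sensitive attribute before modelling. The DataFrame, column names and threshold are hypothetical; this is a starting point for discussion, not a complete fairness audit.

```python
# A minimal sketch (not a complete fairness audit): flag candidate features that
# correlate strongly with a sensitive attribute, since such proxies can smuggle
# historical bias into the model even if the sensitive column itself is dropped.
import pandas as pd

def flag_proxy_features(df: pd.DataFrame, sensitive_col: str, threshold: float = 0.4):
    """Return features whose absolute correlation with the sensitive column exceeds the threshold."""
    # Encode categorical columns numerically so .corr() covers them too
    encoded = df.apply(lambda s: s.astype("category").cat.codes if s.dtype == "object" else s)
    corr = encoded.corr()[sensitive_col].drop(sensitive_col).abs()
    return corr[corr > threshold].sort_values(ascending=False)

# Hypothetical usage with a hiring dataset:
# df = pd.read_csv("applicants.csv")   # columns: gender, college, years_exp, referral, ...
# print(flag_proxy_features(df, sensitive_col="gender"))
```

Any feature this flags deserves a closer look with the domain experts before it goes into the model.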
2. Modelling Phase:
All along in my career, I have learned to prefer and adopt simpler solutions over complex ones when both give you similar results.
The choice is between an explainable and a more complicated model: some models are inherently explainable (like decision trees), compared to more complicated models (like neural nets), which might not be easily explainable given their complex architecture and the number of parameters involved (a minimal sketch follows after the quote below).
NB: I don't agree with the accuracy dimension in the commonly quoted interpretability-vs-accuracy chart, hence I replaced it with complexity. Complexity does increase as you move up across the different modelling choices, but it doesn't necessarily mean accuracy goes up.
“Life is really simple, but we insist on making it complicated.”
- Confucius
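As promised above, here is a minimal sketch of an inherently explainable model, assuming scikit-learn and using its built-in breast cancer dataset purely as a stand-in: a shallow decision tree whose rules fit on a screen and can be read directly.

```python
# A minimal sketch: a shallow decision tree whose decision rules can be printed
# and read directly, unlike a deep neural network with millions of parameters.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))

# Every prediction can be traced back to a handful of if/else rules.
print(export_text(tree, feature_names=list(X.columns)))
```

If the shallow tree already gives you acceptable accuracy, the explainability comes essentially for free; only move to a more complex model when the simpler one demonstrably falls short.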
3. Post-Modelling Phase: In this phase your model is ready and you have a few choices for covering explainability:
i. Interpretability: In this mode, you're trying to work out the relationship between the inputs (features) and the output, without really trying to understand the inner workings of the model. This can be done at a local level (an individual prediction) or a global level (the entire model, via partial dependence plots, global feature importance, etc.; a minimal sketch of these global tools follows this list).
ii. Explainability: Understand the inner workings of the model and use that understanding to explain how a model would behave.
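As a concrete illustration of the global tools mentioned above, here is a minimal sketch, assuming scikit-learn; the dataset and model are stand-ins, and any fitted estimator would do.

```python
# A minimal sketch of global, model-agnostic interpretability with scikit-learn:
# permutation importance (how much does shuffling a feature hurt the score?) and
# a partial dependence plot (how does the prediction move as one feature varies?).
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Global feature importance: average drop in score when each feature is shuffled
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
top = result.importances_mean.argsort()[::-1][:5]
for i in top:
    print(f"{X.columns[i]}: {result.importances_mean[i]:.4f}")

# Partial dependence of the prediction on the single most important feature
PartialDependenceDisplay.from_estimator(model, X, features=[X.columns[top[0]]])
plt.show()
```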
Since I have already covered what could be done in the pre-modelling and modelling phases, let us look at some of the popular options available in the post-modelling phase.
1. Attribution-based methods: These methods look at the "relevance or contribution" of each input feature (the lowest level) in determining the final output.
i. Local Surrogate Model (Model Agnostic): Create an interpretable linear model which mimics the original model's output for a set of sample observations around the given observation. This technique is "local" since it helps us understand why a single observation was classified in a specific way, and it is model agnostic because it treats the model as a black box and doesn't need to know how it really works internally. E.g., LIME (check out the blog by its creators; a minimal usage sketch follows after this list).
[Figure: LIME explanation for a prediction from the Inception model. The top three predicted classes are "tree frog", "pool table" and "balloon". Sources: Marco Tulio Ribeiro, Pixabay (frog, billiards, hot air balloon).]
ii. Game Theory Based (Shapley Values): Attribute the prediction to features by figuring out the average marginal contribution of each feature over different permutations of the features. In layman's terms, this means the SHAP values of all features sum up to explain the difference between the actual prediction and a baseline (as illustrated in Dan Becker's article on Kaggle, which shows which features contribute most to predicting whether a team will have the man of the match, based on features like number of passes, saves and fouls committed in a game; a minimal sketch follows after this list).
iii. Back-propagation methods: Redistribute, or trickle down, the final output value backwards across the different layers and neurons of a neural network. As neural networks are non-linear, care is needed during redistribution to ensure it doesn't flow through non-active neurons. E.g., DeepLIFT (see the video by the authors).
2. Concept Based (TCAV – Testing with Concept Activation Vectors): This method looks at concepts rather than individual features, and is applicable at a global level. These concepts are higher level, since they are defined by users (like gender, doctor, or zebra stripes, instead of individual pixels). Users need to define the concepts and provide several samples with and without the concept. In layman's terms, TCAV learns the user-defined concepts from examples and identifies how important each one is in deciding the final outcome (roughly, the fraction of examples for which the presence of the concept increases the probability of the outcome).
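As promised above, here is a minimal LIME usage sketch, assuming the open-source lime package; the tabular dataset and random forest are stand-ins, not the Inception image example from the figure.

```python
# A minimal LIME sketch on tabular data (pip install lime). The dataset and
# classifier below are illustrative stand-ins.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

# Explain one prediction: which features pushed this observation towards its class?
exp = explainer.explain_instance(data.data[0], model.predict_proba, num_features=5)
print(exp.as_list())
```

And a minimal SHAP sketch, assuming the open-source shap package; the football "man of the match" data from the Kaggle article is not reproduced here, so a generic binary classifier stands in. Depending on the shap version, the values returned for a classifier may be a list with one array per class or a single 3-D array, so the indexing below may need adjusting.

```python
# A minimal SHAP sketch demonstrating the additivity property described above:
# baseline (expected value) + sum of SHAP values = the model's prediction.
import numpy as np
import shap
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
row = X.iloc[[0]]                       # one observation to explain
sv = explainer.shap_values(row)

# Depending on the shap version, sv is a list (one array per class) or a 3-D array.
sv_pos = sv[1] if isinstance(sv, list) else sv[..., 1]
base = explainer.expected_value[1] if np.ndim(explainer.expected_value) else explainer.expected_value

print("baseline + sum of SHAP values:", base + sv_pos.sum())
print("model's predicted probability:", model.predict_proba(row)[0, 1])
```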
I hope this has been useful; please let me know your thoughts and suggestions.