
Exploring the Reasons for Unexpected Prediction Distributions in Machine Learning Models

Ilia Ekhlakov

Senior Data Scientist @ Wrike | B2B SaaS | Revenue Strategy & Ops | MSc in Physics | 9 YoE

Published: November 15, 2024

When investigating unexpected model behavior, many Data Scientists I know start by analyzing distribution drifts in the most important predictors. This approach can help, but it carries significant risks: you might draw the wrong conclusions or fail to identify the root cause altogether.

Let’s consider a classic example, the California House Price prediction task. Imagine scoring new properties in production using a model trained on this dataset. Normally, we expect predictor distributions to stay relatively consistent over weeks or months. Now, suppose one period shows a clear upward shift in predicted house prices.

This raises two key questions:

What caused the shift?
Can we trust these predictions?

The second question, arguably the more critical one, boils down to: is the model's performance still acceptable? There is a wealth of excellent material addressing this, including resources from NannyML and this wonderful article by Samuele Mazzanti, so I’ll refer you to those rather than summarizing them here.

Instead, I’d like to focus on the first question: understanding what caused the drift.

In our example, the Median Income in Block Group predictor is typically the most important feature in models trained on this dataset.

For simplicity, let’s assume that during previous inference periods with expected predictions, the median income followed a uniform distribution within the range of 0–15. While this assumption may not realistically reflect the dataset, our primary goal here is to explore how the model’s response function behaves across the range of predictor values.

Now, let’s say univariate metrics like the Population Stability Index (PSI) or Wasserstein Distance indicate a shift in the median income distribution. Is this enough to explain the unexpected predictions? Not necessarily.
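As a rough illustration of what such univariate metrics measure, here is a minimal sketch using NumPy only. The helper names `psi` and `wasserstein_1d`, the bin count, and the synthetic drift (values above 9.5 moved into the 14 to 15 range) are assumptions made for this example, not part of any particular monitoring library. In practice, `scipy.stats.wasserstein_distance` covers the second metric for samples of any size.

```python
import numpy as np


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index with bin edges taken from reference quantiles."""
    edges = np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1))
    # Use interior edges only, so values outside the reference range
    # fall into the first or last bin instead of being dropped.
    inner = edges[1:-1]
    ref_idx = np.searchsorted(inner, reference, side="right")
    cur_idx = np.searchsorted(inner, current, side="right")
    ref_frac = np.bincount(ref_idx, minlength=n_bins) / len(reference)
    cur_frac = np.bincount(cur_idx, minlength=n_bins) / len(current)
    # Avoid log(0) for empty bins.
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two 1D samples of equal size."""
    if len(a) != len(b):
        raise ValueError("this simple version needs samples of equal size")
    # In 1D the optimal transport plan matches sorted values pairwise.
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


rng = np.random.default_rng(0)
# Reference period: median income uniform on [0, 15], as assumed in the text.
reference = rng.uniform(0.0, 15.0, size=20_000)
# Illustrative drift (an assumption): values above 9.5 are redrawn from U(14, 15).
current = reference.copy()
moved = current > 9.5
current[moved] = rng.uniform(14.0, 15.0, size=moved.sum())

psi_value = psi(reference, current)
w_value = wasserstein_1d(reference, current)
print(f"PSI = {psi_value:.3f}, Wasserstein distance = {w_value:.3f}")
```

Both numbers flag a clear change in the input distribution, yet neither says anything about whether the model's output reacts to it.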

To see why, we can examine the model’s sensitivity to median income values, expressed, for instance, through Accumulated Local Effects (ALE). Such a plot might reveal zones where the model’s predictions are either highly sensitive or nearly insensitive to changes in median income:

Below ~2 and above ~9.25: the model’s response flattens, showing little or no sensitivity to shifts. A drift within these zones wouldn’t explain prediction changes. In this specific case, the plateau at high values is primarily due to the very low density of samples with high Median Income values, which causes them to fall into a single terminal node. Again, our main interest here is in the shape of the sensitivity function, which, in another task, might have a plateau with a solid physical or business explanation behind it.
From ~5.55 to ~5.95: this region may show high sensitivity, where even small shifts could drastically impact predictions. Such shifts can go undetected by PSI, for example, if they occur within a single bucket.
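To make the idea of a sensitivity curve concrete, below is a minimal sketch of a first-order ALE computation written with NumPy. Everything in it is an assumption for illustration: `toy_model` is a hypothetical response that is flat below 2 and above 9.25 and linear in between (it does not reproduce the steep zone around 5.55 to 5.95), and `ale_1d` is a simplified hand-rolled estimator rather than the API of any ALE library.

```python
import numpy as np


def toy_model(X):
    """Hypothetical response: flat below 2 and above 9.25 in column 0."""
    income = np.clip(X[:, 0], 2.0, 9.25)
    return 0.5 * income + 0.1 * X[:, 1]


def ale_1d(predict, X, feature, n_bins=20):
    """First-order ALE of one feature on quantile bins, centered to zero mean."""
    x = X[:, feature]
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_intervals = len(edges) - 1
    # Index of the interval each row falls into (0 .. n_intervals - 1).
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_intervals - 1)
    local_effects = np.zeros(n_intervals)
    counts = np.zeros(n_intervals)
    for k in range(n_intervals):
        rows = X[bin_idx == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[k]
        upper[:, feature] = edges[k + 1]
        # Average prediction change when moving across the interval,
        # using only the rows that actually live in it.
        local_effects[k] = np.mean(predict(upper) - predict(lower))
        counts[k] = len(rows)
    # Accumulate local effects; ale[i] is the value at edges[i].
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Center so that the count-weighted average effect is zero.
    midpoints = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(midpoints * counts) / np.sum(counts)
    return edges, ale


rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.0, 15.0, 5000), rng.normal(0.0, 1.0, 5000)])
edges, ale = ale_1d(toy_model, X, feature=0, n_bins=30)
# Slope of the ALE curve per interval: near zero means an insensitive zone.
slopes = np.diff(ale) / np.diff(edges)
```

Intervals where `slopes` is close to zero are the zones where a drift in the feature cannot move predictions, while intervals with large slopes are the ones worth checking even when a bucketed metric looks calm.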

Additionally, univariate drift analysis doesn’t account for interactions between predictors, which your model likely considers. Nor does it help detect cumulative effects from small changes across multiple key predictors.

A Better Starting Point

I recommend starting your investigation by comparing the distribution of predictor contributions between the reference and drifted periods. The most straightforward option is to use SHAP values, but other tools can work too. For instance, LightGBM allows you to retrieve a matrix or tensor of feature contributions without external libraries by calling the predict or predict_proba methods with pred_contrib=True.

The image below shows the SHAP Summary plots for the reference and examined periods. The examined period's data was generated from the reference data using the following modifications:

For Median Income, all values above 9.5 were replaced with random values from a uniform distribution between 14 and 15.
Random values from a normal distribution with a mean of -2 and a standard deviation of 1 were added to both Latitude and Longitude.

Comparing the distributions of feature contributions provides a correct interpretation of the reasons behind the shift in prediction distribution:

Despite the shift in the distribution of Median Income, it does not affect the prediction distribution.
The primary causes of the shift are Latitude (the more important feature for the model, given an equal shift in distribution) and, to a slightly lesser extent, Longitude.
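The comparison can be sketched on synthetic data as follows. This is a toy stand-in under loud assumptions: the data is randomly generated rather than taken from the California housing dataset, and `components` is a hypothetical additive model with made-up coefficients and a plateau in Median Income. For an additive model, each feature's contribution relative to the reference baseline is simply its own term minus the reference mean of that term, so no SHAP library is needed here. Only the two data modifications come from the description above.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
features = ["MedInc", "Latitude", "Longitude"]

# Reference period: synthetic stand-in data (an assumption for this sketch).
ref = np.column_stack([
    rng.uniform(0.0, 15.0, n),    # MedInc
    rng.normal(36.0, 2.0, n),     # Latitude
    rng.normal(-120.0, 2.0, n),   # Longitude
])

# Examined period: the two modifications described in the text.
cur = ref.copy()
high = cur[:, 0] > 9.5
cur[high, 0] = rng.uniform(14.0, 15.0, high.sum())
cur[:, 1] += rng.normal(-2.0, 1.0, n)
cur[:, 2] += rng.normal(-2.0, 1.0, n)


def components(X):
    """Per-feature terms of a hypothetical additive model (prediction = row sum)."""
    return np.column_stack([
        0.5 * np.clip(X[:, 0], 2.0, 9.25),  # flat below 2 and above 9.25
        -0.4 * X[:, 1],
        -0.3 * X[:, 2],
    ])


# Contributions are measured against the reference-period baseline in both periods.
baseline = components(ref).mean(axis=0)
contrib_ref = components(ref) - baseline
contrib_cur = components(cur) - baseline

# Shift in the mean contribution of each feature between the two periods.
shift = contrib_cur.mean(axis=0) - contrib_ref.mean(axis=0)
for name, delta in zip(features, shift):
    print(f"{name:>10}: mean contribution shift = {delta:+.3f}")
```

In this toy setup the Median Income contributions do not move at all, because every modified value stays on the plateau, while Latitude and Longitude account for the whole change in the average prediction. With a real LightGBM model, the contribution matrix would come from `predict(X, pred_contrib=True)` instead, where the last column holds the expected value rather than a feature contribution. Comparing quantiles or full histograms of each column gives a fuller picture than the mean alone.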

If feature interactions play a significant role in driving the issue, uncovering an explanation may still require analyzing multiple plots that illustrate how these interactions impact predictions, such as Partial Dependence Plots or Pairwise Feature Interaction Plots. However, comparing the distribution of predictor impacts in the reference and examined periods will provide a clear direction and rationale for your investigation.

Francis Gichere

Lead Data Scientist @ BURN | Decision Science | Applied AI & ML | 6yrs experience in Data

4 months ago

Very informative


Denis Sidorenko

Data Scientist | ML Engineer | MScIT

4 months ago

Thanks for the revealing ideas, Ilia! I had a question while I was reading. Please clarify the given example regarding SHAP values and the previous explanation around shifts in median income. There are shifts in Lat and Long in SHAP plots, but almost no shifts around MedInc are displayed. Why would you recommend using this technique to determine shifts in MedInc?


