Conformal Prediction in Regression: How NOT to Build Intervals (and How to Fix Mistakes)

Conformal Prediction (CP) is the gold standard for building intervals with guaranteed coverage. But even here, things can go wrong if you don’t understand the nuances of the methods. Today, we’ll break down two approaches—Naive and Jackknife—and uncover their pitfalls.

Spoiler: Most mistakes stem from violating the key principle of CP—splitting data into training and calibration sets.


The Naive Method: Why It’s Not So "Naive" (But Dangerous)

What People Get Wrong: Classical CP requires a strict split of the data into training and calibration sets. In the "naive" approach, however, many people:

  1. Train the model on all the data.
  2. Calculate residuals on the same data.
  3. Use the quantile of residuals to build the interval.
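
In code, this anti-pattern looks something like the sketch below (X, y, X_test are assumed to be NumPy arrays, and the random forest is only a placeholder model):

    # WRONG: the "naive" shortcut, calibrated on the training data itself
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    alpha = 0.1                                 # target miscoverage (90% intervals)
    model = RandomForestRegressor().fit(X, y)   # 1) train on ALL the data
    residuals = np.abs(y - model.predict(X))    # 2) residuals on the SAME data
    q = np.quantile(residuals, 1 - alpha)       # 3) quantile of in-sample residuals

    y_pred = model.predict(X_test)
    lower, upper = y_pred - q, y_pred + q       # looks fine, but the intervals are too narrow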

The Problem: This violates exchangeability. Residuals on the training data are systematically too small because the model has already "seen" those points, so the quantile is underestimated and the intervals come out too narrow.

The Right Way (Split Conformal):
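
A minimal split conformal sketch under the same assumptions (X, y, X_test as NumPy arrays; the random forest is again just a placeholder):

    # RIGHT: split conformal, calibrated on data the model has never seen
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    alpha = 0.1
    X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

    # Train only on the proper training set
    model = RandomForestRegressor().fit(X_train, y_train)

    # Nonconformity scores on the held-out calibration set
    scores = np.sort(np.abs(y_cal - model.predict(X_cal)))

    # Finite-sample correction: take the ceil((n+1)(1-alpha))-th smallest score
    n_cal = len(scores)
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    q = scores[min(k, n_cal) - 1]

    y_pred = model.predict(X_test)
    lower, upper = y_pred - q, y_pred + q       # marginal coverage >= 1 - alpha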

Critique:

  • Pros: Simplicity and speed (only one model fit).
  • Cons: If the calibration set is small, the quantile estimate is noisy → coverage suffers.
  • Myth: The "naive method" in CP is NOT training on all the data. The proper simple baseline is Split Conformal, with a mandatory split!


Jackknife: Why It’s Dead, Long Live Jackknife+

What People Get Wrong: The original Jackknife (leave-one-out) approach in CP carries no finite-sample coverage guarantee, and for flexible modern models its intervals can undercover badly.

The reason: a model trained on n−1 observations can differ noticeably from the model trained on all n, so the leave-one-out residuals do not reflect the full model’s errors. Jackknife+ fixes this by building the interval directly from the leave-one-out predictions and their residuals, which restores a guaranteed coverage of at least 1−2α (and close to 1−α in practice for stable models).

Code (Simplified):
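
A simplified Jackknife+ sketch, assuming NumPy arrays X, y, X_test and an sklearn-style base model (Ridge is just a placeholder):

    # Jackknife+ (simplified): n leave-one-out models, intervals built from
    # the LOO predictions shifted by the LOO residuals
    import numpy as np
    from sklearn.linear_model import Ridge

    def jackknife_plus_intervals(X, y, X_test, alpha=0.1, make_model=Ridge):
        n = len(y)
        loo_resid = np.empty(n)                     # R_i = |y_i - mu_{-i}(x_i)|
        loo_test_pred = np.empty((n, len(X_test)))  # mu_{-i}(x_test) for each left-out i

        for i in range(n):                          # n refits: this is the O(n) cost
            mask = np.arange(n) != i
            model = make_model().fit(X[mask], y[mask])
            loo_resid[i] = abs(y[i] - model.predict(X[i:i + 1])[0])
            loo_test_pred[i] = model.predict(X_test)

        # Interval endpoints are order statistics of {mu_{-i}(x) -/+ R_i}, i = 1..n
        k = min(int(np.ceil((1 - alpha) * (n + 1))), n)
        lower = np.sort(loo_test_pred - loo_resid[:, None], axis=0)[n - k]
        upper = np.sort(loo_test_pred + loo_resid[:, None], axis=0)[k - 1]
        return lower, upper

    lower, upper = jackknife_plus_intervals(X, y, X_test, alpha=0.1)

With alpha = 0.1 this targets 90% coverage; the distribution-free guarantee for Jackknife+ is at least 1−2α (here 80%), and close to 1−α when the model is stable.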

Critique:

  • Pros: Accounts for both model error and the model’s sensitivity to the data.
  • Cons: Computational cost of O(n) model refits, which is infeasible for large datasets.
  • Important: Jackknife+ only reaches the target 1−α coverage when the model is stable (small changes in the data → small changes in predictions); without stability the guarantee is the weaker 1−2α. For neural networks, stability rarely holds!


Comparison: When to Use What?
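
In short:

  • Split Conformal: one model fit, fast and simple, works with any model (including deep learning), but needs a calibration set large enough for an accurate quantile.
  • Jackknife+: O(n) model refits, uses all the data for training, better suited to small datasets and high-stakes tasks, but assumes a reasonably stable model.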

Key Takeaways and Life Hacks

  1. Never calibrate on training data—it’s a cardinal sin in CP.
  2. For deep learning, Split Conformal is often the only option—Jackknife+ is computationally infeasible.
  3. Check coverage on test data: if empirical coverage ≈ 1−α, the method is working (see the snippet after this list).
  4. Try CQR (Conformalized Quantile Regression): it conformalizes quantile regression so interval widths adapt to heteroscedastic data.
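
A quick way to run that check, assuming you have y_test plus the lower and upper arrays from any of the methods above:

    import numpy as np

    alpha = 0.1
    # Fraction of test targets that fall inside their intervals
    coverage = np.mean((y_test >= lower) & (y_test <= upper))
    avg_width = np.mean(upper - lower)
    print(f"empirical coverage: {coverage:.3f} (target {1 - alpha:.2f}), mean width: {avg_width:.3f}")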

Personal Experience: In production, Split Conformal works 90% of the time. Reserve Jackknife+ for high-stakes tasks (medicine, finance) where the cost of error is your company’s reputation.

Have you faced issues with interval coverage? Share your stories in the comments!

#MachineLearning #DataScience #ConformalPrediction #AI #ML #DL
