Nested Cross-Validation
Yeshwanth Nagaraj
Democratizing Math and Core AI // Levelling playfield for the future
Nested cross-validation (NCV) is a technique used in machine learning and statistics to estimate the performance of a model, especially when tuning hyperparameters. It is particularly useful when the goal is to select the best model and its hyperparameters while obtaining a performance estimate that is not biased by the tuning process. Here's a breakdown of the concept:
1. Why Nested Cross-Validation?
Traditional k-fold cross-validation can be biased when hyperparameters are tuned using the same data on which the performance is estimated. This is because the model has "seen" the validation data during the hyperparameter tuning phase, which can lead to overly optimistic performance estimates. Nested cross-validation addresses this issue.
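To see this bias in practice, here is a minimal sketch, assuming scikit-learn and reusing the Iris/SVC setup from the full example further down, that compares the optimistic score reported by a grid search on all of the data with the nested estimate (the size of the gap will vary with the dataset and random seed):
# Sketch: non-nested vs. nested performance estimates (assumes scikit-learn is available)
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [1, 0.1, 0.01, 0.001], 'kernel': ['rbf']}
inner_cv = KFold(n_splits=4, shuffle=True, random_state=42)
outer_cv = KFold(n_splits=4, shuffle=True, random_state=42)
grid = GridSearchCV(SVC(), param_grid, cv=inner_cv, scoring='accuracy')
# Non-nested: best_score_ is selected on the same folds used for tuning, so it tends to be optimistic
grid.fit(X, y)
print(f"Non-nested (tuning) score: {grid.best_score_:.4f}")
# Nested: an outer loop evaluates the whole tuning procedure on held-out folds
nested = cross_val_score(grid, X, y, cv=outer_cv)
print(f"Nested CV score: {nested.mean():.4f}")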
2. How Does It Work?
Nested cross-validation uses two loops. The outer loop splits the data into training and test folds to estimate performance. Within each outer training fold, an inner loop runs its own cross-validation to choose the hyperparameters; the model is then refit on the full outer training fold with those hyperparameters and scored on the outer test fold, which took no part in the tuning. The outer-fold scores are finally averaged, as shown in the sketch below.
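The following sketch makes the two loops explicit with plain scikit-learn splitters rather than the cross_val_score shortcut used in the full example; the fold counts and the reduced parameter grid are illustrative choices, not fixed parts of the method:
# Hand-rolled nested CV with explicit outer and inner loops (illustrative)
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, KFold
X, y = load_iris(return_X_y=True)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.1, 0.01], 'kernel': ['rbf']}
outer_cv = KFold(n_splits=4, shuffle=True, random_state=42)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Inner loop: tune hyperparameters using only the outer training fold
    inner_cv = KFold(n_splits=4, shuffle=True, random_state=42)
    search = GridSearchCV(SVC(), param_grid, cv=inner_cv, scoring='accuracy')
    search.fit(X_train, y_train)
    # Evaluate the refit best model on the untouched outer test fold
    outer_scores.append(search.score(X_test, y_test))
print(f"Outer-fold scores: {np.round(outer_scores, 4)}")
print(f"Mean nested score: {np.mean(outer_scores):.4f}")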
3. Advantages:
It gives a nearly unbiased estimate of how the entire modelling pipeline, hyperparameter tuning included, will generalize to new data, and it makes model comparisons fairer because every candidate is scored on folds it never saw during tuning.
4. Disadvantages:
It is computationally expensive: one model is trained per hyperparameter combination for every inner fold of every outer fold (in the example below, 4 outer folds × 4 inner folds × 16 parameter combinations = 256 fits, plus a refit per outer fold). It also produces a potentially different "best" hyperparameter set in each outer fold, so it estimates the performance of the tuning procedure rather than of one final model.
5. Applications:
Nested cross-validation is commonly used in situations where it's crucial to get an unbiased estimate of a model's performance, such as in medical applications where the consequences of model errors can be significant. The following complete example runs nested cross-validation with scikit-learn:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, cross_val_score, KFold
# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target
# Define hyperparameters to tune
param_grid = {
'C': [0.1, 1, 10, 100],
'gamma': [1, 0.1, 0.01, 0.001],
'kernel': ['rbf']
}
# Set up the inner cross-validation
inner_cv = KFold(n_splits=4, shuffle=True, random_state=42)
grid_search = GridSearchCV(SVC(), param_grid, cv=inner_cv, scoring='accuracy')
# Set up the outer cross-validation
outer_cv = KFold(n_splits=4, shuffle=True, random_state=42)
# Execute nested cross-validation and print the average score
nested_scores = cross_val_score(grid_search, X, y, cv=outer_cv)
print(f"Nested CV Average Score: {nested_scores.mean():.4f}")
In this example, the inner 4-fold loop (inner_cv, driven by GridSearchCV) selects C and gamma for an RBF-kernel SVC, while the outer 4-fold loop (outer_cv, driven by cross_val_score) scores each tuned model on data that played no part in the tuning; the printed average is the nested estimate of generalization accuracy.
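If you also want to see which hyperparameters each outer fold settled on, one option, sketched here under the assumption that grid_search, outer_cv, X, and y are the objects defined above, is cross_validate with return_estimator=True:
# Sketch: inspect per-fold scores and chosen hyperparameters (reuses the objects above)
from sklearn.model_selection import cross_validate
results = cross_validate(grid_search, X, y, cv=outer_cv, return_estimator=True)
for i, est in enumerate(results['estimator']):
    # Each returned estimator is a fitted GridSearchCV; best_params_ holds the
    # hyperparameters chosen on that outer fold's training data
    print(f"Fold {i}: score={results['test_score'][i]:.4f}, params={est.best_params_}")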