Hyperband
Yeshwanth Nagaraj
Democratizing Math and Core AI // Levelling playfield for the future
Hyperband is an optimization algorithm used for hyperparameter tuning in machine learning. It was introduced in 2016 by Lisha Li et al. as an improvement over traditional methods such as grid search and random search.
The main idea behind Hyperband is to use a technique called "successive halving" to allocate computational resources efficiently during the hyperparameter search process. The algorithm starts by randomly sampling a set of hyperparameter configurations and training them for a fixed number of iterations, often referred to as the "budget." At the end of this initial phase, the configurations are ranked based on their performance.
In the successive halving stage, the top-performing configurations are selected and given more computational resources, while the poor-performing configurations are discarded. The remaining configurations are trained for more iterations, and the process is repeated until only one configuration remains.
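The elimination loop described above can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: `successive_halving`, `toy_score`, and the `lr` search space are hypothetical names made up for this example, and the toy objective stands in for a real validation score.

```python
import random

def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Keep the top 1/eta of the configs each round and multiply the budget by eta."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (number of configs, budget per config) for every round
    while True:
        history.append((len(survivors), budget))
        # Rank the current survivors by score, best first
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if len(scored) == 1:
            return scored[0], history
        survivors = scored[:max(1, len(scored) // eta)]
        budget *= eta

# Hypothetical stand-in for a validation score: higher is better,
# it improves as the budget grows, and it peaks at lr = 0.1.
def toy_score(config, budget):
    return -abs(config["lr"] - 0.1) * (1 + 1.0 / budget)

random.seed(0)
candidates = [{"lr": 10 ** random.uniform(-4, 0)} for _ in range(27)]
best, history = successive_halving(candidates, toy_score)
print(history)  # [(27, 1), (9, 3), (3, 9), (1, 27)]
print(best)
```

With 27 candidates and eta=3, each round keeps a third of the configurations and triples the budget, so most of the total budget is spent on the few configurations that survive.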
Successive halving concentrates computational resources on promising configurations and quickly eliminates underperforming ones. As a result, Hyperband can often find good hyperparameter settings with less total computation than grid search or random search.
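The algorithm has a parameter called the "halving factor" (eta in the code below) that determines the proportion of configurations that survive each round: only the top 1/eta are kept, and the budget of each survivor is multiplied by eta. Because aggressive early elimination can discard configurations that learn slowly, Hyperband runs successive halving several times, in so-called brackets. Each bracket uses a different trade-off between the number of configurations and the starting budget per configuration, so that configurations with potential get enough resources to approach their best performance.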
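To make the resource schedule concrete, the sketch below prints the rounds of every bracket for a maximum budget of 81 and a halving factor of 3. The function name `hyperband_schedule` is hypothetical, and the bracket sizes follow the formulas of the Hyperband paper as I understand them, so treat the numbers as an illustration under that assumption.

```python
from math import ceil

def hyperband_schedule(max_iter=81, eta=3):
    """Map each bracket s to its rounds as (number of configs, budget per config)."""
    # Largest s with eta**s <= max_iter, found with integers to avoid float error
    s_max = 0
    while eta ** (s_max + 1) <= max_iter:
        s_max += 1
    schedule = {}
    for s in range(s_max, -1, -1):
        # Initial number of configurations in bracket s
        n = ceil((s_max + 1) * eta ** s / (s + 1))
        # Round i keeps n // eta**i configs and gives each max_iter / eta**(s - i)
        schedule[s] = [(n // eta ** i, max_iter / eta ** (s - i)) for i in range(s + 1)]
    return schedule

for s, rounds in hyperband_schedule().items():
    print(f"bracket s={s}: {rounds}")
```

The most aggressive bracket (s=4) starts 81 configurations with a budget of 1 each, while the most conservative one (s=0) trains only 5 configurations, each with the full budget of 81.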
Hyperband has gained popularity due to its ability to efficiently explore the hyperparameter space and discover good configurations quickly. It is particularly useful when the computational resources available for hyperparameter tuning are limited or expensive, as it maximizes the utilization of these resources.
Overall, Hyperband is a powerful algorithm for hyperparameter optimization that strikes a balance between exploration and exploitation, allowing for efficient and effective search in the hyperparameter space.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from math import log, ceil
import numpy as np

def hyperband(X, y, max_iter=81, eta=3):
    # Discrete candidate values for each SVM hyperparameter
    configspace = {
        'C': np.logspace(-5, 2, 8),
        'gamma': np.logspace(-5, 2, 8),
        'kernel': ['linear', 'rbf']
    }
    # Index of the largest bracket; the epsilon guards against float error in log
    s_max = int(log(max_iter, eta) + 1e-9)
    # Total budget assigned to each bracket
    B = (s_max + 1) * max_iter
    best_config, best_score = None, -float('inf')
    for s in range(s_max, -1, -1):
        # Initial number of configurations and starting budget for bracket s
        n = ceil(B * eta ** s / (max_iter * (s + 1)))
        r = max_iter / eta ** s
        # Initial random hyperparameter configurations
        T = [sample_config(configspace) for _ in range(n)]
        # Successive halving: evaluate, keep the top 1/eta, then raise the budget
        for i in range(s + 1):
            n_i = n // eta ** i
            r_i = int(round(r * eta ** i))
            results = [(config, evaluate_config(config, X, y, r_i)) for config in T]
            # Select top-performing configurations (highest accuracy first)
            results = sorted(results, key=lambda x: x[1], reverse=True)
            if results[0][1] > best_score:
                best_config, best_score = results[0]
            T = [config for (config, _) in results[:max(1, n_i // eta)]]
    return best_config, best_score

def sample_config(configspace):
    # Draw one value uniformly at random for every hyperparameter
    config = {}
    for key in configspace:
        config[key] = np.random.choice(configspace[key])
    return config
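
def evaluate_config(config, X, y, max_iter):
    # Hold out 20% of the data for validation; the fixed seed keeps scores comparable
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    # The budget caps the solver iterations; SVC needs an integer, so round and clamp to at least 1
    budget = max(1, int(round(max_iter)))
    model = SVC(C=config['C'], gamma=config['gamma'], kernel=config['kernel'], max_iter=budget)
    model.fit(X_train, y_train)  # may emit a ConvergenceWarning for small budgets
    y_pred = model.predict(X_val)
    score = accuracy_score(y_val, y_pred)
    return score
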
# Load the dataset
iris = load_iris()
X, y = iris.data, iris.target
# Hyperparameter tuning using Hyperband
best_config, best_score = hyperband(X, y)
print("Best configuration:", best_config)
print("Best score:", best_score)
In this example, we use the Iris dataset from sklearn.datasets and the Support Vector Machine (SVM) classifier from sklearn.svm. The hyperparameters to tune are 'C' (penalty parameter), 'gamma' (kernel coefficient), and 'kernel' (kernel type).
The hyperband function implements the Hyperband algorithm. It takes the dataset (X and y), maximum iterations (max_iter), and eta value (eta) as inputs. The configspace dictionary defines the ranges of values for each hyperparameter.
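The quantities computed at the top of the function can be written out as follows. The symbols $R$ (for max_iter) and $\eta$ (for eta) are notation chosen for this note and do not appear in the code.

```latex
\begin{aligned}
s_{\max} &= \lfloor \log_{\eta} R \rfloor, &
B &= (s_{\max} + 1)\,R, \\
n &= \left\lceil \frac{B}{R} \cdot \frac{\eta^{s}}{s + 1} \right\rceil, &
r &= R\,\eta^{-s}.
\end{aligned}
```

With the defaults $R = 81$ and $\eta = 3$, this gives $s_{\max} = 4$ and $B = 405$. For the bracket $s = 2$, for instance, $n = \lceil 5 \cdot 9 / 3 \rceil = 15$ configurations start with a budget of $r = 81 / 9 = 9$ iterations each.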
The algorithm loops over brackets, indexed by s from s_max down to 0. In each bracket it randomly samples n configurations (T), evaluates them with a per-configuration budget r, sorts them by validation accuracy, and keeps only the top-performing fraction for further evaluation. The configuration with the highest score is returned as the best one, together with that score.
The sample_config function randomly samples a configuration from the provided configspace. The evaluate_config function trains an SVM model with the given configuration and evaluates its performance using accuracy on a validation set.
After running the hyperparameter tuning using hyperband, the best configuration and its corresponding score are printed.