Hyperparameter optimization in Machine Learning Part-1: Algorithms
Aditya Anand
Senior Business Manager | Empowering AI & Cloud Transformations at E2E Networks | NVIDIA Elite Partner
Hyperparameter optimization in Machine Learning Part-1: Algorithms
In this two-part article series, we are going to learn about hyperparameter optimization in machine learning. Part-1 is about the theoretical foundation and algorithms, while Part-2 will be devoted to practical aspects like libraries, frameworks, and services for hyperparameter optimization.?
Hyperparameter optimization for machine learning is a particular use case of black box optimization. Now, what are hyperparameters, and what is black box optimization? Let's start by learning about black box optimization. When we have a function with some known parameters but we can't really peer inside the function a, you want to find the best setting options for these parameters. The only way to assess their efficiency is to set the parameters appropriately and run an experimental process, which is considered quite expensive. Running this process gives you an objective value or performance measure through which you can find the best parameters. The system is relatively opaque, and evaluation can be costly.
Let's take an example of an object detection model. In the image below, you can observe the various hyperparameters associated with the model, training, and suppression:
In machine learning, hyperparameter optimization is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is used to control the learning process. By contrast, the values of other parameters (typically node weights) are learned. One cannot learn hyperparameters during the training, and we need to set them in advance. Afterward, they can be formulated as black-box optimization. AutoML (the process of automating the time-consuming, iterative tasks of machine learning model development) is also a black-box optimization problem.
There are two main algorithmic approaches for black-box optimization:
Bayesian Optimization
Bayesian Optimization is one among many popular approaches. It is the best approach for a broad range of problems.?
As an example, we take a single dimension parameter space and plot the objective function on the y-axis. In the Bayesian approach, we try to model the objective function. The green points show the values of parameters and the black line is an objective function but it is hidden from us. So we try to come up with some sort of region in which the optimization function lives (which is shaded in blue in the above figure). Now, this problem is turned into an explore-exploit tradeoff problem which we call a multi-arm bandit problem.
领英推荐
Now the question arises what is a multi-arm bandit problem? In probability theory and machine learning, the multi-armed bandit problem is a problem in which a fixed set of resources must be allocated between competing choices in a way that maximizes their expected gain when each choice's properties are only partially known at the time of allocation and may become better understood as time passes or by allocating resources to the choice.?
One standard strategy is to find the maximum point in the right envelope which is shown as the star in the diagram and use that value of the parameter. As we can see,?the objective function value for the star parameter is not the best so we change and update this picture.
Using Gaussian Process to model the function?
Gaussian Process is state-of-the-art for many small datasets but there are other choices too for modeling the objective function. In probability theory and statistics, a Gaussian process is a stochastic process, such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those (infinitely many) random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.?
Simple Probabilistic Algorithm
One approach can be a simple random search (Random Search sets up a grid of hyperparameter values and selects random combinations to train the model and score) but there are techniques that are better:
Automated Early Stopping
For some problems like Machine Learning, the optimization is not a complete black box as we have access to additional information, i.e., information found during training.
As depicted in the accuracy vs. training time plot, in the above figure, we have some solid lines that depict the training that has been completed and a dotted line showing the ongoing epoch. But comparing these two, we can make the decision to stop early if we are sure that this combination of parameters is not going to give optimal results.?
Closing Thought:-?We encourage the readers to create a neural network and do hyperparameter optimization on a free GPU Trial on E2E Cloud. For getting your free credits please contact:- [email protected]
Senior Business Manager | Empowering AI & Cloud Transformations at E2E Networks | NVIDIA Elite Partner
1 年www.e2enetworks.com - E2E Networks Limited