Navigating the Gradient Descent Landscape: Unveiling the Differences between Stochastic, Batch, and Mini-Batch Gradient Descent
Atharva Attarde
Junior at IIIT Dharwad | Junior Researcher | Ex-NLP Intern at ERTS Lab, IIT Bombay
Gradient descent is one of the fundamental algorithms in machine learning and the workhorse for finding good model parameters. In this article, we explore three variants of the algorithm: stochastic, batch, and mini-batch gradient descent, building an intuitive understanding of each.
Building a model means adjusting a multitude of parameters, commonly known as weights and biases, until they reach good values, a process referred to as training. The mechanism behind this adjustment is gradient descent. The gradient, a concept from calculus, tells the model in which direction to move its parameters; this article assumes a basic familiarity with the idea and does not delve into the underlying calculus.
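To make the shared mechanics concrete, here is a toy sketch of a single gradient-descent step on the one-dimensional function f(w) = w², whose gradient is 2w. The variable names, learning rate, and starting value are illustrative assumptions, not something from the article:

```python
# One gradient-descent step on f(w) = w**2, whose gradient is 2*w.
# All names and values here are illustrative.
w = 4.0            # current parameter value
lr = 0.1           # learning rate (step size)
grad = 2 * w       # gradient of f at the current w
w = w - lr * grad  # the core update: step against the gradient
print(w)           # 3.2 -- one step closer to the minimum at w = 0
```

Every variant below applies exactly this update; they differ only in how much data is used to estimate the gradient.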
Which variant to use is a meaningful choice in many scenarios, and we will dissect the trade-offs in the sections that follow.
Stochastic Gradient Descent (SGD):
Imagine yourself as a research analyst at a trading firm, tasked with gauging the public's general sentiment about the market.
SGD recommends approaching this task by randomly selecting one person on the street, asking their opinion, and publishing the results. However, relying on the opinion of a single individual poses a risk, as it may not accurately represent the broader sentiment.
In optimization terms, SGD selects a random point from the training dataset and adjusts the model parameters to better fit that specific point. Although any individual step may move the parameters in the wrong direction, SGD is widely used in machine learning because of its computational efficiency: processing one data point at a time keeps the cost of each update tiny. Over many iterations, the hope is to converge to parameters that fit the entire dataset well.
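A minimal sketch of SGD for simple linear regression may help. The synthetic data (y = 3x + 2 plus noise), the learning rate, and the epoch count are all assumptions made for illustration:

```python
import numpy as np

# Synthetic data for y = 3x + 2 plus noise (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = 3 * X + 2 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0  # parameters to learn
lr = 0.05        # learning rate (illustrative choice)

for epoch in range(20):
    for i in rng.permutation(len(X)):  # visit points in random order
        err = (w * X[i] + b) - y[i]
        # gradient of the squared error for this single point
        w -= lr * 2 * err * X[i]
        b -= lr * 2 * err

print(f"learned w = {w:.2f}, b = {b:.2f}")  # should approach 3 and 2
```

Notice that each update touches exactly one data point; that is what makes the per-step cost so small, and also what makes the path noisy.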
Now, let's examine a contour plot illustrating the convergence for SGD.
The SGD trajectory zigzags across the contours, reflecting the uncertainty of each single-point decision; at times it even moves away from the minimum.
Batch Gradient Descent:
Continuing with the analogy of a research analyst at a trading firm, but now with a perfectionist approach, envision surveying the entire world population for their sentiment on markets. This is akin to batch gradient descent.
In batch gradient descent, every point in the dataset is used to determine the direction in which to adjust the parameters for the best model fit. This yields the most reliable descent direction, and with a suitably chosen learning rate the method converges steadily, but computing gradients over the full dataset in every epoch becomes computationally impractical when the dataset and the number of parameters are large.
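Here is the same linear-regression sketch adapted to batch gradient descent; the only change is that every update now averages the gradient over the full dataset. As before, the data and hyperparameters are illustrative assumptions:

```python
import numpy as np

# Same synthetic data as the SGD sketch (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = 3 * X + 2 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0
lr = 0.5  # a larger step is safe here: the full-batch gradient is exact

for epoch in range(100):
    err = (w * X + b) - y
    # gradients averaged over the ENTIRE dataset at every step
    w -= lr * 2 * np.mean(err * X)
    b -= lr * 2 * np.mean(err)

print(f"learned w = {w:.2f}, b = {b:.2f}")
```

With 200 points this is trivial, but each of those updates costs a full pass over the data, which is exactly what becomes prohibitive at scale.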
Now, let's explore a contour plot illustrating the convergence for Batch Gradient Descent.
In the contour plot, the trajectory moves consistently toward the minimum with each iteration.
Mini-Batch Gradient Descent:
Mini-batch gradient descent combines elements of both the stochastic and batch methods. Returning to the trading-firm analogy: instead of surveying the entire population, you survey a sample and infer the population's sentiment from it, a reasonably practical approach.
In this technique, a small batch of samples from the dataset is used for each parameter update, cutting the computational cost well below that of batch gradient descent. Because a mini-batch gradient is a far less noisy estimate than a single point's, each step moves roughly in the right direction, and the method typically needs fewer epochs than SGD to reach a good model.
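Adapting the earlier sketch once more shows the middle ground: each update averages the gradient over a small shuffled slice of the data. The batch size of 32, like the rest of the setup, is an illustrative assumption:

```python
import numpy as np

# Same synthetic data as the earlier sketches (illustrative assumption).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, 200)
y = 3 * X + 2 + rng.normal(0, 0.1, 200)

w, b = 0.0, 0.0
lr = 0.1
batch_size = 32  # illustrative choice

for epoch in range(50):
    order = rng.permutation(len(X))  # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        err = (w * X[batch] + b) - y[batch]
        # gradients averaged over just this mini-batch
        w -= lr * 2 * np.mean(err * X[batch])
        b -= lr * 2 * np.mean(err)

print(f"learned w = {w:.2f}, b = {b:.2f}")
```

Tuning the batch size slides this method between the two extremes: a batch of 1 recovers SGD, and a batch of the full dataset recovers batch gradient descent.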
Now, let's examine a contour plot illustrating the convergence for Mini-Batch Gradient Descent.
In the contour, the trajectory moves approximately toward the minimum, striking a balance between efficiency and accuracy.
Summary
All three variants apply the same update rule and differ only in how much data feeds each gradient estimate. SGD uses a single random point per update: cheap but noisy. Batch gradient descent uses the entire dataset: stable but expensive. Mini-batch gradient descent uses a small sample per update, balancing cost and stability, which is why it is the default choice for training most modern models.