Exploring Gradient Descent: A Step-by-Step Implementation in R
Salmane Koraichi
Computer Science & AI </> | Building Inclusive AI & Dev Communities to Drive Innovation | Co-Founder @CoursAi
Introduction:
Gradient descent is a fundamental optimization algorithm used in machine learning and numerical optimization to find the minimum of a function. In this article, we'll work through a hands-on implementation of gradient descent in the R programming language. Our goal is to minimize the function f(x) = 1.4*(x-2)^2 + 3.2 step by step, shedding light on the key parameters and decisions involved in the algorithm.
Understanding the Objective:
The function f(x) serves as our optimization target, and its gradient (the derivative with respect to x) is computed in the grad function. Since f is an upward-opening parabola, its minimum lies at x = 2 with f(2) = 3.2, which gives us a known answer to check the algorithm against. The algorithm starts by initializing the crucial parameters: the maximum number of iterations, the stopping threshold, the initial value of x, and the learning rate.
f <- function(x) {
  1.4 * (x - 2)^2 + 3.2
}
grad <- function(x) {
  1.4 * 2 * (x - 2)
}
iterations <- 100  # maximum number of iterations
threshold <- 1e-5  # stopping threshold on the change in f(x)
stepSize <- 0.05   # learning rate
x <- -5            # initial value of x
xtrace <- x        # stores every visited x
ftrace <- f(x)     # stores every visited f(x)
Visualizing the Function:
To gain insight into the function f(x), we visualize it over a range of x values. The resulting plot gives a clear picture of the function's behavior.
xs <- seq(-6, 10, len = 1000)
plot(xs, f(xs), type="l", xlab="X", ylab=expression(1.4(x-2)^2 + 3.2))
Executing Gradient Descent:
The main loop of the algorithm iteratively updates x based on the gradient and the learning rate. During each iteration, the new x and f(x) values are appended to the vectors xtrace and ftrace and drawn onto the plot in red. The process continues until the change in f(x) falls below the specified threshold.
for (iter in 1:iterations) {
  # gradient descent update
  x <- x - stepSize * grad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
  points(xtrace, ftrace, type = "b", col = "red", pch = 1)
  # stop once the latest change in f(x) drops below the threshold
  if (abs(ftrace[iter + 1] - ftrace[iter]) < threshold) break
}
Analyzing Results:
The tracked values of x and f(x) during the iterations are stored in a data frame, 'df'. This allows for further analysis and visualization of the optimization process.
df <- data.frame(x = xtrace, f = ftrace)
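Once the data frame is built, the trace can be inspected directly. The following lines are an illustrative addition rather than part of the original script, and assume df has been created as above:
# Illustrative inspection of the optimization trace (assumes df from above)
nrow(df) - 1   # number of gradient steps taken before stopping
tail(df, 1)    # final x and f(x); these should be close to 2 and 3.2
# convergence curve: f(x) against the iteration index
plot(seq_len(nrow(df)) - 1, df$f, type = "b", xlab = "Iteration", ylab = "f(x)")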
Conclusion:
By implementing gradient descent in R, we gain a deeper understanding of the optimization process. The choice of learning rate, stopping criteria, and initialization parameters significantly impacts the algorithm's convergence and efficiency. Readers are encouraged to experiment with different learning rates and explore the effects on the optimization outcome. This hands-on approach provides a solid foundation for understanding and implementing gradient descent in various optimization scenarios.
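As a starting point for that experiment, here is a minimal sketch, not part of the original script, that wraps the update loop in a helper function (the name run_gd is chosen purely for illustration) and compares several learning rates. It assumes f and grad are defined as above.
# Illustrative helper (hypothetical name): run gradient descent for one learning rate
run_gd <- function(stepSize, x0 = -5, iterations = 100, threshold = 1e-5) {
  x <- x0
  fprev <- f(x)
  for (iter in 1:iterations) {
    x <- x - stepSize * grad(x)
    fnew <- f(x)
    if (abs(fnew - fprev) < threshold) break
    fprev <- fnew
  }
  c(stepSize = stepSize, steps = iter, x = x, fx = f(x))
}
# Compare several learning rates; for this particular function, step sizes above
# roughly 0.71 (= 1/1.4) cause the iterates to diverge instead of converging.
do.call(rbind, lapply(c(0.01, 0.05, 0.2, 0.5), run_gd))
Smaller learning rates need more steps to reach the same threshold, while rates close to the divergence limit oscillate around x = 2 before settling.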
The plot: the function curve with the red points marking each gradient descent step on its way to the minimum.
The whole code, ready to use:
# This R script defines two functions: `f` and `grad`.
# `f` calculates the value of the function 1.4 * (x-2)^2 + 3.2
# `grad` calculates the gradient of `f`
# Update rule: Xnew = Xold - eta * grad_f(Xold)
# where eta is the learning rate and grad_f(Xold) is the gradient of f evaluated at Xold
f <- function(x) {
  1.4 * (x - 2)^2 + 3.2
}
grad <- function(x) {
  1.4 * 2 * (x - 2)
}
iterations <- 100
threshold <- 1e-5
# learning rate
stepSize <- 0.05
# initialize x
x <- -5
# initialize vectors to store x and f(x)
xtrace <- x
ftrace <- f(x)
# generate a series of x values within some range
xs <- seq(-6, 10, len = 1000)
plot(xs, f(xs), type = "l", xlab = "X", ylab = expression(1.4(x-2)^2 + 3.2))
for (iter in 1:iterations) {
  # gradient descent update
  x <- x - stepSize * grad(x)
  xtrace <- c(xtrace, x)
  ftrace <- c(ftrace, f(x))
  points(xtrace, ftrace, type = "b", col = "red", pch = 1)
  # stop once the latest change in f(x) drops below the threshold
  if (abs(ftrace[iter + 1] - ftrace[iter]) < threshold) break
}
# store the full trace for later analysis
df <- data.frame(x = xtrace, f = ftrace)
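As a quick sanity check, not shown in the original article, the result can be compared against base R's built-in one-dimensional optimizer over the same interval used for plotting:
# Cross-check with base R's optimize(); $minimum should be roughly 2
# and $objective roughly 3.2, matching where gradient descent converges.
optimize(f, interval = c(-6, 10))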