Optimization in Machine Learning: A Comprehensive Guide Using R

Salmane Koraichi

Computer Science & AI </> | Building Inclusive AI & Dev Communities to Drive Innovation | Co-Founder @CoursAi

Published: January 20, 2024

INTRODUCTION

Machine learning, a subset of artificial intelligence, has revolutionized the way data is analyzed and interpreted. At its core, machine learning involves training algorithms to make predictions or decisions based on data. This training process is fundamentally an optimization problem, where the goal is to minimize or maximize some function — typically a loss function or a cost function. Optimization, thus, is pivotal in machine learning, affecting the efficiency and accuracy of algorithms.

R, a programming language and environment, is renowned for its statistical capabilities and graphical tools. In the realm of machine learning, R offers robust packages and libraries, making it a preferred choice for statisticians and data scientists. Its extensive collection of packages, such as 'caret', 'nnet', and 'randomForest', facilitates various machine learning tasks, from data preprocessing to complex algorithm implementations.

This article aims to demystify the role of optimization in machine learning with a particular focus on its application using R. We will cover several key topics:

Maxima and Minima: Understanding these fundamental concepts in optimization and how they are identified using R.
Gradient Descent: An exploration of this essential optimization algorithm, including its implementation in R.
Learning Rate: Discussing its significance in the convergence of the gradient descent algorithm and how to fine-tune it.
Gradient Descent in Logistic Regression: Applying gradient descent in logistic regression models using R.
Stochastic Gradient Descent: Introducing this variant of gradient descent and demonstrating its implementation in R.
Conclusion: Summarizing the key insights and discussing future trends in optimization for machine learning.

MAXIMA AND MINIMA: UNDERSTANDING THESE FUNDAMENTAL CONCEPTS IN OPTIMIZATION AND HOW THEY ARE IDENTIFIED USING R

Optimization, at its core, is about finding the highest or lowest point of a function - known as maxima and minima. In machine learning, these points represent the most efficient solutions - be it minimizing loss or maximizing efficiency.

Theoretical Foundations: Maxima and minima can be classified into two categories: local and global. A local maximum (or minimum) is a point where the function value is higher (or lower) than all nearby points, while a global maximum (or minimum) is the highest (or lowest) point in the entire range of the function.

Mathematical Techniques: The identification of these points often involves calculus. The first derivative test helps to locate points where the gradient (or slope) of the function is zero - potential maxima or minima. The second derivative test then determines the nature of these points - whether they are indeed maxima, minima, or saddle points (neither).

Practical Implementation in R: In R, various optimization functions can be employed. One common function is optim(), which allows for locating minima (and by extension maxima) of a function. Consider a simple quadratic function f(x) = x^2 - 4x + 4. Using optim(), we can find its minimum:

optimize_function <- function(x) {
  return(x^2 - 4*x + 4)
}
# BFGS avoids the warning optim() gives for one-dimensional Nelder-Mead searches
result <- optim(par = c(0), fn = optimize_function, method = "BFGS")

This code initializes the search at 0 (par = c(0)) and seeks to minimize optimize_function. The output, stored in result, reveals the minimum of the function.
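The derivative tests described earlier can also be checked directly for the same quadratic. The following is a minimal sketch rather than part of the original example: it uses base R's D() for symbolic differentiation and uniroot() to find where the slope is zero, and the search interval c(-10, 10) is an assumption made for illustration.

```r
# f(x) = x^2 - 4x + 4, written as an expression so D() can differentiate it
f_expr <- quote(x^2 - 4 * x + 4)

# First and second derivatives, obtained symbolically
df_expr  <- D(f_expr, "x")
d2f_expr <- D(df_expr, "x")

# Turn the first derivative into an ordinary R function
df <- function(x) eval(df_expr, list(x = x))

# First derivative test: find the point where the slope is zero
# (the interval c(-10, 10) is an assumed search range)
stationary <- uniroot(df, interval = c(-10, 10))$root

# Second derivative test: a positive value indicates a minimum
curvature <- eval(d2f_expr, list(x = stationary))

cat("stationary point:", stationary, "\n")
cat("second derivative:", curvature, "\n")
```

A positive second derivative at the stationary point confirms that it is a minimum, which agrees with what optim() reports.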

Challenges in Real-world Scenarios: Real-world functions are often more complex, with multiple local maxima/minima and saddle points. Identifying the global maximum or minimum in such scenarios can be challenging. Techniques like gradient descent, discussed later, are often employed to navigate these complex landscapes.
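One simple way to see the problem, and a common workaround, is to run a local optimizer from several starting points and keep the best result. The sketch below is illustrative only: the quartic function and the grid of starting points are hypothetical choices, not taken from the article.

```r
# A hypothetical 1-D function with two local minima (near x = -1.30 and x = 1.13)
f <- function(x) x^4 - 3 * x^2 + x

# Run a local optimizer from several starting points
starts <- seq(-2, 2, by = 0.5)
fits <- lapply(starts, function(s) optim(par = s, fn = f, method = "BFGS"))

# Collect where each run ended and the function value it reached
results <- data.frame(
  start = starts,
  x_min = sapply(fits, function(fit) fit$par),
  f_min = sapply(fits, function(fit) fit$value)
)
print(results)

# The run with the lowest value is the best candidate for the global minimum
best <- results[which.min(results$f_min), ]
print(best)
```

Runs that start on different sides of the central bump tend to settle in different valleys, so comparing f_min across runs is what separates the global minimum from a merely local one. A multi-start search like this gives no guarantee in general; it only improves the odds.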

GRADIENT DESCENT: AN EXPLORATION OF THIS ESSENTIAL OPTIMIZATION ALGORITHM, INCLUDING ITS IMPLEMENTATION IN R

At the heart of machine learning's optimization lies the Gradient Descent algorithm, a cornerstone technique used to minimize the cost function, which is central to model training. Its beauty lies in its simplicity and efficiency, making it an indispensable tool in the data scientist's arsenal.

Unraveling the Mystery of Gradient Descent: Imagine standing on a mountain and needing to find the lowest valley – this is the essence of gradient descent. At each step, you assess the steepness of the hill and take a step downhill. In machine learning, this 'hill' is the cost function, and 'stepping downhill' means adjusting parameters to minimize this function.

The Mathematics Behind the Magic: The algorithm begins with initial guesses for the parameters of the model and iteratively adjusts them in the direction that reduces the cost function. This direction is determined by the negative gradient of the function at the current point. Mathematically, if our cost function is J(θ), and θ represents the parameters, the update rule is:

$$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \, \nabla_{\theta} J(\theta_{\text{old}})$$

Here, α is the learning rate, a crucial hyperparameter that dictates the size of each step.
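To make the update rule concrete, here is a single step worked through in R. The cost J(θ) = θ², the starting value, and the learning rate are assumptions chosen only to keep the arithmetic easy to follow.

```r
# Assumed toy cost: J(theta) = theta^2, whose gradient is 2 * theta
grad_J <- function(theta) 2 * theta

theta_old <- 3      # current parameter value
alpha     <- 0.1    # learning rate

# One application of the update rule: step against the gradient
theta_new <- theta_old - alpha * grad_J(theta_old)
print(theta_new)    # 3 - 0.1 * 6 = 2.4
```

The parameter moves from 3 towards the minimum at 0, and the size of the move is the gradient scaled by the learning rate.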

Implementing Gradient Descent in R: To bring this concept to life, let's consider a simple linear regression problem where our cost function is the mean squared error. Implementing gradient descent in R might look like this:

gradient_descent <- function(X, y, learning_rate, n_iterations) {
  m <- nrow(X)
  theta <- runif(ncol(X))  # Random initialisation
  for (i in 1:n_iterations) {
    predictions <- X %*% theta
    errors <- predictions - y
    gradient <- t(X) %*% errors / m
    theta <- theta - learning_rate * gradient
  }
  return(theta)
}

This function iteratively adjusts theta, the parameters of our model, moving them in the direction that reduces our cost.
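A quick way to check such a function is to run it on data where the answer is known. The sketch below repeats the function so that it runs on its own, then fits synthetic data; the data-generating values (intercept 2, slope 3, noise level 0.5), the seed, and the comparison against lm() are assumptions added for illustration.

```r
# The gradient_descent() function from the article, repeated so the example is self-contained
gradient_descent <- function(X, y, learning_rate, n_iterations) {
  m <- nrow(X)
  theta <- runif(ncol(X))
  for (i in 1:n_iterations) {
    errors <- X %*% theta - y
    gradient <- t(X) %*% errors / m
    theta <- theta - learning_rate * gradient
  }
  return(theta)
}

set.seed(42)
n <- 200
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)   # assumed true intercept 2 and slope 3
X <- cbind(1, x)                       # column of ones for the intercept

theta_gd <- gradient_descent(X, y, learning_rate = 0.1, n_iterations = 1000)
theta_ls <- coef(lm(y ~ x))            # closed-form least-squares fit

print(cbind(gradient_descent = as.numeric(theta_gd), lm = as.numeric(theta_ls)))
```

Because the mean squared error of a linear model is convex, gradient descent and the closed-form least-squares fit should agree closely once the iterations have converged.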

Visualizing the Descent: To truly grasp the power of gradient descent, visualizing its trajectory towards the minimum can be enlightening. Imagine plotting the cost against the parameter values – each iteration moves the point on this graph closer to the bottom of the valley. This is a powerful demonstration of the algorithm converging to the minimum.

Tuning the Learning Rate: The learning rate α is a pivotal hyperparameter. Too small, and the algorithm will be painfully slow, taking eons to converge. Too large, and it risks overshooting the minimum, possibly diverging. Experimenting with different values and observing the algorithm's behavior is key. A common practice is to start with a higher learning rate and gradually decrease it.

Advanced Variations: While the basic gradient descent algorithm updates parameters using all training examples (batch gradient descent), variations exist. Stochastic gradient descent (SGD), for instance, updates parameters using a single training example at a time, offering faster iterations but more noise in the convergence path. Another variant, mini-batch gradient descent, strikes a balance between batch and stochastic, using a subset of the training data for each update.

Real-World Application and Challenges: Gradient descent is not without challenges. The presence of multiple local minima or saddle points can hinder the path to the global minimum. Furthermore, the shape of the cost function can lead to slow convergence. Techniques like feature scaling and more sophisticated algorithms like Adam or RMSprop help mitigate these issues.

LEARNING RATE: THE PIVOTAL HYPERPARAMETER IN THE GRADIENT DESCENT ALGORITHM

In the journey of optimizing machine learning models, the learning rate emerges as a pivotal hyperparameter, a guiding star in the gradient descent algorithm. It is the parameter that defines the size of steps taken towards the optimal solution and therefore holds a significant influence over the algorithm's efficiency and effectiveness.

The Essence of Learning Rate: Imagine gradient descent as a hiker descending a mountain. The learning rate determines the size of the steps the hiker takes. If the steps are too large (a high learning rate), the hiker might overshoot the valley and miss the lowest point. Conversely, if the steps are too small (a low learning rate), the hiker might take an excessively long time to reach the valley, or get stuck on a small hill thinking it's the lowest point.

Balancing Act: Finding the perfect learning rate is a balancing act. A rate that is too high can cause the algorithm to oscillate around the minimum or even diverge, moving further away with each step. A rate that is too low makes each step safer but slows training significantly, and on non-convex problems the small steps can leave the algorithm stuck in a poor local minimum.
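The trade-off is easy to reproduce on a toy problem. The following sketch assumes the cost J(θ) = θ², a starting value of 1, and 50 steps; these choices are illustrative and not from the article.

```r
# Gradient descent on the assumed toy cost J(theta) = theta^2 (gradient 2 * theta)
run_descent <- function(alpha, theta0 = 1, n_steps = 50) {
  theta <- theta0
  for (k in seq_len(n_steps)) {
    theta <- theta - alpha * 2 * theta
  }
  theta
}

# A very small, a moderate, and an overly large learning rate
rates <- c(0.001, 0.1, 1.1)
final_theta <- sapply(rates, run_descent)
print(data.frame(learning_rate = rates, final_theta = final_theta))
```

Each step multiplies θ by (1 - 2α). With α = 0.001 the factor is 0.998, so θ has barely moved after 50 steps; with α = 0.1 it is 0.8 and θ shrinks quickly towards 0; with α = 1.1 it is -1.2, so θ flips sign and grows at every step.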

Implementing and Tuning in R: In R, when implementing gradient descent, the learning rate is a critical parameter to tune. Consider the previous gradient descent function; the learning_rate parameter controls the size of the step at each iteration. A common approach is to experiment with a range of learning rates and observe the performance:

learning_rates <- c(0.001, 0.01, 0.1, 0.5)
for (lr in learning_rates) {
  model <- gradient_descent(X, y, lr, n_iterations)
  # Evaluate the model performance
}

This experimentation helps in finding an optimal learning rate that leads to efficient and effective convergence.

Adaptive Learning Rates: To enhance the performance of gradient descent, adaptive learning rates can be employed. Techniques like AdaGrad, RMSprop, and Adam adjust the learning rate during training, allowing for larger steps in the beginning and smaller, more precise steps as the algorithm converges.
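As an illustration of the idea, here is a minimal sketch of Adam applied to linear regression. It follows the commonly published form of the algorithm with its usual default constants (beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8); the function name adam_descent, the synthetic data, and the iteration count are assumptions, and this is not a substitute for a tested library implementation.

```r
# A minimal Adam sketch for linear regression with a squared-error cost
adam_descent <- function(X, y, learning_rate = 0.01, n_iterations = 5000,
                         beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8) {
  n_obs <- nrow(X)
  theta <- rep(0, ncol(X))
  m <- rep(0, ncol(X))   # running mean of gradients (first moment)
  v <- rep(0, ncol(X))   # running mean of squared gradients (second moment)
  for (step in seq_len(n_iterations)) {
    gradient <- as.numeric(t(X) %*% (X %*% theta - y)) / n_obs
    m <- beta1 * m + (1 - beta1) * gradient
    v <- beta2 * v + (1 - beta2) * gradient^2
    m_hat <- m / (1 - beta1^step)   # bias correction for the zero initialisation
    v_hat <- v / (1 - beta2^step)
    # Per-parameter step: smaller where recent gradients have been large
    theta <- theta - learning_rate * m_hat / (sqrt(v_hat) + epsilon)
  }
  theta
}

set.seed(1)
x <- rnorm(200)
y <- 2 + 3 * x + rnorm(200, sd = 0.5)   # assumed synthetic data
X <- cbind(1, x)

theta_adam <- adam_descent(X, y)
print(theta_adam)
print(coef(lm(y ~ x)))
```

The key difference from plain gradient descent is the division by sqrt(v_hat): each parameter gets its own effective step size, based on the recent magnitude of its gradients.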

Visualizing the Impact: Visualizing the effect of different learning rates can provide intuitive insights. Plotting the trajectory of the cost function against iterations for different learning rates illustrates how quickly or slowly the algorithm converges.
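A sketch of how such a comparison might be produced is given below. The synthetic data, the helper name cost_history, and the choice of 100 iterations are assumptions; the learning rates are the ones used in the tuning loop above.

```r
set.seed(7)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.3)   # assumed synthetic data
X <- cbind(1, x)

# Gradient descent that records the mean squared error after every update
cost_history <- function(learning_rate, n_iterations = 100) {
  theta <- c(0, 0)
  cost <- numeric(n_iterations)
  for (i in seq_len(n_iterations)) {
    errors <- as.numeric(X %*% theta) - y
    theta <- theta - learning_rate * as.numeric(t(X) %*% errors) / nrow(X)
    cost[i] <- mean((as.numeric(X %*% theta) - y)^2)
  }
  cost
}

learning_rates <- c(0.001, 0.01, 0.1, 0.5)
costs <- sapply(learning_rates, cost_history)   # one column per learning rate
colnames(costs) <- paste0("lr=", learning_rates)

# Cost reached after the last iteration for each rate
print(round(costs[100, ], 4))

# Draw the curves when running interactively
if (interactive()) {
  matplot(costs, type = "l", lty = 1, log = "y",
          xlab = "iteration", ylab = "mean squared error")
  legend("topright", legend = colnames(costs), col = 1:4, lty = 1)
}
```

On this well-conditioned problem the smallest rates are still far from the minimum after 100 iterations, while the larger ones have flattened out near the noise level of the data.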

Real-World Implications: The choice of learning rate can have significant implications in real-world scenarios. In complex models, especially deep learning networks, tuning the learning rate can be the difference between a model that converges to a meaningful solution and one that fails to learn anything at all.

Advanced Strategies: Advanced strategies involve using learning rate schedules, where the learning rate changes as training progresses. For instance, starting with a higher rate and gradually reducing it, known as learning rate annealing, can lead to more efficient convergence.

STOCHASTIC GRADIENT DESCENT: OPTIMIZING THE OPTIMIZATION IN MACHINE LEARNING

Stochastic Gradient Descent (SGD) stands as a compelling variation of the traditional gradient descent algorithm, renowned for its efficiency in handling large datasets and its effectiveness in escaping local minima.

The Core Concept of SGD: While gradient descent updates parameters using the entire dataset, SGD updates the parameters using only a single data point (or a small batch) at each iteration. This approach introduces randomness into the optimization process, which can be beneficial in several ways.

Advantages Over Traditional Gradient Descent: One of the main advantages of SGD is its faster convergence on large datasets, as it doesn't require the entire dataset to compute the gradients at each step. This makes it highly scalable and efficient, especially in scenarios where data is abundant. Furthermore, the inherent noise in SGD helps to avoid local minima, often leading to better generalization in machine learning models.

Implementing SGD in R: Implementing SGD in R requires iterating over individual data points (or small batches) and updating the model parameters accordingly. Here's a simplified example:

stochastic_gradient_descent <- function(X, y, learning_rate, n_iterations, batch_size) {
  m <- nrow(X)
  theta <- runif(ncol(X))  # Random initialisation
  for (i in 1:n_iterations) {
    indices <- sample(1:m, batch_size)
    X_sample <- X[indices, , drop = FALSE]  # drop = FALSE keeps a matrix when batch_size is 1
    y_sample <- y[indices]
    predictions <- X_sample %*% theta
    errors <- predictions - y_sample
    gradient <- t(X_sample) %*% errors / batch_size
    theta <- theta - learning_rate * gradient
  }
  return(theta)
}

In this function, a small subset of the data (batch_size) is chosen randomly in each iteration to compute the gradient and update the parameters.
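As a usage sketch, the function can be run on synthetic data and compared with a least-squares fit. The function is repeated here so the example runs on its own; the data-generating values, the seed, the batch size of 32, and the other settings are assumptions chosen for illustration.

```r
# Mini-batch SGD for linear regression, repeated so the example is self-contained
stochastic_gradient_descent <- function(X, y, learning_rate, n_iterations, batch_size) {
  m <- nrow(X)
  theta <- runif(ncol(X))
  for (i in 1:n_iterations) {
    indices <- sample(1:m, batch_size)
    X_sample <- X[indices, , drop = FALSE]   # keep a matrix even when batch_size is 1
    errors <- X_sample %*% theta - y[indices]
    theta <- theta - learning_rate * t(X_sample) %*% errors / batch_size
  }
  return(theta)
}

set.seed(123)
n <- 1000
x <- rnorm(n)
y <- 2 + 3 * x + rnorm(n, sd = 0.5)   # assumed true intercept 2 and slope 3
X <- cbind(1, x)

theta_sgd <- stochastic_gradient_descent(X, y, learning_rate = 0.01,
                                         n_iterations = 2000, batch_size = 32)
print(as.numeric(theta_sgd))
print(coef(lm(y ~ x)))
```

Each update touches only 32 of the 1000 rows, so the estimate typically lands near the least-squares solution without matching it exactly; the remaining gap reflects the sampling noise in the gradient.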

Tuning and Challenges: Despite its advantages, SGD comes with its own set of challenges. The primary issue is its variability: since it uses only a small portion of the data at each step, its path towards convergence can be 'noisy' and unpredictable. This requires careful tuning of parameters, especially the learning rate and batch size. Adaptive learning rate techniques, as discussed earlier, are particularly beneficial in SGD.
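One simple way to tame this noise is to shrink the learning rate as training progresses. The sketch below compares a constant rate with a decaying one on single-example SGD. The schedule lr0 / (1 + decay * i) is one common choice and is an assumption here, as are the function name sgd_with_decay, the synthetic data, and all numeric settings.

```r
# SGD for linear regression with a step size that can shrink over time.
# Schedule (assumed): lr_i = lr0 / (1 + decay * i); decay = 0 gives a constant rate.
sgd_with_decay <- function(X, y, lr0, decay, n_iterations, batch_size = 1) {
  theta <- rep(0, ncol(X))
  for (i in seq_len(n_iterations)) {
    idx <- sample(nrow(X), batch_size)
    X_b <- X[idx, , drop = FALSE]
    errors <- as.numeric(X_b %*% theta) - y[idx]
    gradient <- as.numeric(t(X_b) %*% errors) / batch_size
    lr_i <- lr0 / (1 + decay * i)
    theta <- theta - lr_i * gradient
  }
  theta
}

set.seed(2024)
x <- rnorm(500)
y <- 2 + 3 * x + rnorm(500, sd = 0.5)   # assumed true intercept 2 and slope 3
X <- cbind(1, x)

theta_const <- sgd_with_decay(X, y, lr0 = 0.1, decay = 0,    n_iterations = 5000)
theta_decay <- sgd_with_decay(X, y, lr0 = 0.1, decay = 0.01, n_iterations = 5000)
print(rbind(constant = theta_const, decayed = theta_decay))
```

With a constant rate the parameters keep jittering around the solution, whereas the decayed rate takes ever smaller steps and usually settles closer to it. How fast to decay is itself a tuning choice: too fast and the algorithm stalls before it gets near the minimum.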

SGD Variants: There are several variants of SGD that aim to combine the advantages of both batch and stochastic gradient descent. For instance, Mini-batch Gradient Descent uses batches of data larger than one but smaller than the full dataset. This strikes a balance, reducing the variance of parameter updates while still being more efficient than full-batch gradient descent.

Visualizing SGD's Path: Visualizing the path of SGD can be insightful. Unlike the smooth descent of batch gradient descent, SGD's path will appear more erratic. This visualization helps in understanding the 'noisy' steps of SGD and its convergence behavior.

SGD in the Real World: In practical applications, especially in fields like deep learning, SGD and its variants have become the de facto standard due to their scalability and efficiency. Their ability to handle massive datasets and fit complex models makes them indispensable in modern machine learning workflows.

Stochastic Gradient Descent revolutionizes the way optimization is approached in machine learning, especially in the era of big data. Its implementation in R, although straightforward, demands a deep understanding of its dynamics and a careful approach to parameter tuning. Mastering SGD paves the way for efficiently training complex and large-scale machine learning models.

CONCLUSION: EMBRACING THE FUTURE OF OPTIMIZATION IN MACHINE LEARNING

As we conclude our exploration into the world of optimization in machine learning with a focus on R, it's evident that the journey of learning and adaptation is ongoing. The techniques and algorithms we've discussed form the backbone of modern machine learning, driving the efficiency and effectiveness of predictive models.

Recapitulation of Key Points: We started by understanding the importance of maxima and minima in optimization and how to identify them using R. Then, we delved into the Gradient Descent algorithm, a fundamental tool in any machine learning practitioner's arsenal, and explored how to implement it in R. The significance of the learning rate in the convergence of the gradient descent algorithm was examined, highlighting its critical role in the optimization process. We then applied gradient descent to logistic regression, showcasing its practical application. Lastly, we discussed Stochastic Gradient Descent (SGD), an efficient variant of gradient descent, and its implementation in R.
The Importance of Continuous Learning and Adaptation: The field of machine learning is ever-evolving, and so are the techniques for optimization. What works today might be improved upon tomorrow. Hence, continuous learning and staying updated with the latest advancements is crucial for practitioners.
Emerging Trends and Future Directions: We are witnessing a surge in the use of more sophisticated optimization techniques, especially in deep learning. Algorithms like Adam and RMSprop, which build upon the concepts of SGD, are becoming more prevalent. Additionally, the integration of machine learning with other advanced technologies like quantum computing could redefine optimization methods.
The Role of R in Future Developments: R, with its strong statistical foundation and a robust package ecosystem, continues to be a significant tool in the machine learning landscape. Its community-driven nature ensures that it remains relevant, adapting and growing with the advancements in the field.
Final Thoughts: Optimization in machine learning is not just a technical requirement; it's an art that balances mathematical rigor with practical application. As we enhance our understanding and skills in these optimization techniques, we contribute to the broader narrative of technological progress, pushing the boundaries of what's possible with data.

In conclusion, the journey through optimization techniques in machine learning, particularly in the context of R, is a testament to the fascinating interplay between statistics, computer science, and domain knowledge. As we embrace these tools and techniques, we empower ourselves to build more efficient, effective, and insightful machine learning models, driving forward the frontiers of AI and data science.

Code:

Fixed step size:

library(plotly)
library(ggplot2)
# Function of two variables
f=function(x, y) {
  return(exp(3*y+x-0.1)+exp(-3*y+x-0.1)+exp(-x-0.1))  # Example: a smooth convex sum of exponentials
}
# Alternative quadratic test function: return(x^2/2+7*y^2/2)
# Gradient of the function of two variables
gradient=function(x, y) {
   grad_x=exp(3*y+x-0.1)+exp(-3*y+x-0.1)-exp(-x-0.1) # Partial derivative with respect to x
   grad_y=3*exp(3*y+x-0.1)-3*exp(-3*y+x-0.1) # Partial derivative with respect to y
   return(c(grad_x, grad_y))
  # Gradient of the alternative quadratic: return(c(x,7*y))
}

# Fixed-step gradient method for two variables
gradient_descent_2D=function(gradient, x_initial, y_initial, learning_rate, iterations) {
  x=x_initial
  y=y_initial
  x_values=c(x)
  y_values=c(y)
  k=0
  # Stop after `iterations` steps or once the gradient norm is negligible
  while(k<iterations && norm(gradient(x, y),'2')>1e-10){
    g = gradient(x, y)  # Evaluate once so both coordinates are updated from the same point
    x = x - learning_rate * g[1]
    y = y - learning_rate * g[2]
    x_values = c(x_values, x)
    y_values = c(y_values, y)
    k=k+1
  }
  
  return(list(x_values,y_values))
}

# Initial values and algorithm parameters
x_initial = 1  # Starting point for x
y_initial = 1  # Starting point for y
learning_rate = 0.001  # Step size (learning rate)
iterations = 10000  # Maximum number of iterations

# Run the gradient method for two variables
sol_values = gradient_descent_2D(gradient, x_initial, y_initial, learning_rate, iterations)
# Extract the iterates
x_values = sol_values[[1]]
y_values = sol_values[[2]]
# Print the last iterate (the approximate minimizer)
cat(x_values[length(x_values)],y_values[length(y_values)],'\n')
# Build the grid on which f is evaluated
x = y=seq(-5, 5, length.out = 100)
grid = expand.grid(x = x, y = y)
z = f(grid$x, grid$y)

# 3D visualization
# p <- plot_ly(z = ~matrix(z, nrow = length(x)), x = ~matrix(grid$x, nrow = length(x)), 
#              y = ~matrix(grid$y, ncol = length(y)), type ="surface") %>%
#   add_trace(data = data.frame(x = x_values, y = y_values),
#             x = ~x, y = ~y, z = ~f(x, y), type = "scatter3d", mode = "markers", marker = list(size = 4, color = "red"))
# p <- p %>% layout(scene = list(
#   xaxis = list(title = "x"),
#   yaxis = list(title = "y"),
#   zaxis = list(title = "z")
# ))
# print(p)
# Plot the contours of f together with the iterates
contour_p=ggplot(grid, aes(x, y)) +
  geom_contour(aes(z = z), bins = 10) +
  geom_point(data = data.frame(x = x_values, y = y_values), aes(x, y), color = "red", size = 2) +
  labs(title = " ", x = "x", y = "y")+
  theme_minimal()

print(contour_p)

Optimal step size:

library(ggplot2)
library(geometry)
library(plotly)

# Definition of the function
f=function(x,y) {
  return(x^2/2+7*y^2/2)
}

# Gradient of the function
grad_f=function(x,y) {
  return(c(x,7*y))
}

# Gradient descent with a backtracking (Armijo) line search for the step size
Gradien_pas_opt=function(x,y, c1, c2, max_iter) {
  alpha_i=1
  x_values=c(x)
  y_values=c(y)
  i=0
  while(i <= max_iter && norm(grad_f(x,y),'2')>1e-10 ) {
    g=grad_f(x,y)      # Gradient at the current point, evaluated once
    prod_i1=sum(g*g)   # Squared norm of the gradient
    j=0
    # Shrink the step until the sufficient-decrease (Armijo) condition holds
    while((j <= max_iter) && (f(x - alpha_i * g[1],y - alpha_i * g[2]) > f(x,y) - c1 * alpha_i * prod_i1) ){
        alpha_i=0.1*alpha_i
        j=j+1
    }
    # c2 is kept for a curvature (Wolfe) test that is not applied here
    x=x - alpha_i * g[1]
    y=y - alpha_i * g[2]
    x_values=c(x_values, x)
    y_values=c(y_values, y)
    i=i + 1
  }
  return(list(x_values,y_values))
}

# Starting point
x_0=5
y_0=5
# Compute the iterates x_i
sol_values=Gradien_pas_opt(x_0,y_0,c1 = 1e-14, c2 = 0.1, max_iter = 1000)

# Prepare the data for plotting
 x_values=sol_values[[1]]
 y_values=sol_values[[2]]
 #cat(x_values[length(x_values)],y_values[length(y_values)])
 # Build the grid on which f is evaluated
 x=y=seq(-5, 5, length.out = 100)
 grid=expand.grid(x = x, y = y)
 z=f(grid$x, grid$y)
 
 # 3D visualization
 p=plot_ly(z = ~matrix(z, nrow = length(x)), x = ~matrix(grid$x, nrow = length(x)), 
              y = ~matrix(grid$y, ncol = length(y)), type ="surface") %>%
   add_trace(data = data.frame(x = x_values, y = y_values),
             x = ~x, y = ~y, z = ~f(x, y), type = "scatter3d", mode = "markers", marker = list(size = 4, color = "red"))
 p=p %>% layout(scene = list(
   xaxis = list(title = "x"),
   yaxis = list(title = "y"),
   zaxis = list(title = "z")
 ))
 print(p)
 

 
 # Plot the contours of f together with the iterates
 contour_p=ggplot(grid, aes(x, y)) +
   geom_contour(aes(z = z), bins = 10) +
   geom_point(data = data.frame(x = x_values, y = y_values), aes(x, y), color = "red", size = 2) +
   labs(title = " ", x = "x", y = "y")+
   theme_minimal()
 
 print(contour_p)

Conjugate gradient:

library(pracma)  # Provides dot() for vector dot products
# Definition of the function f(x, y)
f=function(x,y) {
  return(x^2 + y^2 + x*y - x - 2*y)
}
grad_f=function(x,y) {
  return(c(2*x  +y - 1, 2*y + x - 2))
}


# Quadratic form: f(X) = 1/2 * t(X) %*% A %*% X - t(b) %*% X
A=matrix(c(2, 1, 1, 2), nrow = 2, byrow = TRUE)  # Matrix A (the Hessian of f)
b=c(1, 2)  # Vector b, chosen so that grad_f(X) = A %*% X - b

# Check that A is positive definite
if(all(eigen(A)$values > 0)) {
  print("The matrix A is positive definite.")
  
  # Solve the system AX = b directly (reference solution)
  X=solve(A, b)
  print("The solution of the system AX = b is:")
  print(X)
  
  # Conjugate gradient method to find the minimum of f(x, y)
  n=2
  X=c(-5, 5)  # Starting point
  tol=1e-10
  r=b-A%*%X  # Residual, equal to minus the gradient of the quadratic form
  d=r        # First search direction
  for(k in 1:n){
    if(norm(r,'2')<tol){
      break
    }
    else{
      rho=dot(r,d)/dot(A%*%d,d)  # Exact step length along d
      X=X+rho*d
      r=b-A%*%X
      alpha=-dot(A%*%r,d)/dot(A%*%d,d)  # Makes the new direction A-conjugate to d
      d=r+alpha*d
    }
  }
  # Display the result
  print("Approximate minimizer X* of f(x, y):")
  print(X)
} else {
  print("The matrix A is not positive definite.")
}

My versions:

library(ggplot2)
library(reshape2)
library(rootSolve)

# Define the function f
f <- function(params) {
  x <- params[1]
  y <- params[2]
  return(exp(x + 3 * y - 0.1) + exp(x - 3 * y - 0.1) + exp(-x - 0.1))
}

# Gradient of f
grad <- function(params) {
  x <- params[1]
  y <- params[2]
  return(c(exp(x + 3 * y - 0.1) + exp(x - 3 * y - 0.1) - exp(-x - 0.1), 
           3 * exp(x + 3 * y - 0.1) - 3 * exp(x - 3 * y - 0.1)))
}


Phi <- function(p, x, y) { 
  return(f(c(x - p * grad(c(x, y))[1], y - p * grad(c(x, y))[2])))
}

# Compute the Hessian matrix numerically at the point (1, 1)

H <- hessian(f, x = c(1, 1))
print("The Hessian matrix is: ")
print(round(H,2))


# Check whether the Hessian is positive definite at (1, 1), i.e. f is locally convex there:


if(all(eigen(H)$values > 0)) {
  print("The Hessian matrix of f is positive definite.")
} else {
  print("The Hessian matrix of f is not positive definite.")
}


# Find the true minimum by solving grad(x, y) = 0:


solution <- multiroot(f = grad, start = c(1, 1))
print("The true minimum is: ")
print(round(solution$root,2))


# Find the minimum with fixed-step gradient descent:

tol = 10^-5
x <- c()
x[1] = 1
y <- c()
y[1] = 1

algoGradient <- function(p) {
  i <- 2
  while (i < 200)  {
    x[i] <- x[i - 1] - p * grad(c(x[i - 1], y[i - 1]))[1]
    y[i] <- y[i - 1] - p * grad(c(x[i - 1], y[i - 1]))[2]
    i <- i + 1
  }
  df <- data.frame(
    x = x,
    y = y
  )
  return(df)
}

View(algoGradient(0.01))


# Find the minimum with gradient descent and a backtracking step size:

p0 = 0.1
w1 = 1e-14
x <- c(1)
y <- c(1)
p <- c(1)
alpha = 0.5  

algoGradient2 <- function() {
  i <- 2
  # Iterate until the Euclidean norm of the gradient falls below tol
  while (sqrt(sum(grad(c(x[i - 1], y[i - 1]))^2)) > tol) {
    j <- 2
    p[j] <-  p[j - 1]
    
    while (Phi(p[j], x[i - 1], y[i - 1]) > f(c(x[i - 1], y[i - 1])) - w1 * p[j] * norm(grad(c(x[i - 1], y[i - 1])), '2')^2) {
      j <- j + 1
      p[j] <- alpha * p[j - 1]
      
      if (j > 100) { 
        break
      }
    }
    
    x[i] <- x[i - 1] - p[j] * grad(c(x[i - 1], y[i - 1]))[1]
    y[i] <- y[i - 1] - p[j] * grad(c(x[i - 1], y[i - 1]))[2]
    i <- i + 1
    if (i > 1000) { 
      break
    }
  }
  
  df <- data.frame(x = x, y = y)
  return(df)
}

View(algoGradient2())

library(ggplot2)
library(reshape2)
library(rootSolve)

# Define the function f (quadratic example)
f <- function(params) {
  x <- params[1]
  y <- params[2]
  return(x^2 + y^2 + x*y - x - 2*y)
}


# Gradient of f
grad <- function(params) {
  x <- params[1]
  y <- params[2]
  return(c(2*x + y -1 , 2*y + x - 2))
}


Phi <- function(p, x, y) { 
  return(f(c(x - p * grad(c(x, y))[1], y - p * grad(c(x, y))[2])))
}

# Compute the Hessian matrix numerically at the point (0, 0)

H <- hessian(f, x = c(0, 0))
print("The Hessian matrix is: ")
print(H)


# Check whether the Hessian is positive definite (for a quadratic f it is the same everywhere):


if(all(eigen(H)$values > 0)) {
  print("The Hessian matrix of f is positive definite.")
} else {
  print("The Hessian matrix of f is not positive definite.")
}


# Find the true minimum by solving grad(x, y) = 0:


solution <- multiroot(f = grad, start = c(1, 1))
print("The true minimum is: ")
print(round(solution$root,2))


# Find the minimum with fixed-step gradient descent:

tol = 10^-5
x <- c()
x[1] = 1
y <- c()
y[1] = 1

algoGradient <- function(p) {
  i <- 2
  while (i < 200)  {
    x[i] <- x[i - 1] - p * grad(c(x[i - 1], y[i - 1]))[1]
    y[i] <- y[i - 1] - p * grad(c(x[i - 1], y[i - 1]))[2]
    i <- i + 1
  }
  df <- data.frame(
    x = x,
    y = y
  )
  return(df)
}

View(algoGradient(0.01))


# Find the minimum with gradient descent and a backtracking step size:


p0 = 0.1
w1 = 1e-14
x <- c(1)
y <- c(1)
p <- c(1)
alpha = 0.56

algoGradient2 <- function() {
  i <- 2
  while (i < 200) {
    j <- 2
    p[j] <-  p[j - 1]
    
    while (Phi(p[j], x[i - 1], y[i - 1]) > f(c(x[i - 1], y[i - 1])) - w1 * p[j] * norm(grad(c(x[i - 1], y[i - 1])), '2')^2) {
      j <- j + 1
      p[j] <- alpha * p[j - 1]
      
      if (j > 100) { 
        break
      }
    }
    
    x[i] <- x[i - 1] - p[j] * grad(c(x[i - 1], y[i - 1]))[1]
    y[i] <- y[i - 1] - p[j] * grad(c(x[i - 1], y[i - 1]))[2]
    i <- i + 1
    if (i > 1000) { 
      break
    }
  }
  
  df <- data.frame(x = x, y = y)
  return(df)
}

View(algoGradient2())


# Find the minimum with the conjugate gradient method:

A = H
b = -grad(c(0,0))

X=solve(A, b)
print("The solution of the system AX = b is:")
print(X)

gradConjugue <- function() {
  maxite <- 200;
  X <- list()
  X[[1]] <- c(-5, 5)  # Starting point (x, y)
  
  r <- list(b - A %*% X[[1]])
  d <- r
  
  for (i in 2:maxite) {
    # Stop once the residual is negligible; a further step would divide 0 by 0
    if (sqrt(sum(r[[i-1]]^2)) < tol) break
    p <- sum(r[[i-1]] * d[[i-1]]) / sum((A %*% d[[i-1]]) * d[[i-1]])
    X[[i]] <- X[[i-1]] + p * d[[i-1]]
    r[[i]] <- b - A %*% X[[i]]
    # Matrix products (%*%) make the new direction A-conjugate to the previous one
    alpha <- -sum((A %*% r[[i]]) * d[[i-1]]) / sum((A %*% d[[i-1]]) * d[[i-1]])
    d[[i]] <- r[[i]] + alpha * d[[i-1]]
  }
  
  # Build the data frame row by row
  df_X <- as.data.frame(matrix(unlist(X), ncol = 2, byrow = TRUE))
  colnames(df_X) <- c("x", "y")
  
  return(df_X)
  
}

View(gradConjugue())
