Cost Function and Gradient Descent
Nimra Iman
Aspiring Data Scientist | Passionate about ML and AI | Python Developer | Statistical Data Analyst
In the world of machine learning, understanding the concepts of the cost function and gradient descent is essential for developing accurate predictive models. These two components work hand in hand to optimize our models and enhance their predictive power. Let's delve into what they are and how they work.
A cost function is a measure of how well our model's predictions match the actual data. It quantifies the error, that is, the difference between the actual and predicted values. The greater the error, or in other words the greater the value of the cost function, the lower the accuracy of the model.
Depending on the type of machine learning problem, different cost functions are used, such as:
MEAN SQUARED ERROR:
It calculates the average of the squared differences between the predicted and actual values. Squaring the errors ensures that negative values do not cancel out positive values.
$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data points.
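As a quick illustration, here is a minimal NumPy sketch of this calculation; the arrays y_actual and y_predicted are just placeholder data:

```python
import numpy as np

def mse(y_actual, y_predicted):
    # Average of the squared differences between actual and predicted values
    return np.mean((y_actual - y_predicted) ** 2)

# Toy example
y_actual = np.array([3.0, 5.0, 7.0])
y_predicted = np.array([2.5, 5.5, 6.0])
print(mse(y_actual, y_predicted))  # (0.25 + 0.25 + 1.0) / 3 = 0.5
```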
Our main goal is to minimize this error, that is, to minimize the value of the cost function, and the technique for doing so is gradient descent.
Gradient descent is an optimization algorithm used to find the values of the parameters (coefficients) that minimize the cost function. For simple linear regression, the predicted line is given by y = mx + c, and the goal is to find the best values of "m" and "c" for the best-fit line: "c" fixes where the line starts (the intercept) and "m" fixes its slope, following the RISE/RUN concept. For example, with m = 2 and c = 1, the input x = 3 is predicted as y = 2(3) + 1 = 7.
Steps Involved in Gradient Descent:
1. Initialize the parameters "m" and "c" with random (or zero) values.
2. Compute the predictions and the value of the cost function (MSE) for the current parameters.
3. Compute the partial derivatives of the cost function with respect to each parameter.
4. Update the parameters in the direction that reduces the cost.
5. Repeat until the cost stops decreasing, or until a set number of iterations is reached.
But the question that arises here is: how do we update the parameters?
Well, there are two approaches: one uses a fixed step size and the other a variable step size.
In the fixed approach, we update the values of m and c by a fixed amount each time, but we risk missing the best values of "m" and "c": MSE is at its minimum only for one specific pair of parameter values, and it starts increasing again once "m" and "c" move past that point. To see this, picture a graph of the cost:
Take the y-axis as MSE and the x-axis as the parameter values (m and c). At point A, we have a high value of MSE due to the randomly assumed parameter values, but as we slowly change the parameters, the MSE value changes too. In the variable approach, we change the parameter values based on a calculation, using the update rule shown below:

$$m_{new} = m - LR \times PD(m), \qquad c_{new} = c - LR \times PD(c)$$

where:
LR is the learning rate, typically a small value such as 0.001
PD(m) is the partial derivative of the cost function with respect to m
PD(c) is the partial derivative of the cost function with respect to c
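For the MSE cost function with $\hat{y} = mx + c$, these partial derivatives work out to:

$$PD(m) = -\frac{2}{n}\sum_{i=1}^{n} x_i \left(y_i - \hat{y}_i\right), \qquad PD(c) = -\frac{2}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)$$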
We run successive iterations until we reach point B, that is, until we reach the lowest value of MSE. But the number of iterations must also be chosen carefully: with too many iterations, the final update can step past point B and leave us with a higher value of MSE than the minimum.
Here is a practical implementation: a minimal sketch of the algorithm above in Python with NumPy, where the toy arrays x and y are placeholders for your own dataset:
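```python
import numpy as np

# Toy dataset (placeholder for your own data), roughly y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

m, c = 0.0, 0.0   # initial parameter values
LR = 0.001        # learning rate
n = len(x)

for i in range(100):  # number of iterations
    y_pred = m * x + c            # predictions from y = mx + c
    error = y - y_pred
    mse = np.mean(error ** 2)     # cost function

    # Partial derivatives of MSE with respect to m and c
    pd_m = -(2 / n) * np.sum(x * error)
    pd_c = -(2 / n) * np.sum(error)

    # Variable approach: the update shrinks as the gradient shrinks
    m = m - LR * pd_m
    c = c - LR * pd_c

    print(f"iteration {i + 1}: m={m:.4f}, c={c:.4f}, MSE={mse:.4f}")
```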
For 100 iterations, you will see the MSE decrease from one iteration to the next as "m" and "c" move toward the values of the best-fit line.
In summary, understanding cost functions and gradient descent is crucial for optimizing machine learning models. The cost function measures the error between predicted and actual values, while gradient descent is an iterative method used to minimize this error by adjusting model parameters. Mastery of these concepts allows for the development of accurate and efficient predictive models, forming a foundational skill set for any aspiring data scientist or machine learning practitioner.