Gradient Descent for people in a hurry!

Gradient Descent is one of the key algorithms that machine learning practitioners everywhere use, yet not many of them can explain what it actually is. Surprising, isn't it? This relatively simple concept is often muddled with so much mathematics and statistics that it can confuse the best of us. Here is a brief explanation of this puzzling yet simple concept, one that requires nothing more than high school mathematics to understand.

Key concepts to understand:

J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

COST FUNCTION: In Machine Learning, the primary goal is to learn from the available data. This is usually done by minimizing errors, i.e., the machine learns to tell the right choice from the wrong one. The mathematical function that measures these errors is called the Cost Function; because it works by quantifying errors, it is also called the Error Function. So when we run our Machine Learning model, what we are really trying to do is find the "parameters" or "weights" that minimize the cost function. The above equation is the mathematical representation of the cost function: J is the cost, m is the number of samples, x is the input value, y is the actual output value, h is the hypothesis function that maps an input x to a predicted value, and i indexes the i-th sample. The goal here is to minimize J.
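To make this concrete, here is a minimal sketch of the cost function in Python. It assumes the simplest possible hypothesis, a straight line h(x) = theta0 + theta1 * x; the function names and toy data are illustrative, not from any particular library:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Mean squared error J for the linear hypothesis h(x) = theta0 + theta1 * x."""
    m = len(x)                       # m: number of samples
    h = theta0 + theta1 * x          # h(x^(i)): prediction for each sample
    return np.sum((h - y) ** 2) / (2 * m)

# Toy data following y = 2x, so theta0 = 0, theta1 = 2 is the perfect fit.
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(0.0, 2.0, x, y))  # 0.0   -- no error at the ideal parameters
print(cost(0.0, 1.0, x, y))  # ~2.33 -- worse parameters, higher cost
```

Lower J means better parameters; the whole learning problem is a search for the (theta0, theta1) pair with the smallest J.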


GRADIENT DESCENT: Now we know that a machine learns by minimizing the cost function. But how does it do this? Voila! Enter Gradient Descent. Gradient Descent is the behind-the-scenes optimization algorithm that allows us to reach the minimum value of the function, i.e., a local or global minimum.
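In symbols, each iteration nudges every parameter a small step downhill along the slope of the cost. This is the standard gradient descent update rule, where α is the learning rate described below:

\theta_j := \theta_j - \alpha \, \frac{\partial}{\partial \theta_j} J(\theta)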

[Figure: a cost curve with a starting point stepping down towards the minimum; the distance covered in each step is the learning rate.]

Gradient, in simple words, means direction: the gradient points in the direction in which the cost rises most steeply, and the Gradient Descent algorithm moves the model in the opposite direction to reach the minimum value. Once this direction has been identified, the algorithm updates the parameters step by step, with the step size set by a value known as the learning rate. The learning rate is the distance the model must cover in each iteration on its way to the minimum. As shown in the graph above, the distance by which the starting point moves along its journey towards the final value is the learning rate, i.e., the distance covered in each iteration. As the model iterates it moves towards the minimum value and finally converges at a point where the cost function is minimized and the error is as small as it will get.
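Here is a minimal sketch of that loop in Python, reusing the linear hypothesis from the earlier snippet. The learning rate and iteration count are illustrative assumptions, not prescribed values:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by gradient descent on the MSE cost."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0          # arbitrary starting point
    for _ in range(iterations):
        h = theta0 + theta1 * x        # current predictions
        # Partial derivatives of the cost J with respect to each parameter
        grad0 = np.sum(h - y) / m
        grad1 = np.sum((h - y) * x) / m
        # Step AGAINST the gradient; alpha (the learning rate) sets the step size
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x                              # true relationship: y = 2x
print(gradient_descent(x, y))          # converges near (0.0, 2.0)
```

Too large a learning rate makes the steps overshoot the minimum and diverge; too small a rate makes convergence painfully slow, which is why this value usually has to be tuned.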

So Gradient Descent is that magical tool that enables us to obtain the lowest value of the cost function without much trouble. The alternative to using gradient descent would be DEATH! This is not an exaggeration: computing the minimum error without gradient descent would mean exhaustively trying a practically endless number of parameter combinations to finally arrive at the minimum, i.e., the least error. This, for obvious reasons, would be impractical. Gradient descent instead keeps updating the learned values and moves the machine towards the ideal values as quickly as possible, making it an "integral" ( :3) aspect of machine learning problems.
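To see why the brute-force alternative scales so badly, here is a hypothetical sketch of it in Python; the grid range and resolution are arbitrary assumptions for illustration:

```python
import numpy as np
from itertools import product

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2 * x  # true relationship: y = 2x

def cost(t0, t1):
    """Same mean-squared-error cost as before, for h(x) = t0 + t1 * x."""
    return np.sum((t0 + t1 * x - y) ** 2) / (2 * len(x))

# Exhaustive search over a coarse grid: 201 candidate values per
# parameter already means 201 * 201 = 40,401 cost evaluations for
# just two parameters, and the count grows exponentially with every
# parameter added. The gradient descent loop above needed ~1,000
# cheap steps to reach the same answer.
grid = np.linspace(-10.0, 10.0, 201)
best = min(product(grid, grid), key=lambda t: cost(*t))
print(best)  # roughly (0.0, 2.0)
```

A real model has thousands or millions of parameters, which makes this kind of search hopeless.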
