Role of optimiser in machine learning
Deepak Kumar
Why needed?
In machine learning, the learning rate determines whether and how quickly training converges. If your learning rate is set too low, training will progress very slowly because you are making only tiny updates to the weights in your network. However, if your learning rate is set too high, it can cause undesirable divergent behaviour in your loss function.
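To see this concretely, here is a toy sketch (not from the article; the function, starting point and learning rates are arbitrary choices for illustration) that minimises f(w) = w² with plain gradient descent:

```python
# Minimise f(w) = w**2 with plain gradient descent; the gradient is 2*w.
def gradient_descent(lr, steps=20, w=5.0):
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

print(gradient_descent(lr=0.001))  # too low: w barely moves from 5 towards the minimum at 0
print(gradient_descent(lr=0.1))    # reasonable: w ends up close to 0
print(gradient_descent(lr=1.1))    # too high: updates overshoot and |w| grows without bound
```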
So how do we find a good learning rate and keep it appropriate as training proceeds? An optimiser is the answer.
Technical explanation
One of the key hyperparameters to set when training a neural network is the learning rate for gradient descent. A common remedy is a learning rate schedule that changes the rate over the course of training, but the issue with learning rate schedules is that they depend on hyperparameters that must be chosen manually for each training run and may vary greatly depending on the problem at hand or the model used. To combat this, there are many adaptive gradient descent algorithms such as Adagrad, Adadelta, RMSprop and Adam, which are built into deep learning libraries such as Keras; a detailed list of optimisers is available in the Keras documentation (see References).
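As a minimal sketch of what this looks like in practice (the model architecture and hyperparameter values below are placeholder assumptions, not from the article), switching between these optimisers in Keras is a one-line change when compiling the model:

```python
from tensorflow import keras

# A small placeholder model; the architecture is only for illustration.
model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])

# Any of the adaptive optimisers can be plugged in here, e.g.
# keras.optimizers.Adagrad, Adadelta, RMSprop or Adam, each with its own hyperparameters.
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="mse")
```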
Adam optimiser
Adam (Adaptive Moment Estimation) is the most popular adaptive optimiser today.
Adam computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients v_t, like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients m_t, similar to momentum.
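For reference, a single Adam update can be sketched in NumPy roughly as follows (using the standard defaults β1 = 0.9, β2 = 0.999, ε = 1e-8 from the original paper; this is an illustrative sketch, not library code):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for parameters theta, given gradient grad at step t (t >= 1)."""
    m = beta1 * m + (1 - beta1) * grad        # decaying average of past gradients (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2   # decaying average of past squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for m
    v_hat = v / (1 - beta2 ** t)              # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter adaptive step
    return theta, m, v
```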
Point to remember
The optimiser addresses the gradient descent problem of choosing and adapting the learning rate α in the update θ ← θ − α∇J(θ); it does not solve the vanishing gradient problem.
References
Time to thank these helping hands
https://www.jeremyjordan.me/nn-learning-rate/
https://en.wikipedia.org/wiki/Learning_rate
https://towardsdatascience.com/gradient-descent-algorithms-and-adaptive-learning-rate-adjustment-methods-79c701b086be
https://keras.io/api/optimizers/