Reasoning behind the iteration count in machine learning training
Deepak Kumar
Why read this?
We need iterations because the training data is usually too big, which happens all the time in machine learning, to pass to the computer at once.
To work around this, we divide the data into smaller batches, feed them to the computer one by one, and update the weights of the neural network after every step so the model fits the data it has seen.
But is memory the only reason to divide the training data, or is there something more interesting and scientific behind it? This document addresses that question.
Technical explanation
Let's take the two extremes. On one side, each gradient descent step uses the entire dataset: you compute the gradients for every sample. In this case you know exactly the best direction towards a local minimum, and you don't waste time going the wrong way. So in terms of the number of gradient descent steps, you'll get there in the fewest.
Of course, computing the gradient over the entire dataset is expensive. So now consider the other extreme: a batch size of just one sample. In this case the gradient of that single sample may point in completely the wrong direction. But the cost of computing that one gradient was trivial. As you take steps with respect to just one sample you "wander" around a bit, but on average you head towards an equally reasonable local minimum as in full-batch gradient descent.
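These two extremes can be sketched side by side. Below is a minimal NumPy illustration on a toy one-parameter linear-regression problem; the data, learning rate, and step counts are all assumptions chosen for illustration, not taken from any real training setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed): y = 3x + small noise, so the best weight is w ≈ 3
X = rng.uniform(-1, 1, size=200)
y = 3.0 * X + rng.normal(0, 0.1, size=200)

def grad(w, xb, yb):
    # Gradient of mean squared error 0.5*(w*x - y)^2 with respect to w
    return np.mean((w * xb - yb) * xb)

# Extreme 1: full-batch gradient descent -- exact direction, costly per step
w_full = 0.0
for _ in range(100):
    w_full -= 0.5 * grad(w_full, X, y)

# Extreme 2: batch size 1 (pure SGD) -- noisy direction, cheap per step
w_sgd = 0.0
for _ in range(100):
    i = rng.integers(len(X))
    w_sgd -= 0.5 * grad(w_sgd, X[i:i + 1], y[i:i + 1])

# Both estimates wander toward w ≈ 3; the batch-size-1 run is noisier
print(w_full, w_sgd)
```

Each full-batch step here touches all 200 samples, while each batch-size-1 step touches a single sample; that cost difference per step is exactly the trade-off described above.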
Both of these extremes have drawbacks, which we examine below.
Speed tradeoffs in machine learning
These definitions matter for the rest of the discussion.
- Computational speed: CPU cycles used per update
- Convergence speed: time taken to train the model to good prediction accuracy
Items affecting iteration count
- The higher the batch size, the more memory you need. Since you usually can't fit all the training data into a single batch, you need more than one iteration.
- Large batches tend to generalise more poorly; in other words, prediction accuracy drops (see the paper on large-batch training in the references below).
- Smaller batches mean more iterations per epoch, and the per-iteration overhead adds up, so training takes more wall-clock time.
- With a smaller batch size, the error estimate is noisier than with a larger batch. However, this noise can help the algorithm jump out of a bad local minimum, giving it a better chance of finding a better local minimum or, hopefully, the global minimum.
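The noise point in the last bullet can be made concrete: the spread of a mini-batch gradient estimate shrinks roughly as 1/sqrt(batch size). Here is a small simulation sketch; the per-sample "gradients" are simulated numbers (true gradient 2.0 plus unit Gaussian noise), an assumption for illustration rather than output from a real model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated per-sample gradients: true gradient 2.0 plus unit Gaussian noise
true_grad = 2.0
sample_grads = true_grad + rng.normal(0.0, 1.0, size=100_000)

noise = {}
for batch in (1, 32, 1024):
    # Average random mini-batches and measure the spread of the estimates
    estimates = [sample_grads[rng.integers(0, len(sample_grads), batch)].mean()
                 for _ in range(2000)]
    noise[batch] = float(np.std(estimates))
    print(batch, noise[batch])  # spread shrinks roughly as 1/sqrt(batch)
```

A batch size of 1 keeps nearly all of the per-sample noise, while a batch of 1024 averages most of it away; the in-between sizes are where the "useful noise" argument applies.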
Extreme choices to avoid
- Batch size = training set size. Only a single iteration per epoch is needed, but this is full-batch gradient descent: it demands a lot of memory and tends to generalise poorly.
- Batch size = 1 (so iteration count = training set size). Far too many iterations are needed per epoch, costing many CPU cycles, and every step is very noisy.
What is right choice?
Considering the observations above, choosing a good batch size matters. A relatively small batch size tends to be better when weighing convergence speed against accuracy.
In general, a batch size of 32 is a good starting point, and you should also try 64, 128, and 256. Other values (lower or higher) may work for some datasets, but this range is generally the best place to start experimenting.
And, in the end, make sure the batch fits in CPU/GPU memory.
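A batch-size sweep over that suggested range can be sketched with plain NumPy mini-batch SGD on a toy one-parameter linear model; the dataset, learning rate, and epoch count below are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy data (assumed): 2048 samples of y = 3x + small noise
X = rng.uniform(-1, 1, size=(2048, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=2048)

def train(batch_size, epochs=20, lr=0.1):
    """Mini-batch SGD on a one-parameter linear model (illustrative sketch)."""
    w = 0.0
    for _ in range(epochs):
        order = rng.permutation(len(X))          # shuffle each epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx, 0], y[idx]
            w -= lr * np.mean((w * xb - yb) * xb)  # one iteration = one update
    return w

# Try the starting range suggested above and compare the learned weights
results = {bs: train(bs) for bs in (32, 64, 128, 256)}
for bs, w in results.items():
    print(bs, w)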
Point to remember
An EPOCH is different from an iteration.
One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly once. One iteration is a single weight update on one batch, so the number of iterations per epoch equals the dataset size divided by the batch size (rounded up).
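As a worked example with assumed numbers: a dataset of 2,000 samples and a batch size of 32 gives 63 iterations per epoch (the last batch is smaller than 32):

```python
import math

# Assumed example numbers: 2,000 training samples, batch size of 32
dataset_size = 2000
batch_size = 32

iterations_per_epoch = math.ceil(dataset_size / batch_size)
print(iterations_per_epoch)        # -> 63
print(iterations_per_epoch * 10)   # iterations needed for 10 epochs -> 630
```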
Reference
Thanks to these helping hands:
- https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
- https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks
- https://arxiv.org/abs/1609.04836
- https://stats.stackexchange.com/questions/164876/what-is-the-trade-off-between-batch-size-and-number-of-iterations-to-train-a-neu
- https://ai.stackexchange.com/questions/8560/how-do-i-choose-the-optimal-batch-size