Reasoning behind iteration count in machine training

Why read this?

We need iterations when the dataset is too big to pass to the computer at once, which happens all the time in machine learning.

To overcome this problem, we divide the data into smaller batches, feed them to the computer one by one, and update the weights of the neural network at the end of every step so that it fits the data it has seen.
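To make this concrete, here is a minimal sketch of such a mini-batch loop in Python/NumPy. The linear model, learning rate, batch size, and toy dataset are all illustrative assumptions, not a prescription:

    import numpy as np

    # Toy dataset: 1,000 samples with 4 features (illustrative numbers only).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=1000)

    w = np.zeros(4)      # weights of a simple linear model
    lr = 0.1             # learning rate (assumed value)
    batch_size = 100     # 1000 samples / 100 per batch = 10 iterations per epoch

    for epoch in range(5):
        for start in range(0, len(X), batch_size):
            xb = X[start:start + batch_size]
            yb = y[start:start + batch_size]
            grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # MSE gradient on this batch
            w -= lr * grad                             # one weight update = one iteration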

Is that the only reason to divide the training data, or is there something more interesting and scientific behind it? This document helps answer that question.

Technical explanation

Let's take the two extremes. On one side, each gradient descent step uses the entire dataset: you compute the gradients for every sample. In this case you know the exact best direction towards a local minimum, and you don't waste time going the wrong way. So in terms of the number of gradient descent steps, you'll get there in the fewest.

Of course, computing the gradient over the entire dataset is expensive. So now we go to the other extreme: a batch size of just 1 sample. In this case, the gradient of that sample may take you in completely the wrong direction. But the cost of computing that one gradient is trivial. As you take steps based on just one sample at a time you "wander" around a bit, but on average you head towards an equally reasonable local minimum as with full-batch gradient descent.
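To see this "wandering" numerically, the sketch below (reusing the same kind of toy linear-model setup, with assumed numbers) compares the direction of a single-sample gradient with the full-batch gradient; a cosine similarity well below 1 means the cheap one-sample estimate points somewhere quite different:

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0])
    w = np.zeros(4)

    def mse_grad(xb, yb, w):
        # Gradient of the mean-squared error of a linear model on one batch.
        return 2 * xb.T @ (xb @ w - yb) / len(xb)

    full_grad = mse_grad(X, y, w)         # exact direction, but sums 1,000 gradient terms
    one_grad = mse_grad(X[:1], y[:1], w)  # trivial to compute, but noisy

    cos = full_grad @ one_grad / (np.linalg.norm(full_grad) * np.linalg.norm(one_grad))
    print(f"cosine similarity of single-sample vs full-batch gradient: {cos:.2f}")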

Both of the approaches described above have drawbacks, which we will examine below.

Speed tradeoffs in machine learning

These definitions are important for understanding the discussion that follows.

  • Computational speed: CPU cycles used per training step
  • Convergence speed: time taken to train the model to good prediction accuracy

Items affecting iteration count

  • The higher the batch size, the more memory you need. You usually cannot keep all the training data in a single batch, so you need more than one iteration (see the sketch after this list).
  • Large batches tend to suffer from poorer generalisation; in other words, prediction accuracy drops (see the Keskar et al. paper, https://arxiv.org/abs/1609.04836, in the references).
  • A smaller batch requires more weight-update steps per epoch, and the added per-step overhead increases total training time.
  • With a smaller batch size, the error estimate is noisier than with a larger batch size. However, this noise can help the algorithm jump out of a bad local minimum, improving its chance of finding a better local minimum, or even the global one.
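A quick back-of-the-envelope calculation shows how batch size drives both memory use and the number of iterations per epoch, as mentioned in the first and third bullets. The dataset size and per-sample byte count below are assumed values (roughly MNIST-sized float32 images), chosen only to illustrate the trade-off:

    import math

    n_samples = 60_000               # assumed training-set size
    bytes_per_sample = 28 * 28 * 4   # one 28x28 float32 image (assumed)

    for batch_size in (1, 32, 256, 60_000):
        iters_per_epoch = math.ceil(n_samples / batch_size)   # updates per full pass
        batch_mem_mb = batch_size * bytes_per_sample / 1e6    # memory held per batch
        print(f"batch={batch_size:>6}  iterations/epoch={iters_per_epoch:>6}  "
              f"batch memory={batch_mem_mb:9.2f} MB")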

Invalid choices

  • Batch size = training data size: only a single iteration per epoch is needed, but, as noted above, full-batch training tends to generalise poorly.
  • Batch size = 1 (so the iteration count equals the training sample size): far too many iterations are needed, resulting in a high training cost in CPU cycles.

What is the right choice?

Considering the above observations, choosing an optimal batch size is important. A smaller batch size is generally better for convergence speed and accuracy.

In general, a batch size of 32 is a good starting point, and you should also try 64, 128, and 256. Other values (lower or higher) may work well for some datasets, but this range is usually the best place to start experimenting.

And, finally, make sure the batch fits in CPU/GPU memory.
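One hedged way to apply this advice is a small sweep over the suggested batch sizes, comparing held-out error. The sketch below reuses the toy linear-regression setup from earlier; the learning rate and epoch count are assumptions, and on a real model you would compare validation accuracy and wall-clock time instead:

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.normal(size=(1000, 4))
    y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=1000)
    X_tr, y_tr = X[:800], y[:800]          # simple train/validation split
    X_val, y_val = X[800:], y[800:]

    for batch_size in (32, 64, 128, 256):  # the range suggested above
        w = np.zeros(4)
        for epoch in range(20):
            for start in range(0, len(X_tr), batch_size):
                xb = X_tr[start:start + batch_size]
                yb = y_tr[start:start + batch_size]
                w -= 0.05 * 2 * xb.T @ (xb @ w - yb) / len(xb)  # one update
        val_mse = np.mean((X_val @ w - y_val) ** 2)
        print(f"batch={batch_size:>3}  validation MSE={val_mse:.4f}")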

Point to remember

An EPOCH is different from an iteration.

One epoch is when the ENTIRE dataset is passed forward and backward through the neural network exactly ONCE.
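An iteration, by contrast, is a single weight update on one batch, so the number of iterations per epoch equals the dataset size divided by the batch size. A tiny worked example:

    # With 2,000 training samples and a batch size of 500, one epoch
    # (one full forward-and-backward pass over the data) takes
    # 2000 / 500 = 4 iterations, i.e. 4 weight updates.
    n_samples, batch_size = 2000, 500
    iterations_per_epoch = n_samples // batch_size
    print(iterations_per_epoch)  # -> 4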


References

Thanks to these helping hands:

  • https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
  • https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks
  • https://arxiv.org/abs/1609.04836
  • https://stats.stackexchange.com/questions/164876/what-is-the-trade-off-between-batch-size-and-number-of-iterations-to-train-a-neu
  • https://ai.stackexchange.com/questions/8560/how-do-i-choose-the-optimal-batch-size
