AI Models - Finding the best learning rate.
If you have been following my articles, you know I was about to write about multiclass classification in deep learning. However, before moving to that, there is an important aspect I need to discuss: while creating a model, how do we find the best learning rate?
In machine learning, the learning rate is a crucial hyperparameter that influences how quickly or slowly a model learns during training. Let’s break it down:
Learning Rate:
The learning rate determines the step size taken by an optimization algorithm (such as gradient descent) when updating the model’s parameters.
It controls how much the model adjusts its weights based on the loss gradient (the parameters are moved against the gradient, in the direction of steepest decrease of the loss function).
The best learning rate is the one at which the loss decreases the most during training. So how can we tell our model when to keep learning and when to stop? Ideally, this should be automated too, right?
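In code form, a single gradient descent update looks like this. A minimal sketch with illustrative values; w, grad and lr are names I am using for illustration, not from this article's code:
#one gradient descent step: move the weight against the gradient, scaled by the learning rate
w = 0.5             # current weight (illustrative)
grad = 2.0          # gradient of the loss with respect to w (illustrative)
lr = 0.01           # learning rate
w = w - lr * grad   # new weight = 0.5 - 0.01 * 2.0 = 0.48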
Role of Learning Rate:
The learning rate decides how big or small each parameter update should be.
If the learning rate is too high: the updates are too large, the model overshoots the minimum, and the loss can oscillate or even diverge (see the small sketch after this list).
If the learning rate is too low: the updates are tiny, training becomes very slow, and the model may get stuck before reaching a good solution.
Finding the Right Balance: the goal is a rate large enough to make fast progress but small enough to converge smoothly.
Desirable Learning Rate: one where the loss decreases quickly and steadily, neither bouncing around nor crawling.
Experimentation: in practice, the best value is found empirically, by trying different rates or, as we will see below, by letting a callback sweep through a range of rates during training.
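To see these failure modes concretely, here is a minimal sketch (not from this article's model code) that runs plain gradient descent on the simple function loss(w) = w**2, whose gradient is 2*w:
#gradient descent demo on loss(w) = w**2 with three different learning rates
def gradient_descent(lr, steps=10):
    w = 1.0                   # starting weight
    for _ in range(steps):
        w = w - lr * (2 * w)  # update step: w -= lr * gradient
    return w

print(gradient_descent(lr=0.01))  # too low: w barely moves (ends near 0.82)
print(gradient_descent(lr=0.1))   # balanced: w converges close to 0 (~0.11)
print(gradient_descent(lr=1.1))   # too high: updates overshoot and w diverges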
Let's learn the same via code. By now you already know the steps to create a model; if you have landed here for the first time, I would suggest visiting the previous article on binary classification, as we will be using the same training data here.
Below is the code for model_6, which we created in the previous article:
#imports (assuming the same setup as the previous article)
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt

#let's recreate the model to fit on the training and testing data
tf.random.set_seed(42)

#create a model
model_6 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

#compile the model
model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                metrics=["accuracy"])

#fit the model
history = model_6.fit(X_train, Y_train, epochs=25)
Let's understand it in detail by plotting the loss decrease and visualizing the data. The history object contains both the loss and the accuracy, so we will use it to plot and understand the data.
#plot the loss (or training) curves from model_6's history
history.history
Output:
{'loss': [0.6962345838546753,
0.6913262009620667,
0.6856409311294556,
0.6803207397460938,
0.6736816167831421,
0.6628592014312744,
0.6466933488845825,
0.6276564598083496,
0.5859443545341492,
0.5229468941688538,
0.4556224048137665,
0.3726879358291626,
0.28534752130508423,
0.2280178666114807,
0.17591926455497742,
0.14416836202144623,
0.12153787910938263,
0.10464778542518616,
0.09327741712331772,
0.08456674218177795,
0.07561132311820984,
0.06481277197599411,
0.05958665907382965,
0.05503316968679428,
0.050970807671546936],
'accuracy': [0.49000000953674316,
0.5162500143051147,
0.5299999713897705,
0.5550000071525574,
0.5562499761581421,
0.5787500143051147,
0.6537500023841858,
0.7300000190734863,
0.8162500262260437,
0.9125000238418579,
0.9549999833106995,
0.9350000023841858,
0.987500011920929,
0.9925000071525574,
0.9937499761581421,
0.9950000047683716,
0.9937499761581421,
0.9962499737739563,
0.9912499785423279,
0.9925000071525574,
0.9975000023841858,
0.9987499713897705,
0.9962499737739563,
0.9987499713897705,
0.9950000047683716]}
#convert the history object into data frame
pd.DataFrame(history.history)
Output:
Above, we converted the data into a readable form so that we can also plot it on a graph. Since we have both loss and accuracy here, let's plot them and try to understand what's happening.
#plot the loss curves
pd.DataFrame(history.history).plot()
plt.title("Model_6 loss curves")
Output:
Now, from the above image we can clearly see that as the loss decreases, the accuracy of the model increases. Hopefully it is clear up to this point, which brings us back to our actual question: until when do we keep training the model, when do we stop, and how do we adjust the learning rate during training?
Alright, let's move further and create a new model with a learning rate callback:
#set random seed
tf.random.set_seed(42)

#create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

#compile the model
model_7.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])

#create a learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10 ** (epoch / 20))

#fit the model
history_7 = model_7.fit(X_train,
                        Y_train,
                        epochs=100,
                        callbacks=[lr_scheduler])
In model_7, you may have noticed there is an extra step between creating the model and fitting it: creating a learning rate callback. That's the line where we are telling our model to find the best learning rate.
Let's break it down further: the lambda receives the current epoch number and returns 1e-4 * 10**(epoch/20), so the learning rate starts at 1e-4 and is multiplied by 10 every 20 epochs.
In summary, this line of code creates a learning rate scheduler that adjusts the learning rate during training. The learning rate starts small (1e-4) and increases exponentially with each epoch. This approach helps fine-tune the model’s performance during training.
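To see what this lambda actually produces, here is a quick sketch (illustrative, not part of the training code) printing the learning rate at a few epochs:
#the scheduler starts at 1e-4 and multiplies the rate by 10 every 20 epochs
for epoch in [0, 20, 40, 60, 80, 99]:
    print(epoch, 1e-4 * 10 ** (epoch / 20))
#prints: 0 -> 0.0001, 20 -> 0.001, 40 -> 0.01, 60 -> 0.1, 80 -> 1.0, 99 -> ~8.9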
Output of model_7 Training:
Our training appears to be normal, as in the previous models. Let's plot the history of model_7 as well and visualize the loss/accuracy data using the history object.
#checkout the history
pd.DataFrame(history_7.history).plot(figsize=(10,7), xlabel="epochs")
Let's put the histories of both models side by side for a closer look:
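Here is a minimal sketch of how you could reproduce that side-by-side comparison yourself (assuming history and history_7 are still in memory):
#plot model_6 and model_7 training curves side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
pd.DataFrame(history.history).plot(ax=ax1, title="Model_6")
pd.DataFrame(history_7.history).plot(ax=ax2, title="Model_7")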
In model_6 we provided a hard-coded learning rate, whereas in model_7 the callback automatically updates it every epoch. As you can see in model_7's loss curve image, the learning rate keeps increasing (the green line going upwards).
Now, let's try to plot the graph of learning rate versus loss.
#plot the learning rate vs loss
lrs = 1e-4 * (10 ** (tf.range(100) / 20))
plt.figure(figsize=(10, 7))
plt.semilogx(lrs, history_7.history["loss"])
plt.xlabel("Learning Rate")
plt.ylabel("Loss")
plt.title("Learning rate vs loss")
In the above code we recreated the same 100 learning rate values the scheduler used during training: starting from 10 raised to the power -4, the rate is multiplied by 10 every 20 epochs (the divisor 20 controls how fast the rate grows; we trained for 100 epochs). We then plot the learning rate on the X-axis (on a log scale) and the loss on the Y-axis.
Output:
Now, where do you think our ideal learning rate would be?
Wherever the loss decreases the most, right?
To figure out the ideal value for the learning rate, the rule of thumb is: pick the point where the loss decreases the most, or is still decreasing but has not quite flattened out. It is usually about 10 times smaller than the learning rate at the bottom of the curve. Our ideal rate would be somewhere between 10^-2 and 10^-1.
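You can also estimate this programmatically. A rough sketch, assuming the rule of thumb above and that lrs and history_7 from the previous code are still in memory:
#find the learning rate at the lowest loss, then back off by a factor of 10
import numpy as np
lr_at_min_loss = lrs.numpy()[np.argmin(history_7.history["loss"])]
ideal_lr = lr_at_min_loss / 10  # rule of thumb: 10x smaller than the bottom of the curve
print(lr_at_min_loss, ideal_lr)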
Now recall that model_6 for binary classification, built without the callback method, gave us 99.5% accuracy with a hard-coded learning rate of 0.01, which is exactly the point (0.01, or 10^-2) that the graph above is pointing to for model_7.
Examples of learning rates:
The default learning rate for the Adam optimizer is 0.001; based on research, the defaults are generally well-optimized values. You can still experiment with the example values above, or any other value between 0 and 1 such as 0.23 or 0.25, but the values given in the learning rate image above have proved to be the typical ones used in practice. So you can either use the callback method to find the best learning rate, or try the typical values, train the model, and adjust accordingly.
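For reference, here is a quick sketch of both options when creating the optimizer (the 0.02 value is just illustrative):
#Adam's default learning rate is 0.001; you can also set one explicitly
optimizer = tf.keras.optimizers.Adam()                    # uses the default learning rate of 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)  # illustrative custom value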
Hope this clarifies the concept of learning rate.