AI Models - Finding the best learning rate.

If you have been following my articles, you know I was about to write about multiclass classification in deep learning. Before moving on to that, however, there is an important question to discuss: when creating a model, how do we find the best learning rate?

In machine learning, the learning rate is a crucial hyperparameter that influences how quickly or slowly a model learns during training. Let’s break it down:


Learning Rate:

The learning rate determines the step size taken by an optimization algorithm (such as gradient descent) when updating the model’s parameters.

It controls how much the model adjusts its weights based on the loss gradient (the direction of steepest decrease in the loss function).

The ideal learning rate is the one at which the loss decreases the most during training. So how can we tell our model when to keep learning and when to stop? This needs to be automated too, right?

Role of Learning Rate:

The learning rate decides how big or small each parameter update should be.

If the learning rate is too high:

  • The model may overshoot the optimal solution and fail to converge.
  • It might oscillate around the minimum without settling down.

If the learning rate is too low:

  • The model converges very slowly.
  • Training takes longer, especially for deep neural networks.
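To make the two failure modes above concrete, here is a minimal sketch in plain Python. It is not taken from the model in this article: it assumes a toy loss f(w) = w**2, and the function name gradient_descent and the three learning rates are chosen only for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=5.0):
    """Minimise the toy loss f(w) = w**2 and return the final value of w."""
    w = start
    for _ in range(steps):
        grad = 2 * w                    # derivative of w**2
        w = w - learning_rate * grad    # the learning rate scales every update
    return w

too_low = gradient_descent(0.001)    # still far from the minimum after 50 steps
reasonable = gradient_descent(0.1)   # ends very close to the minimum at 0
too_high = gradient_descent(1.1)     # every step overshoots, so |w| keeps growing

print(too_low, reasonable, too_high)
```

With the small rate the weight has barely moved, with the moderate rate it lands almost exactly on the minimum, and with the large rate it moves further away on every step.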

Finding the Right Balance:

Desirable Learning Rate:

  • Low enough for the network to converge effectively.
  • High enough to train within a reasonable time.

Experimentation:

  • Data scientists often experiment with different learning rates to find the optimal one.
  • Techniques like learning rate schedules, adaptive learning rates, and cyclical learning rates are used.
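As a rough illustration of two of the techniques named above, the sketch below shows one possible learning rate schedule (exponential decay) and one possible cyclical learning rate (a triangular wave). The function names and all default values are assumptions made for this example, not settings used elsewhere in this article, and adaptive learning rates are not covered here.

```python
def exponential_decay(epoch, initial_lr=0.01, decay_rate=0.9):
    """Schedule: shrink the learning rate by a fixed factor every epoch."""
    return initial_lr * decay_rate ** epoch

def triangular_cyclical(step, base_lr=1e-4, max_lr=1e-2, step_size=10):
    """Cyclical rate: rise linearly to max_lr, fall back to base_lr, repeat."""
    cycle = step // (2 * step_size)            # which full cycle we are in
    position = step / step_size - 2 * cycle    # runs from 0 to 2 within a cycle
    fraction = 1 - abs(position - 1)           # 0 at the ends, 1 at the peak
    return base_lr + (max_lr - base_lr) * fraction

for epoch in (0, 5, 10, 15, 20):
    print(epoch, exponential_decay(epoch), triangular_cyclical(epoch))
```

Either function could be passed to a scheduler callback that expects a function from epoch number to learning rate.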

Let's learn the same through code. By now you already know the steps to create a model, but let me repeat them in case you landed here for the first time:

  1. Set the random seed
  2. Create the model
  3. Compile the model
  4. Fit the model
  5. Make predictions

Now, let's code. Please note that I am using the same training data as in the binary classification article. If you are landing here for the first time, I would suggest reading that article first.

Here is the code for model_6, which we created in the previous article:

# Let's recreate the model and fit it on the training data
# (X_train and Y_train come from the binary classification article)
import tensorflow as tf

tf.random.set_seed(42)

# Create a model
model_6 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                metrics=["accuracy"])

# Fit the model
history = model_6.fit(X_train, Y_train, epochs=25)

Let's understand it in detail by plotting how the loss decreases. The history object contains both loss and accuracy, so we will use it to plot and visualize the training.

# Check the loss and accuracy values stored in model_6's history
history.history

Output:

{'loss': [0.6962345838546753,
  0.6913262009620667,
  0.6856409311294556,
  0.6803207397460938,
  0.6736816167831421,
  0.6628592014312744,
  0.6466933488845825,
  0.6276564598083496,
  0.5859443545341492,
  0.5229468941688538,
  0.4556224048137665,
  0.3726879358291626,
  0.28534752130508423,
  0.2280178666114807,
  0.17591926455497742,
  0.14416836202144623,
  0.12153787910938263,
  0.10464778542518616,
  0.09327741712331772,
  0.08456674218177795,
  0.07561132311820984,
  0.06481277197599411,
  0.05958665907382965,
  0.05503316968679428,
  0.050970807671546936],
 'accuracy': [0.49000000953674316,
  0.5162500143051147,
  0.5299999713897705,
  0.5550000071525574,
  0.5562499761581421,
  0.5787500143051147,
  0.6537500023841858,
  0.7300000190734863,
  0.8162500262260437,
  0.9125000238418579,
  0.9549999833106995,
  0.9350000023841858,
  0.987500011920929,
  0.9925000071525574,
  0.9937499761581421,
  0.9950000047683716,
  0.9937499761581421,
  0.9962499737739563,
  0.9912499785423279,
  0.9925000071525574,
  0.9975000023841858,
  0.9987499713897705,
  0.9962499737739563,
  0.9987499713897705,
  0.9950000047683716]}        

# Convert the history object into a data frame
import pandas as pd

pd.DataFrame(history.history)

Output:

Data Frame of History Object (Model_6)


Above, we converted the data into a readable form so that we can also plot it on a graph. As we have both loss and accuracy here, let's plot them and try to understand what happened.

# Plot the loss curves
import matplotlib.pyplot as plt

pd.DataFrame(history.history).plot()
plt.title("Model_6 loss curves")

Output:

Loss Curves

Now, from the above image we can clearly see that as the loss decreases, the accuracy of the model increases. I hope everything is clear up to this point, because it leads to our actual question: how long do we need to keep training our model, when should we stop, and how should the learning rate change during training?

Alright, let's move on and create a new model with a learning rate callback:

# Set random seed
tf.random.set_seed(42)

# Create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

# Compile the model
model_7.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])

# Create a learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10 ** (epoch / 20))

# Fit the model
history_7 = model_7.fit(X_train,
                        Y_train,
                        epochs=100,
                        callbacks=[lr_scheduler])

In model_7, you may have noticed that there is an extra step between compiling the model and fitting the model: creating a learning rate callback. That is the line that varies the learning rate during training so that we can find the best one.


  • lr_scheduler = tf.keras.callbacks.LearningRateScheduler(...): This line creates an instance of a learning rate scheduler using the LearningRateScheduler callback from TensorFlow’s Keras library. Let’s dive into the details:
  • lr_scheduler: This variable will hold our learning rate scheduler.
  • tf.keras.callbacks.LearningRateScheduler(...): This constructs the learning rate scheduler. Its main argument is a function (or lambda function) that maps the current epoch number to a learning rate value.
  • lambda epoch: 1e-4 * 10 ** (epoch/20): This is the function (lambda function) that defines how the learning rate changes with each epoch.

Let’s break it down further:

  • epoch: Represents the current epoch number during training.
  • 1e-4: This is a small constant value (0.0001). It serves as a base learning rate.
  • 10 ** (epoch/20): This part increases the learning rate exponentially with each epoch. Specifically, it raises 10 to the power of (epoch/20). As the epoch number increases, the learning rate grows exponentially.

In summary, this line of code creates a learning rate scheduler that adjusts the learning rate during training. The learning rate starts small (1e-4) and grows tenfold every 20 epochs. Training across this whole range of learning rates lets us see which values make the loss fall the fastest.
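To see what this schedule produces, the short sketch below evaluates the same formula as the lambda at a few epochs. The function name lr_for_epoch is only a label chosen for this example.

```python
def lr_for_epoch(epoch):
    """Same formula as the lambda passed to LearningRateScheduler."""
    return 1e-4 * 10 ** (epoch / 20)

# Print the learning rate at a few epochs of the 100-epoch run
for epoch in (0, 20, 40, 60, 80, 99):
    print(f"epoch {epoch:>2}: learning rate = {lr_for_epoch(epoch):.6f}")
```

Every 20 epochs the learning rate is multiplied by 10, so over 100 epochs it sweeps from 0.0001 up to roughly 9.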

Output of model_7 Training:

model_7 training output

Our training appears to be normal, as in the previous models. Let's plot the history of model_7 as well and visualize the loss/accuracy data using the history object.

# Check out the history
pd.DataFrame(history_7.history).plot(figsize=(10,7), xlabel="epochs")

model_7 history

Let's put the histories of both models side by side for a closer look:

In model_6 we provided a hard-coded learning rate, whereas in model_7 the learning rate is updated automatically. As you can see in the model_7 loss curve image, the learning rate keeps rising (the green line going upwards).

Now, let's plot the learning rate against the loss.

# Plot the learning rate vs loss
lrs = 1e-4 * (10 ** (tf.range(100)/20))
plt.figure(figsize=(10,7))
plt.semilogx(lrs, history_7.history["loss"])
plt.xlabel("Learning Rate")
plt.ylabel("Loss")
plt.title("Learning rate vs loss")

In the above code we rebuild the 100 learning rates used during training, one per epoch: each is 10 to the power of -4 multiplied by 10 raised to the power of (epoch / 20), so the learning rate grows tenfold every 20 epochs. The learning rate is plotted on the X-axis (on a log scale) and the loss on the Y-axis.

Output:


Learning rate vs loss


Now, where do you think our ideal learning rate would be?

Wherever the loss decreases the most, right?

To figure out the ideal value for the learning rate, the rule of thumb is to pick the point where the loss is decreasing the most, or is still decreasing but has not yet flattened out. It's usually about 10 times smaller than the learning rate at the bottom of the curve. Our ideal rate would be somewhere between 10^-2 and 10^-1.
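The rule of thumb above can be sketched in a few lines of Python. This is only one possible way to automate it: the helper name suggest_learning_rate is hypothetical, and the loss values below are a made-up U-shaped curve used purely for illustration. With a real run you would pass history_7.history["loss"] instead.

```python
import math

# Learning rates from the same schedule as the callback, one per epoch
lrs = [1e-4 * 10 ** (epoch / 20) for epoch in range(100)]

# Made-up loss curve with its lowest point near a learning rate of 0.1
# (illustration only, not the losses recorded for model_7)
losses = [0.1 + (math.log10(lr) + 1) ** 2 / 10 for lr in lrs]

def suggest_learning_rate(lrs, losses):
    """Return a rate 10 times smaller than the one at the lowest loss."""
    best = min(range(len(losses)), key=losses.__getitem__)
    return lrs[best] / 10

suggested = suggest_learning_rate(lrs, losses)
print(f"suggested learning rate: {suggested:.4f}")
```

Real loss curves are noisy, so it is still worth checking the suggestion against the plot instead of trusting a single minimum.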

Now look back at model_6 from the binary classification article, which gave us 99.5% accuracy. It was built without the callback method and used a learning rate of 0.01, which is exactly the point (0.01, or 10^-2) that the above graph for model_7 suggests.

Examples of learning rates:

Learning rate values

The default learning rate for the Adam optimizer in Keras is 0.001. Library defaults are chosen because they tend to work well across many problems, but you can also try the example values above or any other value between 0 and 1, such as 0.23 or 0.25. The values in the learning rate image above have simply proved to be typical choices. So you can either use the callback method to find the best learning rate, or try the typical values, train the model, and adjust accordingly.

I hope this clarifies the concept of learning rate.
