AI Models - Finding the best learning rate.
If you have been following my articles, you know I was about to write about multiclass classification in deep learning. However, before moving to that, there is an important aspect I need to discuss: while creating a model, how do we find the best learning rate?
In machine learning, the learning rate is a crucial hyperparameter that influences how quickly or slowly a model learns during training. Let’s break it down:
Learning Rate:
The learning rate determines the step size taken by an optimization algorithm (such as gradient descent) when updating the model’s parameters.
It controls how much the model adjusts its weights based on the loss gradient (the parameters are moved against the gradient, in the direction of steepest decrease of the loss function).
The best learning rate is the one at which the loss decreases the most during training. So how can we tell our model when to keep learning and when to stop? Ideally, this should be automated too, right?
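In code form, a single gradient descent update looks like this. A minimal sketch with illustrative values; w, grad and lr are names I am using for illustration, not from this article's code:
#one gradient descent step: move the weight against the gradient, scaled by the learning rate
w = 0.5             # current weight (illustrative)
grad = 2.0          # gradient of the loss with respect to w (illustrative)
lr = 0.01           # learning rate
w = w - lr * grad   # new weight = 0.5 - 0.01 * 2.0 = 0.48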
Role of Learning Rate:
The learning rate decides how big or small each parameter update should be.
If the learning rate is too high: the updates are too large, the model overshoots the minimum, and the loss can oscillate or even diverge (see the small sketch after this list).
If the learning rate is too low: the updates are tiny, training becomes very slow, and the model may get stuck before reaching a good solution.
Finding the Right Balance: the goal is a rate large enough to make fast progress but small enough to converge smoothly.
Desirable Learning Rate: one where the loss decreases quickly and steadily, neither bouncing around nor crawling.
Experimentation: in practice, the best value is found empirically, by trying different rates or, as we will see below, by letting a callback sweep through a range of rates during training.
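To see these failure modes concretely, here is a minimal sketch (not from this article's model code) that runs plain gradient descent on the simple function loss(w) = w**2, whose gradient is 2*w:
#gradient descent demo on loss(w) = w**2 with three different learning rates
def gradient_descent(lr, steps=10):
    w = 1.0                   # starting weight
    for _ in range(steps):
        w = w - lr * (2 * w)  # update step: w -= lr * gradient
    return w

print(gradient_descent(lr=0.01))  # too low: w barely moves (ends near 0.82)
print(gradient_descent(lr=0.1))   # balanced: w converges close to 0 (~0.11)
print(gradient_descent(lr=1.1))   # too high: updates overshoot and w diverges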
Let's learn the same via code. By now you already know the steps to create a model; if you have landed here for the first time, I would suggest visiting the previous article on binary classification, as we will be using the same training data here.
Below is the code for model_6, which we created in the previous article:
#imports (assuming the same setup as the previous article)
import tensorflow as tf
import pandas as pd
import matplotlib.pyplot as plt

#let's recreate the model to fit on the training and testing data
tf.random.set_seed(42)

#create a model
model_6 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

#compile the model
model_6.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                metrics=["accuracy"])

#fit the model
history = model_6.fit(X_train, Y_train, epochs=25)
Let's understand it in detail by plotting the loss decrease and visualizing the data. The history object contains both the loss and the accuracy, so we will use it to plot and understand the data.
#plot the loss (or training) curves from model_6's history
history.history
Output:
{'loss': [0.6962345838546753,
0.6913262009620667,
0.6856409311294556,
0.6803207397460938,
0.6736816167831421,
0.6628592014312744,
0.6466933488845825,
0.6276564598083496,
0.5859443545341492,
0.5229468941688538,
0.4556224048137665,
0.3726879358291626,
0.28534752130508423,
0.2280178666114807,
0.17591926455497742,
0.14416836202144623,
0.12153787910938263,
0.10464778542518616,
0.09327741712331772,
0.08456674218177795,
0.07561132311820984,
0.06481277197599411,
0.05958665907382965,
0.05503316968679428,
0.050970807671546936],
'accuracy': [0.49000000953674316,
0.5162500143051147,
0.5299999713897705,
0.5550000071525574,
0.5562499761581421,
0.5787500143051147,
0.6537500023841858,
0.7300000190734863,
0.8162500262260437,
0.9125000238418579,
0.9549999833106995,
0.9350000023841858,
0.987500011920929,
0.9925000071525574,
0.9937499761581421,
0.9950000047683716,
0.9937499761581421,
0.9962499737739563,
0.9912499785423279,
0.9925000071525574,
0.9975000023841858,
0.9987499713897705,
0.9962499737739563,
0.9987499713897705,
0.9950000047683716]}
#convert the history object into data frame
pd.DataFrame(history.history)
Output:
Above, we converted the data into a readable form so that we can also plot it on a graph. Since we have both loss and accuracy here, let's plot them and try to understand what's happening.
#plot the loss curves
pd.DataFrame(history.history).plot()
plt.title("Model_6 loss curves")
Output:
Now, from the above image we can clearly see that as the loss decreases, the accuracy of the model increases. Hopefully it is clear up to this point, which brings us back to our actual question: until when do we keep training the model, when do we stop, and how do we adjust the learning rate during training?
Alright, let's move further and create a new model with a learning rate callback:
#set random seed
tf.random.set_seed(42)

#create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid")
])

#compile the model
model_7.compile(loss="binary_crossentropy",
                optimizer="Adam",
                metrics=["accuracy"])

#create a learning rate callback
lr_scheduler = tf.keras.callbacks.LearningRateScheduler(lambda epoch: 1e-4 * 10 ** (epoch / 20))

#fit the model
history_7 = model_7.fit(X_train,
                        Y_train,
                        epochs=100,
                        callbacks=[lr_scheduler])
In model_7, you may have noticed there is an extra step between creating the model and fitting it: creating a learning rate callback. That's the line where we are telling our model to find the best learning rate.
Let's break it down further: the lambda receives the current epoch number and returns 1e-4 * 10**(epoch/20), so the learning rate starts at 1e-4 and is multiplied by 10 every 20 epochs.
In summary, this line of code creates a learning rate scheduler that adjusts the learning rate during training. The learning rate starts small (1e-4) and increases exponentially with each epoch. This approach helps fine-tune the model’s performance during training.
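To see what this lambda actually produces, here is a quick sketch (illustrative, not part of the training code) printing the learning rate at a few epochs:
#the scheduler starts at 1e-4 and multiplies the rate by 10 every 20 epochs
for epoch in [0, 20, 40, 60, 80, 99]:
    print(epoch, 1e-4 * 10 ** (epoch / 20))
#prints: 0 -> 0.0001, 20 -> 0.001, 40 -> 0.01, 60 -> 0.1, 80 -> 1.0, 99 -> ~8.9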
Output of model_7 Training:
Our training appears to be normal, as in the previous models. Let's plot the history of model_7 as well and visualize the loss/accuracy data using the history object.
#checkout the history
pd.DataFrame(history_7.history).plot(figsize=(10,7), xlabel="epochs")
Let's put the histories of both models side by side for a closer look:
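Here is a minimal sketch of how you could reproduce that side-by-side comparison yourself (assuming history and history_7 are still in memory):
#plot model_6 and model_7 training curves side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
pd.DataFrame(history.history).plot(ax=ax1, title="Model_6")
pd.DataFrame(history_7.history).plot(ax=ax2, title="Model_7")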
In model_6 we provided a hard-coded learning rate, whereas in model_7 the callback automatically updates it every epoch. As you can see in model_7's loss curve image, the learning rate keeps increasing (the green line going upwards).
Now, let's try to plot the graph of learning rate versus loss.
#plot the learning rate vs loss
lrs = 1e-4 * (10 ** (tf.range(100) / 20))
plt.figure(figsize=(10, 7))
plt.semilogx(lrs, history_7.history["loss"])
plt.xlabel("Learning Rate")
plt.ylabel("Loss")
plt.title("Learning rate vs loss")
In the above code we recreated the same 100 learning rate values the scheduler used during training: starting from 10 raised to the power -4, the rate is multiplied by 10 every 20 epochs (the divisor 20 controls how fast the rate grows; we trained for 100 epochs). We then plot the learning rate on the X-axis (on a log scale) and the loss on the Y-axis.
Output:
Now, where do you think our ideal learning rate would be?
Wherever the loss decreases the most, right?
To figure out the ideal value for the learning rate, the rule of thumb is: pick the point where the loss decreases the most, or is still decreasing but has not quite flattened out. It is usually about 10 times smaller than the learning rate at the bottom of the curve. Our ideal rate would be somewhere between 10^-2 and 10^-1.
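You can also estimate this programmatically. A rough sketch, assuming the rule of thumb above and that lrs and history_7 from the previous code are still in memory:
#find the learning rate at the lowest loss, then back off by a factor of 10
import numpy as np
lr_at_min_loss = lrs.numpy()[np.argmin(history_7.history["loss"])]
ideal_lr = lr_at_min_loss / 10  # rule of thumb: 10x smaller than the bottom of the curve
print(lr_at_min_loss, ideal_lr)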
Now recall that model_6 for binary classification, built without the callback method, gave us 99.5% accuracy with a hard-coded learning rate of 0.01, which is exactly the point (0.01, or 10^-2) that the graph above is pointing to for model_7.
Examples of learning rates:
The default learning rate for the Adam optimizer is 0.001; based on research, the defaults are generally well-optimized values. You can still experiment with the example values above, or any other value between 0 and 1 such as 0.23 or 0.25, but the values given in the learning rate image above have proved to be the typical ones used in practice. So you can either use the callback method to find the best learning rate, or try the typical values, train the model, and adjust accordingly.
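For reference, here is a quick sketch of both options when creating the optimizer (the 0.02 value is just illustrative):
#Adam's default learning rate is 0.001; you can also set one explicitly
optimizer = tf.keras.optimizers.Adam()                    # uses the default learning rate of 0.001
optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)  # illustrative custom value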
Hope this clarifies the concept of learning rate.