Regularization in Machine Learning (Layman Terms Series 1.0)
Before we go further, think about this: do you memorize your family members one by one, or do you pick up patterns that let you recognize your dad or your brother?
No, we don't memorize each relative (uncle or aunt) to work out who Mr. Sharma (your mother's brother) is.
We have been moderating our learning since childhood.
A lot of people working with Machine Learning focus only on "ACCURACY", but "98% Accuracy" doesn't mean your model is the right fit.
If you are working with any ML model and you get 95% accuracy, do you think it's the best model?
Let's discuss it! Are you ready to learn something different today? Be curious, be you!
What is the next step after building an ML model? It's the "Deployment Stage". You deploy the model for your client, either to solve a specific business problem or to help the client grow the business by reaching better-fit customers and generating more revenue.
At the "Deployment Stage", what is our main focus? Can you answer this?
Let's take a look. At deployment, our main focus should be that the ML model performs well on "UNSEEN DATA". It doesn't matter how well your ML model performs on training data if it cannot deliver accurate results on test data.
To make the model perform well on "Test Data" too, we use "REGULARIZATION TECHNIQUES". The idea is not to memorize the training data but to moderate the learning so that the model learns more effectively. Does that make sense?
So, when can you conclude that "YOUR MODEL DOES MORE MEMORIZING THAN LEARNING"?
Here is the answer: when you train an ML model and it performs well on the training data set but gives a relatively poor result on "UNSEEN DATA" after deployment, you can conclude that your model is memorizing rather than learning.
The term "REGULARIZATION" refers to certain techniques that help a Machine Learning model learn rather than just "MEMORIZE"!
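To give a feel for what "moderating" the learning can look like, here is a minimal sketch of one common regularization technique, an L2 (ridge) penalty, on a tiny one-feature regression. The data, the function name fit_weight, and the penalty values are made up for illustration and are not from any real project. A larger penalty pulls the learned weight toward zero, so the model leans less heavily on the training points.

```python
# Toy one-feature linear model y = w * x with an L2 (ridge) penalty.
# Minimizing sum((y - w*x)^2) + penalty * w^2 gives the closed form:
#     w = sum(x*y) / (sum(x*x) + penalty)
# The data and the name fit_weight are hypothetical, for illustration only.

def fit_weight(xs, ys, penalty):
    """Return the ridge-regularized weight for a no-intercept linear model."""
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + penalty)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_weight(xs, ys, penalty=0.0)        # no regularization
w_moderated = fit_weight(xs, ys, penalty=14.0)   # strong regularization

print("weight without penalty:", w_plain)
print("weight with penalty:   ", w_moderated)
```

How strong the penalty should be is something you would normally tune on held-out data; the value used here is only there to make the shrinking effect easy to see.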
Now, you must be wondering what "Learning" and "Memorizing" mean in Machine Learning. Let's discuss!
Let's say you are working on a classification problem to identify whether a flower is a Rose or a Jasmine. When you train the model, it gives you 95% accuracy on the training data, but when you run the same model on test data, it gives you only 84% accuracy. That gap suggests your model is memorizing instead of learning.
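The difference between memorizing and learning can be sketched with a toy example. Everything here is hypothetical: the numbers, the single feature, and the two tiny "models" (memorizer and learner) exist only to illustrate the idea. The memorizer stores every training example and guesses 0 for anything it has not seen, while the learner picks up a simple rule from the training data.

```python
# Toy data: one numeric feature, label 1 when the feature is large.
# All values are made up for illustration.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (4.0, 0),
         (6.0, 1), (7.0, 1), (8.0, 1), (9.0, 1)]
unseen = [(1.5, 0), (2.5, 0), (3.5, 0),
          (6.5, 1), (7.5, 1), (8.5, 1)]

def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the label."""
    correct = sum(1 for x, y in rows if predict(x) == y)
    return correct / len(rows)

# "Memorizing": remember each training example exactly,
# and fall back to guessing 0 for inputs never seen before.
lookup = {x: y for x, y in train}

def memorizer(x):
    return lookup.get(x, 0)

# "Learning": derive a rule, here a threshold halfway between
# the average feature value of each class.
def mean(values):
    return sum(values) / len(values)

threshold = (mean([x for x, y in train if y == 0]) +
             mean([x for x, y in train if y == 1])) / 2

def learner(x):
    return 1 if x > threshold else 0

memorizer_train = accuracy(memorizer, train)
memorizer_unseen = accuracy(memorizer, unseen)
learner_train = accuracy(learner, train)
learner_unseen = accuracy(learner, unseen)

print("memorizer:", memorizer_train, "on training,", memorizer_unseen, "on unseen")
print("learner:  ", learner_train, "on training,", learner_unseen, "on unseen")
```

Both models look perfect on the training data. Only the unseen data shows which one actually picked up the pattern.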
Let's talk about a real-world scenario. We have to predict whether an individual will switch from their current service provider or not (churn rate = 0 means the customer will not switch from Airtel to Idea, and churn rate = 1 means the customer will switch from Airtel to Idea). If you don't know about churn prediction, do read this: https://www.dhirubhai.net/pulse/data-science-your-cup-tea-vivek-chaudhary/
We have been given a data set that includes Total talking hours, Senior Citizen, National calling hours, International calling hours, National SMS, and International SMS. We use it to predict the churn rate, which helps the telecom company group the right kind of customers into different categories and give them the best offers for better revenue generation.
Let's say you build a model and it performs well on the existing data, but when you try the same model on "UNSEEN DATA", it doesn't deliver a good result. Here you can conclude that your model is memorizing more than it is learning.
So, why does that happen in the case above?
One possibility is that your "MODEL" has an overfitting problem, which is why it performs relatively poorly on "UNSEEN DATA". Again, you can conclude that "MEMORIZING" is taking place instead of "LEARNING".
So, remember the golden point: if your model's evaluation metrics differ significantly between the training data set and the testing data set, that points to an "OVERFITTING PROBLEM".
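The golden point can be turned into a quick check. This is only a sketch: the function name looks_overfit and the 0.05 tolerance are assumptions made for illustration, and what counts as a "significant" difference depends on your problem and your data.

```python
def looks_overfit(train_score, test_score, tolerance=0.05):
    """Flag a model whose training score beats its test score by more
    than `tolerance`. The default tolerance is an arbitrary example value."""
    gap = train_score - test_score
    return gap > tolerance

# The flower classifier from earlier: 95% on training, 84% on test.
flower_flag = looks_overfit(0.95, 0.84)

# A hypothetical model with a much smaller gap.
small_gap_flag = looks_overfit(0.95, 0.93)

print("95% vs 84% flagged as overfitting:", flower_flag)
print("95% vs 93% flagged as overfitting:", small_gap_flag)
```

With these example numbers, the 11-point gap is flagged and the 2-point gap is not.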
So, 95% accuracy on its own doesn't mean you have one of the best models.
This is the first version, with 10 more to come, so stay tuned for updates. If something is still unclear, message me on LinkedIn and we can discuss it further.
Keep learning on a daily basis, and you are going to master Regularization.
Thanks. :)