Regularization in Machine Learning (Layman Terms Series 1.0)!!

Before going further, think about this: do you memorize your relationships with your family members, hunting for patterns to identify your dad or your brother?

No, we don't memorize each relative (uncle or aunt) to work out who Mr. Sharma (your mother's brother) is, and so on.

We always moderate our learning, starting from childhood.

A lot of people working with Machine Learning focus only on "ACCURACY", but "98% Accuracy" doesn't mean your model is the right fit.

If you are working with any ML model and you get 95% accuracy, do you think it's the best model?

Let's discuss it! Are you ready to learn something different today? Be curious, be you!!

What is the next step after building the ML model? It's the "Deployment Stage": you deploy your model for your client, either to solve the required business problem or to help the client grow the business by reaching better-fit customers for more revenue.

At the "Deployment Stage", what is our main focus? Can you answer this?

Let's take a look. At deployment time, our main focus should be that the ML model performs well on "UNSEEN DATA". It doesn't matter how well your ML model performs on training data if it cannot deliver accurate results on test data.

To make your model perform well on "Test Data" as well, we use "REGULARIZATION TECHNIQUES". The goal is not to memorize the training data but to moderate the learning so that the model picks up patterns that also hold for new data. Does that make sense?

So, when can you conclude that "YOUR MODEL IS MEMORIZING INSTEAD OF LEARNING"?

Here is the answer: when you train any ML model and it performs well on the training data set but gives a relatively poor result on "UNSEEN DATA" after deployment, you can conclude that your model is not learning; it is mostly memorizing.

The term "REGULARIZATION" refers to certain techniques that help a Machine Learning model learn rather than just "MEMORIZE"!!
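To make "moderating the learning" concrete, here is a minimal sketch of one common regularization technique, an L2 penalty (ridge regression). The article does not name a specific technique, so the choice of L2, the function name `fit_slope`, the penalty value, and the toy numbers are all assumptions made for illustration.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Fit y = w * x (approximately) by least squares with an L2 penalty on w.

    Minimizes sum((y - w*x)**2) + penalty * w**2, whose closed-form
    solution is w = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Made-up toy points, only to show the effect of the penalty.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.2, 1.9, 3.2, 3.8]

w_plain = fit_slope(xs, ys)               # no regularization
w_ridge = fit_slope(xs, ys, penalty=5.0)  # hypothetical penalty strength

print("slope without penalty:", round(w_plain, 3))
print("slope with penalty   :", round(w_ridge, 3))
```

The penalized slope comes out smaller than the plain one: the penalty holds the model back from fitting the training points as tightly as it could, which is the kind of moderation described above.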

Now, you must be wondering what "Learning" and "Memorizing" mean in Machine Learning. Let's discuss!!

Let's say you are working on a classification problem to identify whether a flower is a Rose or a Jasmine. When you train on your data, the model gives you 95% accuracy on the training data, but when you run the same model on test data it gives you only 84% accuracy. An 11-point drop like that strongly suggests your model is memorizing instead of learning.
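The difference between memorizing and learning can be sketched with a toy version of the Rose/Jasmine problem. Everything here is an assumption for illustration: the single "petal length" feature, the made-up numbers, and the two tiny "models" (`memorizer` and `rule_learner`) are hypothetical and not taken from any real data set.

```python
# Toy, made-up petal lengths in cm (hypothetical values, not real measurements).
train = [(5.1, "rose"), (4.8, "rose"), (5.5, "rose"),
         (2.0, "jasmine"), (2.3, "jasmine"), (1.8, "jasmine")]
test = [(5.0, "rose"), (4.9, "rose"), (2.1, "jasmine"), (2.2, "jasmine")]


def accuracy(predict, data):
    """Fraction of examples in `data` that `predict` labels correctly."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)


# "Memorizing": store every training example and look it up later.
lookup = dict(train)


def memorizer(x):
    # Anything not seen during training gets a fixed fallback guess.
    return lookup.get(x, "rose")


# "Learning": find one simple rule, a cut-off halfway between the class averages.
def mean(values):
    return sum(values) / len(values)


rose_mean = mean([x for x, label in train if label == "rose"])
jasmine_mean = mean([x for x, label in train if label == "jasmine"])
threshold = (rose_mean + jasmine_mean) / 2


def rule_learner(x):
    # In this toy data the roses have the longer petals.
    return "rose" if x > threshold else "jasmine"


print("memorizer    train/test:", accuracy(memorizer, train), accuracy(memorizer, test))
print("rule learner train/test:", accuracy(rule_learner, train), accuracy(rule_learner, test))
```

The memorizer is perfect on the flowers it has already seen and falls back to guessing on new ones, while the simple rule carries over to the unseen flowers.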

Let's talk about a real-world scenario. We have to predict whether an individual will switch from their current service provider or not: churn = 0 means the customer will not switch from Airtel to Idea, and churn = 1 means the customer will switch from Airtel to Idea. If you don't know about churn prediction, do read this: https://www.dhirubhai.net/pulse/data-science-your-cup-tea-vivek-chaudhary/

We have been provided with a data set that includes Total talking hours, Senior Citizen, National calling hours, International calling hours, National SMS, and International SMS. We use it to predict churn, which helps the telecom company group the right kind of customers into different categories and give them the best offers for better revenue generation.

Let's say you build a model and it performs well on the existing data, but when you try the same model on "UNSEEN DATA", it doesn't deliver a good result. Here you can conclude that your model is memorizing more than it is learning.
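One way to find out how a model behaves on "UNSEEN DATA" before deployment is to set some rows aside and never train on them. The sketch below shows only that mechanic. The `CustomerRecord` field names are assumptions based on the columns listed above, `hold_out_split` is a hypothetical helper, and the rows are placeholder values rather than real customer data.

```python
import random
from dataclasses import dataclass


@dataclass
class CustomerRecord:
    # Field names mirror the columns listed in the article; the names are assumptions.
    total_talking_hours: float
    senior_citizen: bool
    national_calling_hours: float
    international_calling_hours: float
    national_sms: int
    international_sms: int
    churn: int  # 0 = stays, 1 = switches provider


def hold_out_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of `records` and set aside a fraction as unseen test data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder rows with made-up values, only to show the mechanics.
records = [
    CustomerRecord(10.0 + i, i % 2 == 0, 8.0 + i, 2.0, 20 * i, i, i % 2)
    for i in range(8)
]

train_records, test_records = hold_out_split(records)
print(len(train_records), "training rows,", len(test_records), "held-out rows")
```

The model would be fitted on `train_records` only, and its score on `test_records` stands in for how it might do after deployment.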

So, why is this happening in the above case?

One possibility is that your "MODEL" has an overfitting problem, which is why it gives you relatively poor performance on "UNSEEN DATA". Again, you can conclude that "MEMORIZING" is taking place instead of "LEARNING".

So, do remember the golden point: if your model shows a significant difference in evaluation metrics between the training data set and the testing data set, it points to an "OVERFITTING PROBLEM".
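The golden point can be written as a small check. This is a sketch under stated assumptions: the article does not say how big a difference counts as "significant", so the `max_gap` default of 0.05 is a hypothetical cut-off, and the function names are made up for illustration. The 95% and 84% figures are the ones from the Rose/Jasmine example.

```python
def accuracy_gap(train_accuracy, test_accuracy):
    """How much accuracy drops when moving from training data to test data."""
    return train_accuracy - test_accuracy


def looks_overfit(train_accuracy, test_accuracy, max_gap=0.05):
    """Flag a model whose training score beats its test score by more than max_gap.

    max_gap is an assumed threshold; pick one that suits your own problem.
    """
    return accuracy_gap(train_accuracy, test_accuracy) > max_gap


gap = accuracy_gap(0.95, 0.84)
print(f"gap = {gap:.2f}, overfitting suspected: {looks_overfit(0.95, 0.84)}")
```

With these numbers the gap is about 0.11, well above the assumed cut-off, so the check flags the model even though 95% training accuracy looks impressive on its own.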

So, 95% accuracy for any model doesn't mean you have one of the best models.

This is the first version, with 10 more to come, so stay tuned. If you still didn't understand, do text me on LinkedIn and we can discuss further.

Keep learning on a daily basis, and you are going to master Regularization.

Thanks.:)








Vivek Chaudhary
