
Feature Selection In Machine Learning Version 1.0('Layman words') !!

Vivek Chaudhary

Transforming MSME's on Ground | Leading CredgeSol

Published: December 7, 2019


What is Feature Selection?

Let's understand it in 'Layman Words' !!

Yesterday, while talking to someone, I asked her out for Tea & Sutta. She answered, "You know I already have a friend in Bangalore, & I am not even going out with him."

Here is my answer to her:

You already know him, but I am an unknown variable that could be the best selection for your model. Let's explore the features & observe the correlation, and everything can be taken forward based on under-fitting or over-fitting!! (Her reaction: you are 'Mad', you relate everything to Data Science, & I am fed up with hearing all these concepts. ;))

What can we conclude from the above conversation?

Data Science is not new at all; we all use it somehow when we relate it to our own life scenarios. First try to relate your surroundings to Data Science. The day you understand this is the day you move ahead.

Before going forward, as always there will be some newbies, so let's discuss the difference between predictors & the target variable.

Let me ask you: on which factors does the amount your father spends every month depend? Here the amount spent monthly is the target variable, or dependent variable, because it is the thing we want to explain. It depends on your household expenditure, your tuition fee, your pocket money, his salary & many other factors that keep changing according to needs, so all these factors are known as predictors, or independent variables.
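To make the split concrete, here is a minimal sketch in plain Python. The records and column names (`household_expenses`, `tuition_fee`, `pocket_money`, `monthly_spend`) are made up for illustration; the only point is that one column is set aside as the target & everything else is treated as a predictor.

```python
# Hypothetical monthly records; the column names and numbers are invented.
records = [
    {"household_expenses": 20000, "tuition_fee": 5000, "pocket_money": 1000, "monthly_spend": 26000},
    {"household_expenses": 22000, "tuition_fee": 5000, "pocket_money": 1500, "monthly_spend": 28500},
    {"household_expenses": 18000, "tuition_fee": 6000, "pocket_money": 1200, "monthly_spend": 25200},
]

TARGET = "monthly_spend"  # the dependent variable (Y)


def split_predictors_target(rows, target):
    """Return (X, y): X holds the predictor columns, y holds the target values."""
    X = [{name: value for name, value in row.items() if name != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_predictors_target(records, TARGET)
print(sorted(X[0]))  # names of the predictors
print(y)             # values of the target
```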

Machine Learning works on a simple rule, and if you think about it in a more complicated way it will only be more confusing to start with. If you put all your garbage (wet & dry both) in one dustbin & keep it for a week, it will smell like hell, which is bad for health; but if you separate dry & wet, it stays manageable even after a week. In the end, we can conclude: if you put garbage in, you will only get garbage out (garbage here means noise in the data).

In machine learning, feature selection is the process of choosing the variables that make sense to your business & that will be useful in predicting the target variable (Y). Proceeding randomly without knowing your problem statement will lead you nowhere, so it is considered good practice to understand which features are important while building predictive models.
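As a rough illustration of what "choosing useful variables" can look like in code, the sketch below scores each predictor by its absolute Pearson correlation with Y & keeps the top `k`. This is only one simple filter-style approach, not necessarily the one used later in this series; the data, the feature names & the helper `select_top_k` are all hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_top_k(features, target, k):
    """Rank features by |correlation with target| and return the k best names."""
    scores = {name: abs(pearson(values, target)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Invented toy data: exam score is the target (Y).
features = {
    "hours_studied": [1, 2, 3, 4, 5],
    "shoe_size": [7, 9, 6, 9, 7],
}
exam_score = [52, 55, 61, 64, 70]

print(select_top_k(features, exam_score, k=1))
```

A score like this only looks at one variable at a time, so treat it as a first screening step rather than a final answer.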

When we deal with real-world datasets, we commonly find columns that are nothing but noise.

Such variables occupy more space & cost more time & computational resources, especially with large datasets.
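One easy-to-spot kind of useless column is a column whose value never changes, since it cannot help tell one row from another. The sketch below drops such columns from a list of records. The helper `drop_constant_columns` & the sample data are hypothetical, and real noise is usually harder to detect than this.

```python
def drop_constant_columns(rows):
    """Remove columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    constant = {
        name for name in first
        if all(row[name] == first[name] for row in rows)
    }
    return [{k: v for k, v in row.items() if k not in constant} for row in rows]


# Invented sample: "country" is the same everywhere, so it carries no signal.
rows = [
    {"age": 25, "country": "IN", "income": 30000},
    {"age": 32, "country": "IN", "income": 45000},
    {"age": 41, "country": "IN", "income": 52000},
]

cleaned = drop_constant_columns(rows)
print(cleaned[0])
```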

It is always hard to decide when a variable makes sense to the business but we are not sure whether it will help in predicting the target variable (Y). Another crucial point to be aware of is that a feature that is useful in one ML algorithm (say a random forest) may not be as effective in another (like a decision tree).

It is quite possible that a variable which does not seem to explain the response variable (Y) on its own becomes useful when combined with other predictors. In other words, a variable may have a low correlation with Y, but in the presence of other variables it can help explain patterns or unexpected relationships that the other variables can't explain at all.
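A tiny toy case shows how this can happen. In the sketch below, Y is 1 only when exactly one of two predictors is 1 (an XOR pattern). Each predictor alone has zero correlation with Y, yet a feature built from both of them matches Y perfectly. The data & the combined feature `interaction` are constructed by hand purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data: Y is 1 only when exactly one of the two predictors is 1.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]

# Looked at one at a time, neither predictor is correlated with Y.
print(pearson(x1, y))
print(pearson(x2, y))

# A hand-built feature that combines both predictors tracks Y exactly.
interaction = [abs(a - b) for a, b in zip(x1, x2)]
print(pearson(interaction, y))
```

Dropping `x1` or `x2` because of its low individual correlation would throw away exactly the information the model needs here.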

In most cases, it is hard to decide whether to include or exclude such variables.

We are going to discuss strategies that can help you with such problems & most importantly let you analyze which variables are important & how much each one contributes to your model.

Note: It is always best to select variables that relate to the business logic, even when their correlation with Y is hard to find. You will certainly meet variables that are not correlated with Y on their own but that give meaningful results for the target variable Y when combined with other predictors.

Advantages of Feature Selection !!

1.) Training of the machine learning model will be faster.

2.) The complexity of your model will be reduced & it will be easier to interpret.

3.) Accuracy of your model can improve if you select the correct features, or combine them with other predictors that give you meaningful results for your target variable Y.

4.) Last but not least, if you get your hands dirty with feature selection, it will help reduce over-fitting. And it takes a ton of practice to gain expertise in it.

Keep on reading & from tomorrow we will get our hands dirty with coding to understand 'feature selection' deeply.

Happy Learning & Keep Supporting!!!
