How to Choose the Best Features for Your ML Model
Mohamed Abdullah
Muslim | AI & Data Scientist @ Cyshield | Researcher | NLP Engineer | AWS ML Specialty
Welcome, all!
In this post I'll share some guidelines that may help you choose the right features for ML problems.
Before discussing how to choose the right features, let's briefly go over why feature selection matters in machine learning:
- It enables the machine learning algorithm to train faster.
- It reduces the complexity of a model and makes it easier to interpret.
- It improves the accuracy of a model if the right subset is chosen.
- It reduces overfitting.
Feature selection in ML has many methods, such as:
- Univariate Selection
- Feature Importance
- Correlation Matrix with Heatmap
1. Univariate Selection
Statistical tests can be used to select those features that have the strongest relationship with the output variable.
The scikit-learn library provides the SelectKBest class that can be used with a suite of different statistical tests to select a specific number of features.
from sklearn.feature_selection import SelectKBest, chi2

# Score every feature with the chi-squared test and keep the 10 best
bestfeatures = SelectKBest(score_func=chi2, k=10)
In the code above, SelectKBest uses the chi-squared (chi2) statistical test to select the 10 best features. Note that chi2 requires non-negative feature values.
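To make this concrete, here is a minimal end-to-end sketch. It assumes the built-in iris dataset as a stand-in for your own data and uses k=2 only because iris has just four features; neither choice comes from the original snippet.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Assumption: iris stands in for your own data. All of its features are
# non-negative, which the chi-squared test requires.
X, y = load_iris(return_X_y=True, as_frame=True)

# Keep the 2 highest-scoring features (iris has only 4 in total).
selector = SelectKBest(score_func=chi2, k=2)
X_selected = selector.fit_transform(X, y)

# Pair each feature name with its chi-squared score, highest first.
scores = sorted(zip(X.columns, selector.scores_),
                key=lambda pair: pair[1], reverse=True)
for name, score in scores:
    print(f"{name}: {score:.2f}")

# get_support() returns a boolean mask marking the selected columns.
selected_names = list(X.columns[selector.get_support()])
print("Selected:", selected_names)
```

The scores_ attribute is only available after fitting, so the selector has to see both the features and the target before you can inspect the ranking.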
You can use any appropriate statistical test, such as:
- Pearson Correlation
- Spearman Correlation
- Chi-Square
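For the correlation-based tests, one possible sketch is to score each feature against the target with pandas. The diabetes dataset here is an assumption chosen because it has a continuous target; corrwith is a pandas method, not part of SelectKBest.

```python
import pandas as pd
from sklearn.datasets import load_diabetes

# Assumption: diabetes stands in for a regression problem of your own.
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Correlation of every feature with the target, under both measures.
pearson = X.corrwith(y, method="pearson")
spearman = X.corrwith(y, method="spearman")

ranking = pd.DataFrame({"pearson": pearson, "spearman": spearman})

# Order by absolute Pearson correlation: a strong negative relationship
# is just as informative as a strong positive one.
order = ranking["pearson"].abs().sort_values(ascending=False).index
ranking = ranking.reindex(order)
print(ranking.head(5))
```

Pearson measures linear relationships, while Spearman works on ranks and so also picks up monotonic but non-linear ones. Comparing the two columns can hint at which features behave non-linearly.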
2. Feature Importance
Feature importance gives you a score for each feature of your data: the higher the score, the more important or relevant the feature is to your output variable.
Feature importance is a built-in attribute (feature_importances_) that comes with tree-based classifiers and is available once the model has been fitted.
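from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
# X (features) and y (target) are assumed to be defined; fit before reading importances
model.fit(X, y)
print(model.feature_importances_)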
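A runnable version of the same idea might look like the sketch below. The iris dataset, n_estimators=100, and random_state=0 are assumptions added for reproducibility, not settings from the original snippet.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

# Assumption: iris stands in for your own feature matrix and target.
X, y = load_iris(return_X_y=True, as_frame=True)

# random_state fixes the randomness so the ranking is repeatable.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Attach feature names to the raw scores and sort, most important first.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

The importances are normalized to sum to 1, so each value can be read as a feature's share of the total importance.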
3. Correlation Matrix with Heatmap
Correlation can be positive (an increase in the feature's value increases the value of the target variable) or negative (an increase in the feature's value decreases the value of the target variable).
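import matplotlib.pyplot as plt
import seaborn as sns

# data is assumed to be a pandas DataFrame of numeric columns, including the target
corrmat = data.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))
g = sns.heatmap(data[top_corr_features].corr(), annot=True, cmap="RdYlGn")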
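Beyond reading the heatmap by eye, you can also pick features from the correlation matrix programmatically. The following is only a sketch: the diabetes dataset is a stand-in, and the 0.4 threshold is a hypothetical cut-off that you would tune for your own problem.

```python
from sklearn.datasets import load_diabetes

# Assumption: diabetes stands in for your own DataFrame. The .frame
# attribute holds the features plus a "target" column.
data = load_diabetes(as_frame=True).frame
corrmat = data.corr()

# Correlation of each feature with the target (dropping the target itself).
target_corr = corrmat["target"].drop("target")

# Hypothetical cut-off: keep features whose absolute correlation
# with the target is at least 0.4.
threshold = 0.4
selected = target_corr[target_corr.abs() >= threshold].index.tolist()
print(selected)
```

Using the absolute value keeps strongly negative features too, since they carry as much signal as strongly positive ones.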