How to Choose the Best Features for Your ML Model

Welcome, all!

In this article I'll introduce some guidelines that may help you choose the right features for ML problems.

Before discussing how to choose the right features, let's quickly review why feature selection matters in machine learning:

  • It lets the machine learning algorithm train faster, since there is less data to process.
  • It reduces the complexity of a model and makes it easier to interpret.
  • It can improve the accuracy of a model if the right subset is chosen.
  • It can reduce overfitting, because fewer irrelevant features means less noise to fit.

Feature selection in ML can be done with many methods, such as:

  • Univariate Selection
  • Feature Importance
  • Correlation Matrix with Heatmap

1. Univariate Selection

Statistical tests can be used to select those features that have the strongest relationship with the output variable.

The scikit-learn library provides the SelectKBest class that can be used with a suite of different statistical tests to select a specific number of features.

from sklearn.feature_selection import SelectKBest, chi2

# Score each feature with the chi-squared test and keep the 10 best
bestfeatures = SelectKBest(score_func=chi2, k=10)

In the code above, SelectKBest uses the chi-squared (chi2) statistical test to select the 10 best features. Note that chi2 requires non-negative feature values.
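To make the selector concrete, here is a minimal end-to-end sketch. The Iris dataset and k=2 are assumptions chosen only for illustration (Iris has four non-negative features, so chi2 applies); substitute your own data and k.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

# Iris is only a stand-in dataset; its features are non-negative,
# which the chi-squared test requires.
X, y = load_iris(return_X_y=True, as_frame=True)

# Keep the 2 features with the highest chi-squared score.
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)

# One score per input column; a higher score means stronger dependence on y.
for name, score in zip(X.columns, selector.scores_):
    print(f"{name}: {score:.2f}")

# get_support() returns a boolean mask over the original columns.
selected = list(X.columns[selector.get_support()])
print("Selected:", selected)
```

The scores are only computed once fit (or fit_transform) has been called, so the selector has to see both the features and the target.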

You can use whichever statistical test suits your data, such as:

  • Pearson Correlation
  • Spearman Correlation
  • Chi-Square
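The choice of test matters. As a rough sketch, the snippet below compares Pearson and Spearman correlation on a made-up feature whose relationship to the target is monotonic but not linear. The arrays x and y are hypothetical, and scipy.stats is used here as one convenient way to compute both coefficients.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Hypothetical feature and target with a monotonic, non-linear relationship.
x = np.linspace(1, 10, 50)
y = x ** 3

# Pearson measures linear association; Spearman measures rank (monotonic) association.
pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)

print(f"Pearson r:    {pearson_r:.3f}")
print(f"Spearman rho: {spearman_rho:.3f}")
```

Spearman reports a perfect monotonic relationship here, while Pearson comes out lower because the relationship is not a straight line.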

2. Feature Importance

Feature importance gives you a score for each feature of your data. The higher the score, the more important or relevant the feature is to your output variable.

Feature importance is not a separate class: it is exposed through the built-in feature_importances_ attribute of fitted tree-based classifiers.

from sklearn.ensemble import ExtraTreesClassifier

model = ExtraTreesClassifier()
# The model must be fitted first; X (features) and y (target) are assumed to exist
model.fit(X, y)
print(model.feature_importances_)
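A slightly fuller sketch of the same idea, again assuming the Iris dataset as a stand-in and a fixed random_state for repeatable output. It pairs each importance score with its column name and ranks the features.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier

# Iris is only a placeholder dataset for illustration.
X, y = load_iris(return_X_y=True, as_frame=True)

# random_state is fixed so the scores are repeatable between runs.
model = ExtraTreesClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# feature_importances_ holds one score per column; the scores sum to 1.
importances = model.feature_importances_
ranked = sorted(zip(X.columns, importances), key=lambda p: p[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the scores are normalized, they are best read as relative rankings within one model, not as absolute measures.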

3. Correlation Matrix with Heatmap

Correlation shows how the features relate to each other and to the target variable. It can be positive (an increase in the feature's value goes with an increase in the target variable) or negative (an increase in the feature's value goes with a decrease in the target variable).

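import matplotlib.pyplot as plt
import seaborn as sns

# data is assumed to be a pandas DataFrame of numeric features plus the target
corrmat = data.corr()
top_corr_features = corrmat.index
plt.figure(figsize=(20, 20))
g = sns.heatmap(data[top_corr_features].corr(), annot=True, cmap="RdYlGn")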
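Besides reading the heatmap by eye, the same correlation matrix can be used to rank features numerically. The sketch below builds a small synthetic DataFrame (the column names strong_pos, strong_neg, noise and target are hypothetical) and sorts features by the absolute value of their correlation with the target.

```python
import numpy as np
import pandas as pd

# Synthetic data for illustration only: two informative features and one noise feature.
rng = np.random.default_rng(0)
n = 200
data = pd.DataFrame({
    "strong_pos": rng.normal(size=n),
    "strong_neg": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
data["target"] = (
    2.0 * data["strong_pos"]
    - 2.0 * data["strong_neg"]
    + rng.normal(scale=0.5, size=n)
)

# Correlation of every feature with the target (the target's own row is dropped).
corr_with_target = data.corr()["target"].drop("target")

# Rank by absolute value so strong negative correlations count as much as positive ones.
ranked = corr_with_target.abs().sort_values(ascending=False)

print(corr_with_target.round(2))
print(ranked.index.tolist())
```

Ranking by absolute value is a design choice: a strongly negative correlation is just as informative for prediction as a strongly positive one.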

More information is available at the following three links.

Ayham Shaar

Software Programmer at Jami't Al-Ba'ath

5 years ago

Is there any info about feature extraction?
