Dimensionality Reduction
Curse Of Dimensionality
The curse of dimensionality refers to phenomena that arise when analyzing and organizing data in high-dimensional spaces (often with hundreds or thousands of dimensions) and that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday experience.
Dimensionality Reduction
In machine learning and statistics, dimensionality reduction or dimension reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Why is Dimensionality Reduction important?
Data comes in formats like video, audio, images, and text, often with a huge number of features. Are all of these features relevant for gaining insight? No, not all features are important or relevant. Based on business requirements or redundancy in the captured data, we reduce the feature set through feature selection and feature extraction. These techniques not only reduce computation cost but also help avoid misclassification caused by highly correlated variables.
How to overcome the Curse of Dimensionality?
There are a number of dimensionality reduction techniques, falling under feature selection and feature extraction:
- Principal Component Analysis
- Random Projection
- Independent Component Analysis
- Missing Value Ratio
- Low Variance Filter
- Backward Feature Elimination
- Forward Feature Construction
- High Correlation Filter
Consider two dimensions, X1 and X2, which are, say, measurements of a vehicle's travelled distance in kilometres (X1) and miles (X2). If you were to use both of these dimensions in machine learning, they would convey similar information and introduce a lot of noise into the system, so you are better off using just one dimension. By converting the data from 2D (X1 and X2) to 1D (PC1), we make it relatively easier to explain.
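As a minimal sketch of this idea (assuming scikit-learn and a tiny synthetic dataset, where the miles column is just a rescaled copy of the kilometres column):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic vehicle distances: X2 (miles) is just a rescaling of X1 (km),
# so the two columns carry the same information.
km = np.array([10.0, 25.0, 40.0, 55.0, 70.0])
miles = km * 0.621371
X = np.column_stack([km, miles])

# Collapse the two correlated dimensions into a single component (PC1).
pca = PCA(n_components=1)
pc1 = pca.fit_transform(X)

print(pca.explained_variance_ratio_)  # ~[1.0]: one component explains all the variance
print(pc1.ravel())                    # the 1-D representation of the data
```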
Principal Component Analysis
Principal Component Analysis finds components that explain the maximum amount of variance in the features; if we include all features as components, the total explained variance is 1.
PCA transforms a set of interrelated variables into uncorrelated variables. Each uncorrelated variable is a principal component, and each component is a linear combination of the original variables.
Each uncorrelated variable, or component, holds feature information that is expressed as explained variance, and the explained variance ratios of all components add up to 1. Since each principal component is a combination of the original variables, some principal components explain more variance than others.
The variance explained by one principal component is uncorrelated with that of the other components, which means each component captures new information about the features. This raises a question: how many components are needed to explain the maximum variance? There is no textbook method for calculating the number of components for a given number of features or variables, but we can set a threshold on the cumulative variance that the selected components must explain.
Suppose we set a variance threshold of 0.8 and have eight components with explained variance ratios of 0.3, 0.25, 0.15, 0.1, 0.08, 0.06, 0.04, and 0.02. The component with the maximum variance, 0.3, is called the first principal component. Since the threshold is 0.8, we add up components until the cumulative variance reaches 0.8.
By adding the first three components, we have 0.7 of the variance explained, and by including the fourth component we reach 0.8. So we can keep 4 components instead of eight, reducing the dimension from 8 to 4.
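A minimal sketch of this threshold rule with scikit-learn, assuming a stand-in 8-feature matrix X; the variance figures above are illustrative, so the printed ratios will differ for real data:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                    # stand-in for an 8-feature dataset

X_scaled = StandardScaler().fit_transform(X)     # PCA is scale sensitive

pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = np.argmax(cumulative >= 0.8) + 1  # smallest k reaching the 0.8 threshold
print(pca.explained_variance_ratio_, n_components)

# Equivalently, scikit-learn selects the count automatically when a float
# threshold is passed as n_components.
X_reduced = PCA(n_components=0.8).fit_transform(X_scaled)
print(X_reduced.shape)
```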
Random Projection
This technique is similar to PCA, but if the number of components is not specified, it is selected automatically while retaining the maximum amount of information. Random projection is a simple and computationally efficient way to reduce the dimensionality of the data by trading a controlled amount of accuracy (as additional variance) for faster processing times and smaller model sizes.
The dimensions and distribution of random projection matrices are controlled so as to preserve the pairwise distances between any two samples of the dataset. This makes random projection a suitable approximation technique for distance-based methods. Where PCA's projection captures the spread of variance, the projection in random projection preserves the distances between points while efficiently reducing the dimension.
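A minimal sketch using scikit-learn's GaussianRandomProjection; the high-dimensional matrix X is synthetic, and n_components="auto" picks the target dimension from the Johnson-Lindenstrauss bound for the requested distortion eps:

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10_000))     # 100 samples with 10,000 features

# n_components="auto" chooses the output dimension so that pairwise
# distances are preserved within a factor of (1 +/- eps).
rp = GaussianRandomProjection(n_components="auto", eps=0.25, random_state=0)
X_small = rp.fit_transform(X)

print(X_small.shape)                   # far fewer columns than 10,000
```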
Independent Component Analysis
ICA is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals. ICA defines a generative model for the observed multivariate data, which is typically given as a large database of samples.
It is widely applied to mixed audio for separating overlapping sounds or removing noise.
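A minimal sketch of this idea with scikit-learn's FastICA, using two synthetic signals mixed together as a stand-in for overlapping sounds:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent source signals (stand-ins for two overlapping sounds).
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                      # sinusoidal source
s2 = np.sign(np.sin(3 * t))             # square-wave source
S = np.column_stack([s1, s2])

# Observed data: each "microphone" records a different mixture of the sources.
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])              # mixing matrix
X = S @ A.T

# ICA recovers the independent sources (up to sign and scale).
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)
print(S_estimated.shape)                # (2000, 2): the separated signals
```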
Missing Value Ratio
In a dataset, each column contains values, but if some columns contain missing values, we can perform feature selection based on the missing value ratio: we set a threshold on the fraction of missing values a column may contain, and if a column's missing value ratio is greater than the threshold, we drop that feature.
The lower the threshold, the more aggressive the reduction in features.
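A minimal sketch with pandas, assuming a small illustrative DataFrame and a threshold of 0.4 (drop columns with more than 40% missing values):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0, 5.0],        # 20% missing
    "b": [np.nan, np.nan, np.nan, 4.0, 5.0],  # 60% missing
    "c": [1.0, 2.0, 3.0, 4.0, 5.0],           # 0% missing
})

threshold = 0.4
missing_ratio = df.isna().mean()               # fraction of missing values per column
df_reduced = df.loc[:, missing_ratio <= threshold]
print(df_reduced.columns.tolist())             # ['a', 'c']: column 'b' is dropped
```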
Low Variance Filter
This is conceptually similar to PCA: if a column carries very little information, i.e. its variance is lower than a threshold value, we drop the feature. The variance value acts as a filter for feature selection.
Variance is range dependent, so normalization is required before applying this technique.
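A minimal sketch with scikit-learn's VarianceThreshold; the data and the 0.01 cut-off are illustrative, and the columns are min-max normalized first because variance is range dependent:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import VarianceThreshold

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(loc=100.0, scale=15.0, size=200),  # informative column
    np.full(200, 3.0),                            # constant column: zero variance
    np.r_[np.zeros(199), 1.0],                    # almost always the same value
])

# Variance is range dependent, so bring every column to [0, 1] first.
X_norm = MinMaxScaler().fit_transform(X)

selector = VarianceThreshold(threshold=0.01)      # illustrative cut-off
X_filtered = selector.fit_transform(X_norm)

print(selector.variances_)   # per-column variance after normalization
print(X_filtered.shape)      # (200, 1): the two low-variance columns are dropped
```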
Backward Feature Elimination
In simple terms, a model is trained on all n input features and the error rate is calculated; the model is then retrained on n-1 features and the error rate is calculated again. If the error rate increases only by a small value, the removed feature is dropped from the dataset.
Backward feature elimination can be performed iteratively, removing one feature per round, to obtain a better feature set.
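A minimal sketch using scikit-learn's SequentialFeatureSelector in backward mode (available in scikit-learn 0.24+); the estimator, dataset, and target feature count are illustrative:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)      # 10 input features

# Repeatedly drop the feature whose removal hurts cross-validated
# performance the least, until only 5 features remain.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=5, direction="backward", cv=5
)
selector.fit(X, y)

print(selector.get_support())              # boolean mask of the kept features
X_reduced = selector.transform(X)
print(X_reduced.shape)                     # (442, 5)
```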
Forward Feature Construction
In this feature selection process, we train a model with one feature and calculate a performance measure. We keep adding features one by one and recalculate the performance: if the performance decreases when a feature is added, we drop that feature; if the performance increases, we keep it and iteratively add further features to the model.
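The same scikit-learn helper sketches the forward variant, starting from an empty feature set and greedily adding features; again the estimator, dataset, and target count are illustrative:

```python
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)

# Start with no features and add one at a time, keeping the addition
# that yields the best cross-validated score at each step.
selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=4, direction="forward", cv=5
)
X_reduced = selector.fit_transform(X, y)
print(X_reduced.shape)                     # (442, 4)
```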
High Correlation Filter
Here, if columns present in the dataset are highly correlated, the information they carry becomes redundant, and we drop these highly correlated variables from the feature set.
We can calculate a correlation measure between columns / variables of either type:
- Numerical columns / variables: use the Pearson product-moment correlation coefficient.
- Nominal columns / variables: use the Pearson chi-squared value.
Before performing the correlation operation, normalize the columns, as correlation is scale sensitive.
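A minimal sketch of the filter for numerical columns with pandas, using the Pearson correlation matrix and an illustrative cut-off of 0.9:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=300)
df = pd.DataFrame({
    "a": a,
    "b": a * 1.6 + rng.normal(scale=0.01, size=300),  # almost a copy of 'a'
    "c": rng.normal(size=300),                        # independent column
})

threshold = 0.9
corr = df.corr(method="pearson").abs()

# Look only at the upper triangle so each pair is inspected once,
# then drop one column from every highly correlated pair.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

df_reduced = df.drop(columns=to_drop)
print(to_drop)                       # ['b']
print(df_reduced.columns.tolist())   # ['a', 'c']
```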
Note
Both Forward Feature Construction and Backward Feature Elimination are computationally expensive tasks.