Factor Analyzer : A Comprehensive Guide
Uttam Kumar
Data Analyst | Expertise in Data Analysis, Visualization, SQL | Transforming Data into Actionable Insights
Factor Analysis is used for dimensionality reduction. But before moving to factor analysis, let us understand: what is dimensionality reduction?
Dimensionality reduction means reducing the number of features or columns in a dataset without losing the information it carries; even after dropping columns, the essential structure of the data is preserved. We will not only talk about the theory but also walk through the implementation.
Factor Analysis
Factor Analysis is a technique used in Machine Learning to reduce the number of features or columns of a dataset. It aims to identify the latent variables underlying the dataset, and it is an example of a latent variable model.
Example
Let’s understand Factor Analysis with an example :-
Suppose a company conducts a survey and asks customers to rate a product on quality, design, price, service, and ease of use. Using factor analysis, the company can explain the customers’ ratings in terms of overall product satisfaction.
The ratings for quality, design, and ease of use are correlated and can form one latent variable, since they all describe product features. Similarly, the ratings for service and price are correlated and can form another latent variable, i.e., value for money.
The company can then draw insights from those latent variables, which may lead to improvements in product design, pricing, and service.
Manifest variables are the observed variables or features from which factors are formed; in this case, the ratings for quality, design, price, service, and ease of use are manifest variables.
Latent variables are derived from the manifest variables; in this case, product features and value for money are the latent variables.
Factor analysis can be exploratory or confirmatory. Exploratory factor analysis (EFA) is performed when the underlying structure of the data is unknown and the objective is to uncover the underlying factors. Confirmatory factor analysis (CFA) is used when a structure is hypothesised in advance and the goal is to verify the validity of the factors, often those discovered by an earlier exploratory analysis.
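As a minimal sketch of the confirmatory side, the factor_analyzer package also provides a ConfirmatoryFactorAnalyzer. The DataFrame data, the column names, and the two-factor grouping below are hypothetical stand-ins based on the survey example above:
from factor_analyzer import ConfirmatoryFactorAnalyzer, ModelSpecificationParser
# Hypothesised structure: which observed columns load on which factor
model_dict = {"product_features": ["quality", "design", "ease_of_use"],
              "value_for_money": ["price", "service"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(data, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa.fit(data.values)
print(cfa.loadings_)  # estimated loadings under the hypothesised structure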
Rotations
The two primary rotation types are orthogonal and oblique.
Orthogonal rotations, such as the Varimax and Quartimax rotations, produce factors that are uncorrelated with each other. These rotations are useful when the factors are conceptually distinct and do not overlap. The Varimax rotation is the most commonly used orthogonal rotation in factor analysis.
Oblique rotations, such as the Promax and Oblimin rotations, produce factors that are correlated with each other. These rotations are useful when the factors are conceptually related and overlap. The Promax rotation is the most commonly used oblique rotation in factor analysis.
Rotations simplify the interpretation of factor analysis by creating factors that are easier to understand and have a clearer relationship with the original variables: the rotated factors have high loadings on a smaller set of variables, making it easier to identify the underlying structure in the data.
Rotations also help to improve the fit of the factor model by reducing the number of cross-loadings, i.e., situations where a variable loads highly on more than one factor. Cross-loadings can make the factors difficult to interpret and may result in an unstable factor solution.
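As a sketch (assuming a pandas DataFrame data of numeric survey responses, as in the example above, and a hypothetical choice of two factors), switching between an orthogonal and an oblique rotation in factor_analyzer is a one-parameter change:
from factor_analyzer import FactorAnalyzer
# Orthogonal rotation: the extracted factors are kept uncorrelated
fa_varimax = FactorAnalyzer(n_factors=2, rotation='varimax')
fa_varimax.fit(data)
# Oblique rotation: the extracted factors are allowed to correlate
fa_promax = FactorAnalyzer(n_factors=2, rotation='promax')
fa_promax.fit(data)
# For a promax rotation, the factor correlation matrix is exposed as phi_
print(fa_promax.phi_)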
Steps for Factor Analysis
The steps for factor analysis are :-
Step 1 : Adequacy Test
This is an important step in factor analysis, as it tells us whether the data we have collected is suitable for factor analysis. The adequacy test helps us evaluate whether the data meets the requirements for factor analysis.
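All the snippets in this guide assume a pandas DataFrame called data holding the numeric responses. A minimal, hypothetical setup (the file name and preprocessing are placeholders for your own dataset) might look like this:
import pandas as pd
# Hypothetical dataset of numeric survey responses, one row per respondent
data = pd.read_csv("survey_responses.csv")
data = data.dropna()  # keep only complete responses for simplicity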
There are two methods :-
Bartlett’s test of sphericity
Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, i.e., that the variables are unrelated. Rejecting the null hypothesis means there is enough correlation between the variables to move forward with factor analysis.
H0 : The correlation matrix is an identity matrix
H1 : The correlation matrix is not an identity matrix
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
# Returns the chi-squared test statistic and the p-value of the test
chi2, p = calculate_bartlett_sphericity(data)
print("Chi squared value : ", chi2)
print("p value : ", p)
# Output
Chi squared value : 17654.270924632456
p value : 0.0
Since the p-value < 0.05, we reject the null hypothesis at the 95% confidence level: there is enough correlation among the variables to proceed.
Kaiser-Meyer-Olkin (KMO) test
The KMO test measures the proportion of variance among the variables that might be common variance. Higher proportions indicate stronger relationships between the variables, and hence that dimensionality-reduction techniques such as factor analysis are applicable. The KMO score ranges from 0 to 1, and values greater than 0.6 are generally considered adequate.
from factor_analyzer.factor_analyzer import calculate_kmo
# kmo_vars: KMO score per variable; kmo_model: overall KMO score
kmo_vars, kmo_model = calculate_kmo(data)
print(kmo_model)
# Output
0.963964850849081
The KMO score is close to 1, so factor analysis should be effective for this dataset.
Step 2 : Determine the number of factors
The number of factors to extract can be determined using a variety of techniques. A popular method is the Kaiser-Guttman criterion, which recommends keeping factors with eigenvalues greater than 1.0. Eigenvalues measure how much of the variance in the data each factor explains, so a factor with an eigenvalue greater than 1.0 explains more variance than a single observed variable does.
Another strategy is to examine the scree plot, a graphical depiction of the eigenvalues of the factors. The curve drops steeply at first and then flattens out; the number of factors to retain is read off at the point where the curve levels off, keeping the factors before that elbow.
from factor_analyzer import FactorAnalyzer
import matplotlib.pyplot as plt

n = data.shape[1]
# Fit an unrotated model with as many factors as variables,
# just to obtain the eigenvalues
fa = FactorAnalyzer(n_factors=n, rotation=None, impute="drop")
fa.fit(data)
ev, _ = fa.get_eigenvalues()

plt.scatter(range(1, n + 1), ev)
plt.plot(range(1, n + 1), ev)
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()
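Given the eigenvalue array ev computed above, the Kaiser-Guttman rule reduces to a simple count:
# Kaiser-Guttman criterion: retain factors whose eigenvalue exceeds 1.0
n_keep = int((ev > 1.0).sum())
print("Suggested number of factors :", n_keep)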
Step 3 : Interpret the factors
To interpret the factors, we examine the factor loadings, which measure the association between each variable and each factor. A high loading indicates a strong relationship between the variable and the factor; a low loading indicates a weak one.
# Fit the final model with 5 factors and a varimax rotation
fa = FactorAnalyzer(n_factors=5, rotation='varimax')
fa.fit(data)
# Loadings table: one row per variable, one column per factor
fa_load = pd.DataFrame(fa.loadings_, index=data.columns)
fa_load
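One quick, hypothetical way to read the loadings table is to find, for each variable, the factor it loads on most strongly:
# For each variable, the factor with the largest absolute loading
print(fa_load.abs().idxmax(axis=1))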
Variance
The amount of variation in a set of data that can be accounted for by one or more factors is referred to as variance in factor analysis. Finding the underlying variables behind the observed variance in the data is the aim of factor analysis. The get_factor_variance() method reports, for each factor, the variance it explains, the proportion of total variance, and the cumulative proportion.
# Variance explained by each factor: absolute, proportional, and cumulative
print(pd.DataFrame(fa.get_factor_variance(),
                   index=['Variance', 'Proportional Var', 'Cumulative Var']))
The five new factors together can explain 82.5% of the variance in the data.
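Finally, to carry out the dimensionality reduction we started with, the fitted model can project the original data onto the retained factors. A minimal sketch, reusing the fitted fa from above:
# Factor scores: one row per observation, one column per retained factor
scores = fa.transform(data)
scores_df = pd.DataFrame(scores, columns=['Factor' + str(i + 1) for i in range(5)])
print(scores_df.head())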
That’s all about Factor Analysis. I hope you liked it and were able to follow along. If you have any doubts, let me know in the comments.
Originally published on Medium.