Factor Analyzer : A Comprehensive Guide
Uttam Kumar
Data Analyst | Expertise in Data Analysis, Visualization, SQL | Transforming Data into Actionable Insights
Factor Analysis is used for dimensionality reduction. But before moving to factor analysis, let us understand: what is dimensionality reduction?
Dimensionality reduction means reducing the number of features or columns in a dataset without losing the information it carries; even after dropping columns, the essential structure of the data is preserved. We will not only talk about the theory but also walk through the implementation.
Factor Analysis
Factor Analysis is a technique used in Machine Learning to reduce the number of features or columns of a dataset. It aims to identify the latent variables underlying the dataset, and it is an example of a latent variable model.
Example
Let’s understand Factor Analysis with an example :-
Suppose a company conducts a survey and asks customers to rate a product on quality, design, price, service, and ease of use. Using factor analysis, the company can explain the customers’ ratings in terms of overall product satisfaction.
The ratings for quality, design, and ease of use are correlated and can form one latent variable, since they all describe product features. Similarly, the ratings for service and price are correlated and can form another latent variable, i.e., value for money.
The company can then draw insights from those latent variables, which may lead to improvements in product design, pricing, and service.
Manifest variables are the observed variables or features from which factors are formed; in this case, the ratings for quality, design, price, service, and ease of use are manifest variables.
Latent variables are derived from the manifest variables; in this case, product features and value for money are the latent variables.
Factor analysis can be exploratory or confirmatory. Exploratory factor analysis (EFA) is performed when the underlying structure of the data is unknown and the objective is to uncover the underlying factors. Confirmatory factor analysis (CFA) is used when a structure is hypothesised in advance and the goal is to verify the validity of the factors, often those discovered by an earlier exploratory analysis.
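As a minimal sketch of the confirmatory side, the factor_analyzer package also provides a ConfirmatoryFactorAnalyzer. The DataFrame data, the column names, and the two-factor grouping below are hypothetical stand-ins based on the survey example above:
from factor_analyzer import ConfirmatoryFactorAnalyzer, ModelSpecificationParser
# Hypothesised structure: which observed columns load on which factor
model_dict = {"product_features": ["quality", "design", "ease_of_use"],
              "value_for_money": ["price", "service"]}
model_spec = ModelSpecificationParser.parse_model_specification_from_dict(data, model_dict)
cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
cfa.fit(data.values)
print(cfa.loadings_)  # estimated loadings under the hypothesised structure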
Rotations
The two primary rotation types are orthogonal and oblique.
Orthogonal rotations, such as the Varimax and Quartimax rotations, produce factors that are uncorrelated with each other. These rotations are useful when the factors are conceptually distinct and do not overlap. The Varimax rotation is the most commonly used orthogonal rotation in factor analysis.
Oblique rotations, such as the Promax and Oblimin rotations, produce factors that are correlated with each other. These rotations are useful when the factors are conceptually related and overlap. The Promax rotation is the most commonly used oblique rotation in factor analysis.
Rotations simplify the interpretation of factor analysis by creating factors that are easier to understand and have a clearer relationship with the original variables: the rotated factors have high loadings on a smaller set of variables, making it easier to identify the underlying structure in the data.
Rotations also help to improve the fit of the factor model by reducing the number of cross-loadings, i.e., situations where a variable loads highly on more than one factor. Cross-loadings can make the factors difficult to interpret and may result in an unstable factor solution.
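As a sketch (assuming a pandas DataFrame data of numeric survey responses, as in the example above, and a hypothetical choice of two factors), switching between an orthogonal and an oblique rotation in factor_analyzer is a one-parameter change:
from factor_analyzer import FactorAnalyzer
# Orthogonal rotation: the extracted factors are kept uncorrelated
fa_varimax = FactorAnalyzer(n_factors=2, rotation='varimax')
fa_varimax.fit(data)
# Oblique rotation: the extracted factors are allowed to correlate
fa_promax = FactorAnalyzer(n_factors=2, rotation='promax')
fa_promax.fit(data)
# For a promax rotation, the factor correlation matrix is exposed as phi_
print(fa_promax.phi_)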
Steps for Factor Analysis
The steps for factor analysis are :-
Step 1 : Adequacy Test
This is an important step in factor analysis, as it tells us whether the data we have collected is suitable for factor analysis. The adequacy test helps us evaluate whether the data meets the requirements for factor analysis.
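All the snippets in this guide assume a pandas DataFrame called data holding the numeric responses. A minimal, hypothetical setup (the file name and preprocessing are placeholders for your own dataset) might look like this:
import pandas as pd
# Hypothetical dataset of numeric survey responses, one row per respondent
data = pd.read_csv("survey_responses.csv")
data = data.dropna()  # keep only complete responses for simplicity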
There are two methods :-
Bartlett’s test of sphericity
Bartlett’s test of sphericity tests the null hypothesis that the correlation matrix is an identity matrix, i.e., that the variables are unrelated. Rejecting the null hypothesis means there is enough correlation between the variables to move forward with factor analysis.
H0 : The correlation matrix is an identity matrix
H1 : The correlation matrix is not an identity matrix
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
# Returns the chi-squared test statistic and the p-value of the test
chi2, p = calculate_bartlett_sphericity(data)
print("Chi squared value : ", chi2)
print("p value : ", p)
# Output
Chi squared value : 17654.270924632456
p value : 0.0
Since the p-value < 0.05, we reject the null hypothesis at the 95% confidence level: there is enough correlation among the variables to proceed.
Kaiser-Meyer-Olkin (KMO) test
The KMO test measures the proportion of variance among the variables that might be common variance. Higher proportions indicate stronger relationships between the variables, and hence that dimensionality-reduction techniques such as factor analysis are applicable. The KMO score ranges from 0 to 1, and values greater than 0.6 are generally considered adequate.
from factor_analyzer.factor_analyzer import calculate_kmo
# kmo_vars: KMO score per variable; kmo_model: overall KMO score
kmo_vars, kmo_model = calculate_kmo(data)
print(kmo_model)
# Output
0.963964850849081
The KMO score is close to 1, so factor analysis should be effective for this dataset.
Step 2 : Determine the number of factors
The number of factors to extract can be determined using a variety of techniques. A popular method is the Kaiser-Guttman criterion, which recommends keeping factors with eigenvalues greater than 1.0. Eigenvalues measure how much of the variance in the data each factor explains, so a factor with an eigenvalue greater than 1.0 explains more variance than a single observed variable does.
Another strategy is to examine the scree plot, a graphical depiction of the eigenvalues of the factors. The curve drops steeply at first and then flattens out; the number of factors to retain is read off at the point where the curve levels off, keeping the factors before that elbow.
from factor_analyzer import FactorAnalyzer
import matplotlib.pyplot as plt

n = data.shape[1]
# Fit an unrotated model with as many factors as variables,
# just to obtain the eigenvalues
fa = FactorAnalyzer(n_factors=n, rotation=None, impute="drop")
fa.fit(data)
ev, _ = fa.get_eigenvalues()

plt.scatter(range(1, n + 1), ev)
plt.plot(range(1, n + 1), ev)
plt.title('Scree Plot')
plt.xlabel('Factors')
plt.ylabel('Eigenvalue')
plt.grid()
plt.show()
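Given the eigenvalue array ev computed above, the Kaiser-Guttman rule reduces to a simple count:
# Kaiser-Guttman criterion: retain factors whose eigenvalue exceeds 1.0
n_keep = int((ev > 1.0).sum())
print("Suggested number of factors :", n_keep)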
Step 3 : Interpret the factors
To interpret the factors, we examine the factor loadings, which measure the association between each variable and each factor. A high loading indicates a strong relationship between the variable and the factor; a low loading indicates a weak one.
# Fit the final model with 5 factors and a varimax rotation
fa = FactorAnalyzer(n_factors=5, rotation='varimax')
fa.fit(data)
# Loadings table: one row per variable, one column per factor
fa_load = pd.DataFrame(fa.loadings_, index=data.columns)
fa_load
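One quick, hypothetical way to read the loadings table is to find, for each variable, the factor it loads on most strongly:
# For each variable, the factor with the largest absolute loading
print(fa_load.abs().idxmax(axis=1))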
Variance
The amount of variation in a set of data that can be accounted for by one or more factors is referred to as variance in factor analysis. Finding the underlying variables behind the observed variance in the data is the aim of factor analysis. The get_factor_variance() method reports, for each factor, the variance it explains, the proportion of total variance, and the cumulative proportion.
# Variance explained by each factor: absolute, proportional, and cumulative
print(pd.DataFrame(fa.get_factor_variance(),
                   index=['Variance', 'Proportional Var', 'Cumulative Var']))
The five new factors together can explain 82.5% of the variance in the data.
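Finally, to carry out the dimensionality reduction we started with, the fitted model can project the original data onto the retained factors. A minimal sketch, reusing the fitted fa from above:
# Factor scores: one row per observation, one column per retained factor
scores = fa.transform(data)
scores_df = pd.DataFrame(scores, columns=['Factor' + str(i + 1) for i in range(5)])
print(scores_df.head())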
That’s all about Factor Analysis. I hope you liked it and were able to follow along. If you have any doubts, let me know in the comments.
Originally published on Medium.