Factor Analysis

Factor analysis is a statistical method used to describe the variability among a large number of observed, correlated variables in terms of a smaller set of unobserved factors. For example, variations in several observed variables may reflect variation in a single unobserved variable. Factor analysis reduces a set of variables by extracting their shared variance into a smaller number of factors, which is why it is often described as data reduction or dimension reduction.

Variables linked to the factors

The goal of factor analysis is to divide variables into groups according to how strongly they are correlated. Within a group, variables should be as highly correlated as possible; between groups, correlations should be as low as possible.

As part of the general linear model (GLM) family, factor analysis relies on key assumptions such as:

- Linear relationships between variables

- Absence of multicollinearity

- Relevance of the variables

- The existence of a true correlation between factors and variables

- Sufficient variables for each factor

- Adequate sample size

- No outliers
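A minimal sketch of how two of these assumptions (relevance of the variables and absence of multicollinearity) might be pre-checked in Python, using only numpy and pandas; the data, variable names, and thresholds are illustrative, not prescriptive.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical survey-style data: 200 observations of 6 observed variables.
X = pd.DataFrame(rng.normal(size=(200, 6)),
                 columns=[f"item_{i}" for i in range(1, 7)])

corr = X.corr()
# Keep only the upper triangle so each pair of variables is counted once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Multicollinearity: very high pairwise correlations (roughly |r| > 0.9) are a warning sign.
print("Pairs with |r| > 0.9:")
print(upper[upper.abs() > 0.9].stack())

# Relevance: a fair share of pairs should be at least moderately correlated (roughly |r| > 0.3).
share = (upper.abs() > 0.3).sum().sum() / upper.notna().sum().sum()
print(f"Share of pairs with |r| > 0.3: {share:.2f}")
```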

Common Factor Analysis

This method focuses on extracting the common variance shared among variables and does not include the unique variance of each variable.

Image Factoring

This process is based on a correlation matrix; it uses ordinary least squares regression to predict factors.

Maximum Likelihood Method

This method forms factors as linear combinations of variables, estimating the loadings from the correlation matrix by maximum likelihood.

Other Methods

Other methods include Alpha factoring, Unweighted least squares (ULS) factoring, and Generalized least squares (GLS) factoring.

Key concepts in factor analysis

Factor Loadings

Factor loadings (correlation coefficients) represent the correlation between an observed variable and a factor. A loading tells how much of the variance in the observed variable can be explained by the factor. A loading of 0.7 or higher is commonly considered significant, since it means the factor explains roughly half (0.7² ≈ 0.49) of that variable's variance.
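As an illustration, the following sketch extracts two factors from synthetic data with the third-party factor_analyzer package and flags loadings of 0.7 or higher; the data and variable names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(0)
# Synthetic data: two latent factors, each driving three of six observed variables.
F = rng.normal(size=(300, 2))
X = pd.DataFrame(
    np.hstack([F[:, [0]] * [0.9, 0.8, 0.7] + rng.normal(scale=0.4, size=(300, 3)),
               F[:, [1]] * [0.9, 0.8, 0.7] + rng.normal(scale=0.4, size=(300, 3))]),
    columns=[f"v{i}" for i in range(1, 7)])

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)

# Loadings: correlation of each observed variable with each factor.
loadings = pd.DataFrame(fa.loadings_, index=X.columns, columns=["Factor1", "Factor2"])
print(loadings.round(2))
print("Strong loadings (|loading| >= 0.7):")
print(loadings.abs() >= 0.7)
```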

Variance

Variance measures the spread of numerical values around the average. In factor analysis, the researcher tries to understand how much of the variance among the variables each factor explains. When a factor explains more variance than the others, it represents the variables more accurately.

Eigenvalue

The eigenvalue represents the share of the total variance that a factor explains. If a factor has an eigenvalue of 1 or above, it explains more variance than a single observed variable, which helps reduce the number of variables in the analysis. Conversely, a factor with an eigenvalue below 1 accounts for less variability than a single variable and is usually not retained.
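A minimal sketch of this eigenvalue (Kaiser) criterion using only numpy: compute the eigenvalues of the correlation matrix and count how many exceed 1. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(250, 8))          # hypothetical 250 x 8 data matrix
R = np.corrcoef(X, rowvar=False)       # 8 x 8 correlation matrix

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors to retain (eigenvalue > 1):", int((eigenvalues > 1).sum()))
```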

Factor score

A factor score is an estimated value for each observation on each factor, telling how strongly that observation is related to the factor. These scores are standardized and essentially describe an observation's placement on the factor(s).
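A short sketch of factor scores, again assuming the factor_analyzer package and synthetic, factor-structured data; transform() returns one standardized score per observation and factor.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(2)
# Synthetic data: 200 observations of 6 variables driven by 2 underlying factors.
F = rng.normal(size=(200, 2))
X = F @ rng.normal(size=(2, 6)) + rng.normal(scale=0.5, size=(200, 6))

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)

scores = fa.transform(X)       # shape (200, 2): each observation's placement on the two factors
print(scores[:5].round(2))     # standardized scores for the first five observations
```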

When to use factor analysis

Factor analysis is useful when dealing with a large number of interconnected variables and the researcher wants to simplify the complexity of the data by finding hidden patterns. With factor analysis, key factors can be identified, and the researcher can then proceed to a cluster analysis with fewer variables (the factors), as sketched below.
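A minimal sketch of that workflow, assuming scikit-learn and synthetic data: reduce ten observed variables to two factors, then run k-means on the factor scores.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
# Synthetic data: ten correlated variables generated from two hidden factors.
F = rng.normal(size=(300, 2))
X = F @ rng.normal(size=(2, 10)) + rng.normal(scale=0.5, size=(300, 10))

# Step 1: reduce the ten variables to two factor scores.
scores = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)

# Step 2: cluster the observations using only the factor scores.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print("Cluster sizes:", np.bincount(labels))
```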

Types of factor analysis

There are two main factor analysis methods: exploratory and confirmatory.

Confirmatory factor analysis

The researcher starts with a hypothesis about the data and seeks to confirm or disconfirm the hypothesized factor structures or dimensions. One popular method used here is Principal Component Analysis, where the analysis is run to obtain multiple possible solutions that split the data among a number of factors.

Exploratory factor analysis

Exploratory factor analysis is undertaken without a prior hypothesis. Through this process, the researcher sees whether associations exist between the initial variables and how they group together.

Rotation Techniques to Enhance Interpretability

Rotations in factor analysis help achieve a simpler and more interpretable factor structure. They adjust the axes to maximize the distinction between factors, improving the interpretability of the results.

Challenges and solutions of factor analysis

To ensure the right results, the researcher has to gather the right set of variables accurately. Keep in mind that, besides knowledge of the product being studied, neglecting even small details might lead to wrong results.

Factor Analysis Steps

1. Determination of the Suitability of Data for Factor Analysis

Bartlett’s Test: Checks whether the correlation matrix differs significantly from an identity matrix, i.e., whether the variables are correlated enough for factor analysis.

Kaiser-Meyer-Olkin (KMO) Measure: Verifies sampling adequacy; a value of 0.6 or above is generally considered acceptable.
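Both suitability checks are available in the factor_analyzer package; the sketch below assumes that package and synthetic data, and only illustrates the calls.

```python
import numpy as np
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

rng = np.random.default_rng(4)
# Synthetic, factor-structured data so that both checks have something to find.
F = rng.normal(size=(250, 2))
X = F @ rng.normal(size=(2, 8)) + rng.normal(scale=0.5, size=(250, 8))

chi_square, p_value = calculate_bartlett_sphericity(X)   # small p-value: correlations are present
kmo_per_variable, kmo_total = calculate_kmo(X)           # overall KMO of 0.6+ is acceptable
print(f"Bartlett chi2 = {chi_square:.1f}, p = {p_value:.4f}, KMO = {kmo_total:.2f}")
```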

2. Choose the Extraction Method

There are several methods to extract factors:

- Principal Component Analysis (PCA): Generally used for data reduction. PCA is a linear transformation method that identifies and explains the maximum variance present in the data.

- Common Factor Analysis: Also used for data reduction; this method assumes that the total variance extracted by a factor can be split into common variance and unique variance.

- Principal Axis Factoring (PAF): Generally used when the main goal is to identify underlying factors that explain the variance in the data.

- Other extraction methods include image factoring, maximum likelihood, alpha factoring, unweighted least squares, generalized least squares, and canonical factoring. A short sketch contrasting two of these options follows this list.
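The sketch below contrasts PCA (via scikit-learn) with principal axis and maximum likelihood factoring (via factor_analyzer's method argument); the data is synthetic and the choice of two factors is an assumption made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(5)
# Synthetic data: eight variables generated from two underlying factors.
F = rng.normal(size=(300, 2))
X = F @ rng.normal(size=(2, 8)) + rng.normal(scale=0.5, size=(300, 8))

# Principal Component Analysis: explains the maximum variance in the data.
pca = PCA(n_components=2).fit(X)
print("PCA explained variance ratio:", pca.explained_variance_ratio_.round(2))

# Principal axis factoring versus maximum likelihood, via factor_analyzer's `method` argument.
for method in ("principal", "ml"):
    fa = FactorAnalyzer(n_factors=2, rotation=None, method=method)
    fa.fit(X)
    proportion = fa.get_factor_variance()[1]   # proportion of variance explained per factor
    print(method, "proportion of variance:", np.round(proportion, 2))
```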

3. Factor Extraction

The chosen extraction method has to be used to identify the initial factors. By extracting eigenvalues, you can determine the number of factors to retain; factors with eigenvalues greater than 1 are typically retained.

4. Factor Rotation

Rotate the factors using Varimax or Quartimax (orthogonal rotation) when the factors are assumed to be uncorrelated, or Promax or Oblimin (oblique rotation) to allow the factors to be correlated, in order to obtain a simpler and more interpretable factor structure.
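A minimal sketch comparing an orthogonal rotation (Varimax) with an oblique one (Promax), assuming the factor_analyzer package and synthetic data.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(6)
# Synthetic data: six variables generated from two underlying factors.
F = rng.normal(size=(300, 2))
X = F @ rng.normal(size=(2, 6)) + rng.normal(scale=0.5, size=(300, 6))

for rotation in ("varimax", "promax"):   # orthogonal vs oblique rotation
    fa = FactorAnalyzer(n_factors=2, rotation=rotation)
    fa.fit(X)
    print(rotation, "loadings:\n", np.round(fa.loadings_, 2))
```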

5. Interpret and Label the Factors

Analyze the results and interpret the underlying meaning of each factor. Assign labels to each factor based on the variables with high loadings on that factor.
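A small sketch of this labelling step, assuming the factor_analyzer package and hypothetical variable names: for each factor, list the variables with the highest loadings so a descriptive name can be chosen.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(7)
# Hypothetical variable names; the data is synthetic and generated from two factors.
F = rng.normal(size=(300, 2))
X = pd.DataFrame(F @ rng.normal(size=(2, 6)) + rng.normal(scale=0.5, size=(300, 6)),
                 columns=["price", "value", "discount", "service", "support", "staff"])

fa = FactorAnalyzer(n_factors=2, rotation="varimax")
fa.fit(X)
loadings = pd.DataFrame(fa.loadings_, index=X.columns, columns=["Factor1", "Factor2"])

# For each factor, list the variables that load most strongly so a label can be chosen.
for factor in loadings.columns:
    top = loadings[factor].abs().sort_values(ascending=False).head(3)
    print(f"{factor}: candidate label based on", list(top.index))
```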
