Formulation of the Linear Probability Model (LPM) to Study and Analyze the Impact of Individual Factors on Women's Participation as CEO in a Firm
Data Analysis


In this paper, we present a study of women's participation in the CEO role as a function of four variables: education, age, income, and self-confidence.

To select these variables for the model of women's participation as CEO in a company, we considered the factors that influence a woman's likelihood of becoming a CEO. These factors can be divided into two categories: individual factors and external factors.

Individual Factors

Individual factors are those related to the woman's own characteristics and experiences. The inclusion of education, age, income, and self-confidence as explanatory variables in the study of women's participation in the CEO role can be justified econometrically as follows:

  • Education: Education is a critical factor for professional success, regardless of gender. Women with higher education levels are more likely to possess the skills and knowledge required to occupy leadership positions. Therefore, we expect a positive impact of education on women's participation in the CEO role.
  • Age: Age can influence experience and maturity. Older women are likely to have accumulated more experience and maturity, which can be a positive factor for occupying leadership positions.
  • Income: Income is an indicator of financial success. Women with higher incomes have more resources to invest in their education and professional development.
  • Self-Confidence: Self-confidence is crucial for professional success. Women with higher self-confidence are more likely to apply for leadership positions and feel confident in occupying these roles.

Including these explanatory variables in the econometric model allows us to examine how each of them influences women's participation in the CEO role. The model's results can help identify factors contributing to gender inequality in business leadership.

External Factors

External factors are related to the work environment and the company's culture. Some of the most important factors include:

  • Organizational Culture: Companies with an inclusive culture promoting gender diversity are more likely to promote women to leadership positions.
  • HR Policies and Practices: Companies whose HR policies and practices support women's careers, such as mentorship and networking programs, are more likely to create a favorable environment for developing women leaders.
  • Presence of Women in Leadership Positions: The presence of women in leadership roles can serve as inspiration and motivation for other women to pursue the same goal.

For this paper, we focus on individual factors. The explanatory variable "good" is a dummy variable equal to 1 if the woman is self-confident and 0 otherwise. The dependent variable "female" is also a dummy, indicating whether a woman is a CEO (1 for yes, 0 for no). Because the dependent variable is binary, the linear regression is a linear probability model (LPM): each fitted value is an estimated probability that female = 1.
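In symbols, the model can be written as below. The β notation is ours and is not taken from the original output. Each slope is the change in the probability that female = 1 when its regressor rises by one unit, holding the others fixed, and because female is binary the error variance depends on the regressors by construction.

```latex
\begin{aligned}
p(\mathbf{x}) &\equiv P(\text{female} = 1 \mid \mathbf{x})
  = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{age} + \beta_3\,\text{good} + \beta_4\,\text{income} \\
\beta_j &= \frac{\partial p(\mathbf{x})}{\partial x_j} \quad (j = 1, 2, 4;\ \beta_3 \text{ is the shift in } p(\mathbf{x}) \text{ when good goes from 0 to 1}) \\
\operatorname{Var}(u \mid \mathbf{x}) &= p(\mathbf{x})\,[1 - p(\mathbf{x})]
\end{aligned}
```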

We first estimate the linear regression model by OLS and interpret the estimates, the specification tests, the goodness of fit, and possible anomalies in the model: multicollinearity, non-normality, and heteroscedasticity. Since the data are cross-sectional (observations on different individuals at a single point in time, suited to studying relationships among individual characteristics or company attributes) rather than a time series, we do not test for serial autocorrelation.

Interpretative Report

The results of the linear probability regression illustrated in output (1) show that education (educ), age, and income are each individually statistically significant predictors of "female" (a dummy variable for a woman being CEO): their t-test p-values are below the 5% significance level. The exception is "good" (a dummy variable for self-confidence), which is not significant at the 5% level (t = 1.74).

Econometric model:

female = 0.26172 + 0.0194 educ + 0.0211 age + 0.03374 good - 0.002984 income

Standard errors:

  • educ | 0.003579
  • age | 0.0078541
  • good | 0.0194
  • income | 0.0003511

t-statistics values:

  • educ | 5.44
  • age | 2.70
  • good | 1.74
  • income | -8.50
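The t-statistics above are simply each coefficient divided by its standard error. The sketch below reproduces them from the reported figures and attaches two-sided p-values. As an assumption, it uses the standard normal as an approximation to Student's t, which is very close with N = 3328 observations.

```python
import math

# Coefficients and standard errors as reported in output (1).
coefficients = {
    "educ": (0.0194, 0.003579),
    "age": (0.0211, 0.0078541),
    "good": (0.03374, 0.0194),
    "income": (-0.002984, 0.0003511),
}


def two_sided_p_normal(t):
    """Two-sided p-value, approximating Student's t by the standard normal."""
    return math.erfc(abs(t) / math.sqrt(2.0))


results = {}
for name, (b, se) in coefficients.items():
    t = b / se
    p = two_sided_p_normal(t)
    results[name] = (t, p)
    print(f"{name:>6}: t = {t:6.2f}, p = {p:.4f}, significant at 5%: {p < 0.05}")
```

Only "good" comes out with p above 0.05 (about 0.08). For educ the ratio is about 5.42 rather than the reported 5.44, presumably because the coefficient is rounded in the text.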

Sample size:

  • N = 3328

Coefficient of determination:

  • R^2 = 0.0248

The education coefficient is 0.0194: each additional year of education increases the probability of a woman being a CEO by 1.94 percentage points, ceteris paribus. The age coefficient is 0.0211: each additional year of age increases that probability by 2.11 percentage points, ceteris paribus. The coefficient of "good" is 0.03374: a self-confident woman has a probability of being CEO 3.37 percentage points higher than a woman who is not self-confident, ceteris paribus, although this difference is not statistically significant at the 5% level. The intercept, 0.26172, is the predicted probability when all explanatory variables equal zero (zero years of education, age zero, zero income, and not self-confident). Since such a profile lies outside the range of the data, the intercept has no meaningful interpretation on its own and should not be read as an average probability.
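Because the model is linear, each coefficient is exactly the change in the fitted probability when its regressor rises by one unit with the others held fixed. The sketch below shows this with the reported point estimates. The base profile is hypothetical and chosen only to compute differences; the differences do not depend on it, and the base prediction itself is not meant to be interpreted.

```python
# Point estimates from the fitted LPM reported above.
INTERCEPT = 0.26172
BETA = {"educ": 0.0194, "age": 0.0211, "good": 0.03374, "income": -0.002984}


def predicted_probability(profile):
    """Fitted value of the LPM for a dict of regressor values (not clipped to [0, 1])."""
    return INTERCEPT + sum(BETA[name] * profile[name] for name in BETA)


# Hypothetical profile, for illustration only (not taken from the study's data).
base = {"educ": 12, "age": 40, "good": 0, "income": 20}


def effect_of(name, change=1):
    """Change in the fitted probability when one regressor changes, others held fixed."""
    shifted = dict(base)
    shifted[name] += change
    return predicted_probability(shifted) - predicted_probability(base)


for name in BETA:
    print(f"{name:>6}: {100 * effect_of(name):+.2f} percentage points")

# The intercept is the fitted value when every regressor equals zero.
all_zero = {name: 0 for name in BETA}
print(predicted_probability(all_zero))  # equals INTERCEPT
```

Multiplying by 100 converts probability units into percentage points. This is why the income coefficient of -0.002984 corresponds to about -0.30 percentage points per monetary unit.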

The income coefficient is negative and significant, indicating that, on average, women with higher incomes are less likely to be CEOs: an increase in income of one monetary unit reduces the probability of a woman being a CEO by approximately 0.30 percentage points (0.002984 in probability units), ceteris paribus. This could be explained by several factors, such as:

  • Women with higher incomes may be more likely to work in smaller companies or in sectors not as prone to having female CEOs.
  • Women with higher incomes may be more likely to have children, which could make balancing professional and personal life more challenging.

It's important to note that these results are based only on the data from the study in question. Other studies, with different data, might reach different conclusions about the explanatory variables' influence on the likelihood of a woman being a CEO.

The R-squared of the regression is 0.0248: the variability of the explanatory variables explains 2.48% of the sample variation in "female" (women's participation as CEO), and the remaining 97.52% is left to the error term. The adjusted R-squared is 0.0237. It accounts for the number of predictors and penalizes the inclusion of irrelevant explanatory variables: adding a variable increases the adjusted R-squared only if that variable's t-statistic exceeds 1 in absolute value.
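The adjusted R-squared follows from the R-squared, the sample size, and the number of slope coefficients. The sketch below applies the standard formula to the reported R-squared of 0.0248, N = 3328, and k = 4 regressors.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k slope coefficients plus an intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


adj = adjusted_r2(0.0248, n=3328, k=4)
print(round(adj, 4))
print(f"share left to the error term: {100 * (1 - 0.0248):.2f}%")
```

The result is about 0.0236, matching the reported 0.0237 up to the rounding of the R-squared.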

The model therefore has low explanatory power. The fit is poor, since most of the variation, about 97.52%, is attributed to the error term. Nevertheless, the model is statistically significant as a whole (overall F-test). The root MSE (root mean squared error) is a measure of the accuracy of a regression model: it is the square root of the average squared difference between the model's predicted values and the actual values, with a degrees-of-freedom correction.

In the presented regression model, the root MSE is 0.494 probability units, meaning that the typical distance between the model's predicted values and the actual values is about 0.494.

A lower root MSE indicates that the model is more accurate. A root MSE of 0 means that the model is perfectly accurate, i.e., the model's predicted values are exactly the same as the actual values.

A root MSE of 0.494 should not, however, be read as low in this model. Because "female" only takes the values 0 and 1, its standard deviation is at most 0.5, and given the reported adjusted R-squared the implied standard deviation of "female" is about 0.5. Simply predicting the sample mean for every woman would therefore already yield a root MSE of about 0.5, so the reduction to 0.494 is marginal. This is consistent with the low R-squared: most of the variation in "female" comes from factors the model does not capture. It's also important to note that the root MSE measures predictive accuracy only, not causality.
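To judge whether 0.494 is small for a 0/1 outcome, the reported root MSE can be compared with the standard deviation of "female" itself. The sketch below uses the identity adjusted R-squared = 1 - rootMSE² / s_y², where s_y is the sample standard deviation of the dependent variable, to back out s_y from the two reported numbers. It then compares s_y with the upper bound of 0.5 for a binary variable (up to a negligible finite-sample correction).

```python
import math


def root_mse(residuals, k):
    """Root MSE as reported after OLS: sqrt(SSR / (n - k - 1)), k = number of slopes."""
    n = len(residuals)
    ssr = sum(e * e for e in residuals)
    return math.sqrt(ssr / (n - k - 1))


def implied_sd_of_y(rmse, adj_r2):
    """Since adj R^2 = 1 - rmse^2 / s_y^2, the sample SD of y is rmse / sqrt(1 - adj R^2)."""
    return rmse / math.sqrt(1 - adj_r2)


def max_sd_binary():
    """A 0/1 variable with mean p has SD sqrt(p(1 - p)), which peaks at p = 0.5."""
    return math.sqrt(0.5 * 0.5)


sd_y = implied_sd_of_y(0.494, 0.0237)
print(f"implied SD of female: {sd_y:.4f}; upper bound for a binary variable: {max_sd_binary():.2f}")
```

The implied standard deviation is about 0.49996. The regressors therefore shrink the typical prediction error only from roughly 0.500 to 0.494.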

Interpretation of VIF Results

The VIF results illustrated in output (2) show that all predictors have VIF values below 10, the usual rule-of-thumb threshold, indicating no evidence of serious multicollinearity.
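The VIF of a regressor is 1 / (1 - R_j²), where R_j² comes from regressing that regressor on all the others. An equivalent shortcut, used in the sketch below, is that the VIFs are the diagonal elements of the inverse of the regressors' correlation matrix. This is a dependency-free illustration, not the routine used to produce output (2), and the toy columns are hypothetical.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    devs = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sum(d * d for d in dv) ** 0.5 for dv in devs]
    k = len(columns)
    return [[sum(a * b for a, b in zip(devs[i], devs[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        if abs(p) < 1e-12:
            raise ValueError("matrix is singular (perfect multicollinearity)")
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical toy regressors (not the study's data) with correlation 0.8.
x1 = [1, 2, 3, 4]
x2 = [1, 3, 2, 4]
print(vif([x1, x2]))  # both equal 1 / (1 - 0.8**2), about 2.78
```

With real data, one would pass the educ, age, good, and income columns. Values below 10 would correspond to the conclusion drawn from output (2).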

Interpretation of the Breusch-Pagan / Cook-Weisberg Test for Heteroscedasticity

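The Breusch-Pagan / Cook-Weisberg test for heteroscedasticity in output (3) shows no evidence of heteroscedasticity: its p-value is greater than the 5% significance level, so we fail to reject the null hypothesis of homoscedasticity (constant variance of the error term conditional on the explanatory variables). Failing to reject does not prove homoscedasticity, however. In a linear probability model, the error variance equals p(x)[1 - p(x)], where p(x) is the predicted probability, so it varies with the regressors by construction. The test most likely lacks power here because the fitted probabilities change little across observations, and heteroscedasticity-robust standard errors remain a prudent choice for an LPM.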
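The test can be sketched as follows. This version uses the fitted values as the single variance regressor in the original (non-studentized) form. We assume, without having checked the software used for output (3), that this matches its default. The squared OLS residuals are scaled by σ² = SSR/n and regressed on the fitted values with an intercept. Half the explained sum of squares is then compared with a chi-squared distribution with 1 degree of freedom. The residuals and fitted values below are hypothetical toy inputs.

```python
import math


def breusch_pagan_fitted(residuals, fitted):
    """Breusch-Pagan / Cook-Weisberg score statistic with fitted values as the regressor.

    Under H0 (constant error variance) the statistic is approximately chi-squared(1).
    """
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n       # ML estimate of the error variance
    g = [e * e / sigma2 for e in residuals]           # scaled squared residuals
    mean_x = sum(fitted) / n
    mean_g = sum(g) / n
    sxx = sum((x - mean_x) ** 2 for x in fitted)
    sxg = sum((x - mean_x) * (gi - mean_g) for x, gi in zip(fitted, g))
    explained_ss = sxg * sxg / sxx                    # ESS of the simple auxiliary regression
    stat = explained_ss / 2
    # Survival function of chi-squared(1): P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Toy example (hypothetical): residual spread grows with the fitted values.
stat, p = breusch_pagan_fitted([0.5, -1.0, 1.5, -2.0], [1.0, 2.0, 3.0, 4.0])
print(f"chi2(1) = {stat:.3f}, p-value = {p:.3f}")
```

A p-value above 0.05 corresponds to the "fail to reject" outcome reported in output (3). As noted above, in an LPM this reflects low power rather than genuinely constant variance.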

Interpretation of the Shapiro-Wilk Test

According to the Shapiro-Wilk test of whether each variable follows a normal distribution, output (4) shows the following:

  • educ: The distribution of educ is not normal (p-value = 0.000); the data for educ do not follow a bell-shaped curve.
  • age: The distribution of age is not normal (p-value = 0.000); the data for age do not follow a bell-shaped curve.
  • income: The distribution of income is not normal (p-value = 0.000); the data for income do not follow a bell-shaped curve.
  • good: The test does not reject normality at the 5% level (p-value = 0.082). However, because good is a binary (0/1) variable, it cannot actually follow a normal distribution, so this result should not be read as evidence of normality.

The Shapiro-Wilk W test becomes very powerful in large samples such as this one (N = 3328): it tends to reject the null hypothesis of normality even for departures too small to matter in practice. It is therefore useful to complement it with other methods, such as visual inspection of histograms and quantile-quantile (QQ) plots.

Non-normality can affect the interpretation of statistical tests that assume normality, such as the t-test and ANOVA. For OLS inference, however, the relevant assumption concerns the error term, not the explanatory variables. Moreover, with N = 3328 the t-tests are approximately valid by the central limit theorem even when the errors are not normal. Note also that the Shapiro-Wilk W test only assesses univariate normality, not multivariate normality. In this project, we limit ourselves to analyzing whether the variables follow a normal distribution, without correcting or adjusting for non-normality.
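Computing the Shapiro-Wilk W itself requires approximated coefficient tables, so it is hard to sketch briefly. The sketch below swaps in a different, simpler normality test, the Jarque-Bera test, which is based on sample skewness and kurtosis. In line with the point above, it would be applied to the OLS residuals rather than to the regressors. The toy values are hypothetical, and the chi-squared approximation is only reliable in large samples.

```python
import math


def jarque_bera(values):
    """Jarque-Bera test: JB = n/6 * (S^2 + (K - 3)^2 / 4).

    S and K are the moment-based sample skewness and kurtosis. Under normality JB is
    approximately chi-squared with 2 df, whose survival function is exp(-x / 2).
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


# Toy residuals for illustration only; in practice pass the OLS residuals.
jb, p = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(round(jb, 4), round(p, 4))
```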
