HR Metrics that matter


Introduction

The job quit rate, which measures how many workers quit their jobs, fell decisively below pre-pandemic levels last month. The number of people quitting jobs fell to 3.3 million in June from 3.4 million in the prior month, the lowest level of quits since September 2020.

HR analytics, or people analytics, has emerged as a critical function within organizations. Payroll and benefits administration constitute a significant expense category, which underscores the financial importance of data-driven HR insights.

In this project, I analyze a fictitious HR dataset provided by IBM to identify potential drivers of employee attrition. The goal is to uncover patterns in employee attrition within the company.


The Dataset

The dataset used for this analysis is a fictitious yet realistic representation of employee data, crafted by IBM data scientists. It can be found here. This dataset is a widely used resource in people analytics.


Data Analysis

I decided to use R to analyze the data, so first I needed to download R and RStudio.

We want to bring our CSV data into R as a data frame. To do that, we use the read.csv command.
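As a minimal sketch of this loading step: the snippet below writes a tiny made-up CSV to a temporary file so that it runs without the real IBM file, then reads it with read.csv. The file path, the variable name hr_data, and the three sample rows are assumptions for illustration only; with the real file you would pass its path instead.

```r
# Hypothetical stand-in for the real CSV: three made-up rows written to a
# temporary file. Column names follow those mentioned in the article.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Age,Attrition,MonthlyIncome,TotalWorkingYears",
  "30,No,4000,8",
  "45,No,9000,20",
  "26,Yes,2500,3"
), csv_path)

# read.csv() parses the file and returns a data frame
hr_data <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(hr_data)   # number of rows and columns
str(hr_data)   # column names and types
head(hr_data)  # first few rows
```

On the real dataset, dim() should report the 1470 rows and 35 columns described in this article.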


This data set has 1470 rows and each row is an employee. There are 35 different columns that describe that employee.




The company needs an overview of how some of the most important demographics correlate. To do so, I will make a correlation matrix.

But first, let's filter our data so that we pass in all rows but only the columns of interest. To identify potential correlations, I initially selected the following columns:

"Age", "DailyRate", "DistanceFromHome", "Education", "HourlyRate", "MonthlyIncome", "MonthlyRate", "NumCompaniesWorked", "TotalWorkingYears", and "TrainingTimesLastYear".

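The column filter described above can be sketched as follows. The data frame here is simulated with random integers purely so the example runs on its own; the names hr_data, cols, and hr_subset are assumptions, not necessarily the ones used in the original analysis.

```r
set.seed(1)
n <- 50

# The ten columns listed in the article
cols <- c("Age", "DailyRate", "DistanceFromHome", "Education", "HourlyRate",
          "MonthlyIncome", "MonthlyRate", "NumCompaniesWorked",
          "TotalWorkingYears", "TrainingTimesLastYear")

# Simulated stand-in data: random integers, not real HR values
hr_data <- as.data.frame(
  matrix(sample(1:100, n * length(cols), replace = TRUE),
         nrow = n, dimnames = list(NULL, cols))
)
hr_data$Attrition <- sample(c("Yes", "No"), n, replace = TRUE)  # an extra column

# Leave the row index empty to keep all rows; pass the names to keep those columns
hr_subset <- hr_data[, cols]
names(hr_subset)
```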

Using R's cor function, I calculated Pearson correlation coefficients to assess linear relationships among the variables. These coefficients, ranging from -1 to 1, quantify both the strength and direction of the correlation between each pair of variables, as displayed in the correlation matrix.

Correlation matrices are essential for exploratory data analysis, providing insights into the relationships between variables.
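A short sketch of the correlation step, using simulated data rather than the IBM file. The simulated relationships (experience tied to age, income tied to experience) are assumptions chosen only to produce a readable matrix, so the coefficients will not match the figures quoted in this article.

```r
set.seed(42)
n <- 200

# Simulated stand-in columns (not the real data)
age   <- round(runif(n, 18, 60))
years <- pmax(0, age - 18 - rpois(n, 4))
hr_subset <- data.frame(
  Age               = age,
  TotalWorkingYears = years,
  MonthlyIncome     = round(2000 + 400 * years + rnorm(n, sd = 2000)),
  DistanceFromHome  = sample(1:29, n, replace = TRUE)
)

# Pearson is the default method; it is spelled out here for clarity
cor_matrix <- cor(hr_subset, method = "pearson")
round(cor_matrix, 3)
```

The result is a symmetric matrix with ones on the diagonal, since each variable correlates perfectly with itself.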



A strong positive correlation exists between age and total working years (r=0.680), indicating increased experience with age. Older employees also tend to have higher monthly incomes (r=0.498) and they have worked in more companies (r=0.300).

Education level correlates modestly with age (r=0.208) and with the number of companies worked at (r=0.126), suggesting a trend towards higher education among experienced workers. Distance from home showed a negligible negative correlation (r=-0.037) with training frequency, while pay rates showed minimal correlation with other variables.


To visualize potential relationships among monthly income, age, total working years, and education level, I generated a scatterplot matrix from the HR data. This exploratory technique reveals correlation patterns between these key variables.



The scatterplot matrix was generated with the pairs() function. As expected, a strong positive correlation emerged between age and total working years. Additionally, a moderate positive correlation was observed between age and monthly income. While a general upward trend was evident in the relationship between total working years and monthly income, particularly after the 20-year mark, significant variability existed. These findings suggest that factors beyond age and experience, such as job role, organizational level, and education, influence income and career progression.
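For reference, a pairs() call of this kind might look like the sketch below. It runs on simulated data, and the variable name plot_vars and the plot title are assumptions.

```r
set.seed(7)
n <- 150

# Simulated stand-in data with the four variables discussed above
age   <- round(runif(n, 18, 60))
years <- pmax(0, age - 18 - rpois(n, 4))
hr_data <- data.frame(
  MonthlyIncome     = round(2000 + 400 * years + rnorm(n, sd = 2500)),
  Age               = age,
  TotalWorkingYears = years,
  Education         = sample(1:5, n, replace = TRUE)
)

# pairs() draws one scatterplot for every pair of columns
plot_vars <- hr_data[, c("MonthlyIncome", "Age", "TotalWorkingYears", "Education")]
pairs(plot_vars, main = "Scatterplot matrix (simulated data)")
```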


Recent layoffs have prompted legal action from former employees alleging discriminatory practices. They claim that older employees were let go at a higher rate than younger ones. To assess these claims, I will use hypothesis testing to determine whether termination decisions were equitable.


To analyze this, we will use box plots to compare the age distribution of terminated and retained employees.
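One way to draw such a comparison is boxplot() with a formula, sketched here on simulated data. The column names Age and Attrition follow the article; the group sizes and age distributions are made up for illustration.

```r
set.seed(3)

# Simulated stand-in: 40 employees who left, 160 who stayed
hr_data <- data.frame(
  Age       = c(round(rnorm(40, mean = 34, sd = 9)),
                round(rnorm(160, mean = 38, sd = 9))),
  Attrition = rep(c("Yes", "No"), times = c(40, 160))
)

# One box per Attrition group; the thick line in each box is the median age
boxplot(Age ~ Attrition, data = hr_data,
        xlab = "Attrition", ylab = "Age",
        main = "Age by attrition (simulated data)")

# The same medians as numbers
medians <- tapply(hr_data$Age, hr_data$Attrition, median)
medians
```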


Initially, the box plots for age appear quite similar. However, a closer look reveals a slightly lower median age for employees who left compared to those who stayed. To determine if this difference is statistically significant and not merely due to random chance, we will conduct a hypothesis test.

Let's remember that the null hypothesis states that there is no difference in the mean age of the two groups, and the alternative hypothesis states that there is a difference in the mean age of the two groups.




We'll do the hypothesis test with what's called a Welch Two Sample t-test. Basically, we have one sample of employees who left and one sample of employees who stayed. Let's compare the average ages of these two samples and calculate a p-value.

To do this, we create a new variable called "yes_age" that holds the Age column, but only the rows that have "Yes" in Attrition.

Then we create another variable called "no_age" that holds the Age column, but only the rows that have "No" in Attrition.

Now we can use the t.test function and pass in these two arrays.
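The steps above can be sketched as follows. The data are simulated, so the p-value and means will differ from the real output; yes_age and no_age are the names used in the article, while result is an assumed name.

```r
set.seed(123)

# Simulated stand-in: 40 employees who left, 160 who stayed
hr_data <- data.frame(
  Age       = c(round(rnorm(40, mean = 34, sd = 9)),
                round(rnorm(160, mean = 38, sd = 9))),
  Attrition = rep(c("Yes", "No"), times = c(40, 160))
)

# Ages of those who left and of those who stayed
yes_age <- hr_data$Age[hr_data$Attrition == "Yes"]
no_age  <- hr_data$Age[hr_data$Attrition == "No"]

# t.test() runs Welch's test by default (var.equal = FALSE)
result <- t.test(yes_age, no_age)
result

result$p.value   # compare against 0.05
result$conf.int  # 95% CI for mean(yes_age) - mean(no_age)
result$estimate  # "mean of x" is yes_age, "mean of y" is no_age
```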



Because the p-value is less than 0.05, there is a statistically significant difference between the two samples, but not the one that was claimed. Those who left were younger on average than those who stayed. We can see that in the mean comparison at the bottom of the output, where “x” is the first array we passed in and “y” is the second. The confidence interval confirms it: both of its bounds are negative, so the mean of the first array is smaller than the mean of the second.

Next, we will examine another disgruntled employee's claim that new employees were let go more often than long-standing employees. This time, our analysis is based on the employee number.
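Since the same comparison is repeated for different columns, it can be wrapped in a small helper. This is only a sketch: compare_by_attrition is a hypothetical function, the column name EmployeeNumber is assumed, and the data are simulated. It also relies on the premise implied above that lower employee numbers belong to earlier hires.

```r
# Hypothetical helper: Welch t-test of one numeric column,
# employees who left ("Yes") versus employees who stayed ("No")
compare_by_attrition <- function(df, column) {
  left   <- df[[column]][df$Attrition == "Yes"]
  stayed <- df[[column]][df$Attrition == "No"]
  t.test(left, stayed)
}

set.seed(99)

# Simulated stand-in: attrition assigned at random, independent of the number
hr_data <- data.frame(
  EmployeeNumber = 1:300,
  Attrition      = sample(c("Yes", "No"), 300, replace = TRUE,
                          prob = c(0.2, 0.8))
)

res <- compare_by_attrition(hr_data, "EmployeeNumber")
res$p.value
```

The same helper could be called with a different column name, for example a monthly income column, to repeat the test for another claim.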



As the p-value exceeds the significance level of 0.05, we fail to reject the null hypothesis: there is no statistically significant difference between the two samples. The data therefore provide no evidence that new employees were let go more often than long-standing employees.


Another former employee filed a lawsuit claiming the company engaged in discriminatory layoffs, targeting individuals with higher monthly incomes.





For this case, the extremely low p-value (4.434e-13) strongly suggests that there is a statistically significant difference between the mean income of the two groups.

The negative t-statistic indicates that the mean income of the people that “leave” is lower than the mean income of the people that “stay”.

The confidence interval (-2583.050 to -1508.244) suggests that the true difference in means lies between these two values with 95% confidence.


Thus, the data do not support the former employee’s claim that employees with higher monthly incomes were laid off at a higher rate than employees with lower monthly incomes. In fact, the hypothesis test points to the opposite.

Now, let’s create a linear regression model that predicts "Monthly Income" based upon "Age".
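A model of this form can be fitted with lm(), as in the sketch below. The data are simulated with an arbitrary slope and noise level, so the coefficients will not match the output discussed in this article; the name model is an assumption.

```r
set.seed(2024)
n <- 300

# Simulated stand-in data: income rises with age, plus noise
age <- round(runif(n, 18, 60))
hr_data <- data.frame(
  Age           = age,
  MonthlyIncome = 1000 + 200 * age + rnorm(n, sd = 4000)
)

# Response on the left of ~, predictor on the right
model <- lm(MonthlyIncome ~ Age, data = hr_data)
summary(model)  # coefficients, p-values, R-squared

coef(model)["Age"]        # estimated change in income per extra year of age
summary(model)$r.squared  # share of income variance explained by age

# Predicted monthly income for a 40-year-old
predict(model, newdata = data.frame(Age = 40))
```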



The provided output details a simple linear regression model where Monthly Income is the dependent variable (what we're trying to predict) and Age is the independent variable (the predictor).

For each additional year of age, the monthly income is predicted to increase by $256.57 on average.

The R-squared value of 0.2479 suggests that age explains only 24.79% of the variation in monthly income, indicating that other factors significantly influence income. The p-value for the Age coefficient is very small (< 2e-16), indicating a statistically significant relationship between age and monthly income.

Now, the provided output presents a linear regression model where Monthly Income is predicted based on Age and Total Working Years.
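Adding a second predictor only changes the formula. In this sketch the data are simulated so that experience is tied to age, which is an assumption made to mimic correlated predictors; the estimates will not match the real output, and model2 is an assumed name.

```r
set.seed(11)
n <- 300

# Simulated stand-in data: experience depends on age, income on experience
age   <- round(runif(n, 18, 60))
years <- pmax(0, age - 18 - rpois(n, 5))
hr_data <- data.frame(
  Age               = age,
  TotalWorkingYears = years,
  MonthlyIncome     = 2000 + 450 * years + rnorm(n, sd = 3000)
)

# Each coefficient is the effect of one predictor with the other held constant
model2 <- lm(MonthlyIncome ~ Age + TotalWorkingYears, data = hr_data)

summary(model2)$coefficients  # estimate, std. error, t value, p-value per term
summary(model2)$r.squared     # share of income variance explained by both
```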




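The coefficient for Age (-26.87) suggests that, on average, monthly income decreases by $26.87 for each additional year of age, holding Total Working Years constant. This negative relationship is unexpected and might require further investigation. One plausible explanation is that Age and Total Working Years are themselves strongly correlated (r=0.680), so once experience is accounted for, age carries little additional information about income.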

The coefficient for TotalWorkingYears (489.13) indicates that, on average, monthly income increases by $489.13 for each additional year of work experience, holding Age constant.

About 59.88% of the variation in Monthly Income is explained by Age and Total working years.

Both Age and TotalWorkingYears are statistically significant predictors of Monthly Income as their p-values are less than 0.05.


Key Takeaways

- Monthly income, total working years, and age are positively correlated.

- Lower-paid employees experienced higher layoff rates than higher-paid employees.

- Age and total working years account for about 60% of the variation in employee monthly income.

- There is no statistically significant difference in layoffs between tenured and new employees.

- In this fictitious dataset, the layoff practices tested appear to be non-discriminatory.

Thank you for taking the time to review my work. Your feedback and insights are valuable. Please feel free to leave a comment below or connect with me on LinkedIn for further discussion.


More articles by Carlos Braschi
