Hired or Fired?

The job market has changed a lot over the years. Traditionally, employees spent most of their careers at one or two companies, but that no longer seems to be the case. Job hopping has become a common practice among professionals, especially young ones. Eager to find better opportunities or compensation, employees are willing to change companies quickly, and some hold several jobs within a single year.

This represents a challenge for companies that want to retain professionals, because hiring the right person is not an easy task. HR managers are faced with questions like: What makes someone stay on the job? What makes them leave? Is it a money factor? Does the company culture play an important role?

In this case study, we’ll explore an HR data set with the programming language R to answer these questions.


The Data Set:

This data set was created by IBM data scientists. It contains 35 columns and 1,470 rows, each describing one distinct employee. Most columns hold demographic data, such as marital status, age and education. Others describe the employee’s career, such as time in the current role, years since the last promotion, and ‘Attrition’, which records whether the employee stayed at the company or left.


These are our most important data set columns:

  • Age: Employee’s age.
  • Attrition: Whether the employee left the company (Yes) or stayed (No).
  • Business Travel: How often the employee travels.
  • Daily Rate: Money the employee earns per day.
  • Department: Where in the company the employee works.
  • Distance From Home: How far from work the employee lives.
  • Education: Level of the employee’s education, from 1 to 5.
  • Hourly Rate: Money the employee earns per hour.
  • Monthly Income: Money the employee earns per month.
  • Monthly Rate: A second monthly pay figure, recorded separately from Monthly Income.
  • Number of Companies Worked: The number of companies the employee has worked for.
  • Total Working Years: Number of years the employee has been working.
  • Training Times Last Year: Number of times the employee was trained in the previous year.


Questions:

  1. Which demographic variables correlate most strongly with each other?
  2. Does age play a role in whether employees stay at or leave the company?
  3. Are older employees paid more or less than younger employees?


1. Demographics Correlation

To figure out why some employees stay at the company longer than others, we can start by computing a correlation matrix for our most important columns. In R, this is done by selecting those columns and passing them to a correlation function such as cor().
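The author’s code isn’t reproduced here, so the following is only a minimal sketch of this step in base R. It assumes the data has been read into a data frame called `hr` whose column names match the IBM file’s headers (for example `MonthlyIncome`, `TotalWorkingYears`); the file name in the comment is also an assumption. To keep the sketch runnable on its own, a tiny made-up data frame stands in for the real 1,470 rows, so the values it prints will not match the ones reported below.

```r
# Minimal sketch: correlation matrix for a few HR columns.
# With the real data you would load the IBM file instead, e.g.
#   hr <- read.csv("WA_Fn-UseC_-HR-Employee-Attrition.csv")  # file name is an assumption
# The six rows below are made up purely so the example runs.
hr <- data.frame(
  Age               = c(25, 32, 41, 29, 50, 36),
  MonthlyIncome     = c(3100, 4800, 7200, 3900, 11000, 5600),
  TotalWorkingYears = c(3, 9, 18, 6, 28, 12),
  Education         = c(2, 3, 4, 3, 5, 1)
)

# Keep only the columns of interest (all numeric) ...
cols <- c("Age", "MonthlyIncome", "TotalWorkingYears", "Education")

# ... and compute pairwise Pearson correlations, rounded to 4 decimals.
cor_matrix <- round(cor(hr[, cols]), 4)
print(cor_matrix)
```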

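Running the code produces a correlation matrix that shows how each column correlates with every other column. Values range from -1 to 1: the closer a value is to 1 (or -1), the stronger the positive (or negative) linear relationship between the two columns, while values near 0 indicate little or no linear relationship.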
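To make that interpretation concrete, here is a small illustration on made-up vectors (not the HR data). The hypothetical helper `pearson()` computes the Pearson coefficient from its definition, the covariance of the two vectors divided by the product of their standard deviations, and agrees with R’s built-in `cor()`.

```r
# Hypothetical helper: Pearson correlation from its definition.
# The (n - 1) factors in the covariance and standard deviations cancel out.
pearson <- function(a, b) {
  da <- a - mean(a)
  db <- b - mean(b)
  sum(da * db) / sqrt(sum(da^2) * sum(db^2))
}

x <- 1:10
rising  <- 2 * x + 5                         # perfectly linear, increasing
falling <- 100 - 3 * x                       # perfectly linear, decreasing
noisy   <- c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)  # increasing, with some noise

pearson(x, rising)   # 1
pearson(x, falling)  # -1
pearson(x, noisy)    # about 0.94: strong, but not perfect
cor(x, noisy)        # same value from base R
```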

The most noticeable correlations are:

  • Age-TotalWorkingYears: 0.6803
  • Age-MonthlyIncome: 0.4978
  • MonthlyIncome-TotalWorkingYears: 0.7728
  • Age-Education: 0.2080

To visualize these relationships, we can create a quick scatterplot matrix, which shows a separate scatterplot for each pair of columns.
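One way to draw a scatterplot matrix in base R is `pairs()`; the article doesn’t say which function was used, so treat this as a sketch. It uses a made-up stand-in for `hr` (an assumption, not the real data) and writes the plot to a temporary PDF so it also runs as a non-interactive script; in RStudio you can call `pairs()` directly and the plot appears in the Plots pane.

```r
# Made-up stand-in for the HR data (the real file has 1,470 rows).
hr <- data.frame(
  Age               = c(25, 32, 41, 29, 50, 36),
  MonthlyIncome     = c(3100, 4800, 7200, 3900, 11000, 5600),
  TotalWorkingYears = c(3, 9, 18, 6, 28, 12),
  Education         = c(2, 3, 4, 3, 5, 1)
)

# Draw every pairwise scatterplot into a temporary PDF file.
out_file <- file.path(tempdir(), "scatterplot_matrix.pdf")
pdf(out_file)
pairs(hr[, c("Age", "MonthlyIncome", "TotalWorkingYears", "Education")],
      main = "Scatterplot matrix (made-up data)")
invisible(dev.off())
```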

As we can see, Monthly Income and Age have a moderate positive correlation, meaning that older employees tend to earn more. Monthly Income correlates even more strongly with TotalWorkingYears, which makes sense: professionals with more experience tend to earn more than those with less. Age also correlates with TotalWorkingYears, meaning that older employees tend to have more experience.

On the other hand, it’s hard to see a correlation with the Education column. Education is a ranking with only five possible values (1 to 5), so its points stack into separate bands instead of forming a visible trend, and its correlation with Age is weak (0.2080).
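Because Education is an ordinal ranking, one option (an alternative the original analysis doesn’t use) is Spearman’s rank correlation, which only looks at the order of the values; `cor()` supports it through `method = "spearman"`. The two vectors below are made-up stand-ins for the Age and Education columns.

```r
# Made-up stand-ins for the Age and Education columns.
age       <- c(25, 32, 41, 29, 50, 36)
education <- c(2, 3, 4, 3, 5, 1)   # ordinal levels 1 to 5

table(education)  # only a handful of distinct values, hence the banding in the plot

# Pearson uses the raw values; Spearman uses their ranks (ties get average ranks).
cor(age, education, method = "pearson")
cor(age, education, method = "spearman")
```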


2. Employees leaving the company based on age

In order to understand how age affects employees leaving the company, we have to go a step further than correlations and scatterplots. This is where our “Attrition” column comes into play. Attrition has only two values, Yes and No, and describes whether the employee left the company.

For this, we can use the Age and Attrition columns to create a boxplot, which lets us compare the age distribution of employees who left with that of employees who stayed.
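A minimal sketch of the boxplot using base R’s `boxplot()` with a formula. The small data frame is made up for illustration (the real data has 1,470 employees), and the plot goes to a temporary PDF so the snippet runs as a script.

```r
# Made-up stand-in: ages and attrition status for a dozen employees.
hr <- data.frame(
  Age       = c(22, 26, 29, 31, 35, 38, 41, 45, 24, 28, 33, 52),
  Attrition = c("Yes", "Yes", "Yes", "No", "No", "No",
                "No", "No", "Yes", "No", "No", "No")
)

out_file <- file.path(tempdir(), "age_by_attrition.pdf")
pdf(out_file)
# One box per Attrition value; boxplot() also returns the five-number summaries.
box_stats <- boxplot(Age ~ Attrition, data = hr,
                     xlab = "Attrition", ylab = "Age",
                     main = "Age by attrition (made-up data)")
invisible(dev.off())

box_stats$stats[3, ]  # medians of each group (third row of the summary)
```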

Although the median age for “Yes” is lower than the one for “No”, the two boxes look very similar, which makes it hard to draw a conclusion from the plot alone. To settle the question, we run a hypothesis test.

To do this, we’ll need the average age of the employees who stayed and the average age of those who left. Comparing these two averages produces a p-value, which tells us whether the difference between them is larger than we would expect by chance, and therefore whether Age plays a role in who stayed at the company and who left.

With the two groups defined, we can use R’s t.test() function to calculate and show our p-value. By default, t.test() runs a Welch Two Sample t-test, which does not assume that the two groups have equal variances.
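Here is a minimal sketch of both steps, the group averages and the test, again on a made-up stand-in for the real data. `tapply()` computes the average age per Attrition group, and `t.test()` with a formula runs the Welch test because its default is `var.equal = FALSE`.

```r
# Made-up stand-in for the Age and Attrition columns.
hr <- data.frame(
  Age       = c(22, 26, 29, 31, 35, 38, 41, 45, 24, 28, 33, 52),
  Attrition = c("Yes", "Yes", "Yes", "No", "No", "No",
                "No", "No", "Yes", "No", "No", "No")
)

# Average age of employees who stayed ("No") and who left ("Yes").
group_means <- tapply(hr$Age, hr$Attrition, mean)
print(group_means)

# Welch Two Sample t-test: does the mean age differ between the two groups?
result <- t.test(Age ~ Attrition, data = hr)
print(result)
result$p.value  # the number we compare against 0.05
```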

We can see that our p-value is 1.38e-08. But what does this mean? By convention, a p-value below 0.05 means that the difference between the samples is considered statistically significant. In this case, our p-value translates to:

$$\texttt{1.38e-08} = 1.38 \times 10^{-8} = 0.0000000138$$

where 1.38 is the coefficient, “e” stands for “times 10 to the power of”, and -08 is the exponent.

This is far below 0.05, so the difference between the two groups is statistically significant. Together with the lower median in the boxplot, this tells us that employees who left the company were, on average, younger than those who stayed.
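R can do this conversion and the comparison for us. The snippet below simply plugs in the p-value reported above; `format(..., scientific = FALSE)` prints it as a plain decimal.

```r
p_value <- 1.38e-08                           # p-value reported for Age vs. Attrition
format(p_value, scientific = FALSE)           # "0.0000000138"
isTRUE(all.equal(p_value, 1.38 * 10^-8))      # TRUE: same number, written two ways
p_value < 0.05                                # TRUE: significant at the 5% level
```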


We could also run the same test using the EmployeeNumber column. Since an employee number is just an identifier, this shows us what the results look like when there is no real difference between the groups. We start again with a boxplot, this time of EmployeeNumber by Attrition.

Since the two boxes look similar here as well, we run the Welch Two Sample t-test again, this time comparing the average employee number of each group.

As we can see, our p-value is higher than 0.05, so there is no evidence that employee number has anything to do with whether an employee leaves or stays at the company, which is exactly what we’d expect from an ID column.


3. Employees earning more money based on age

Even with a simple scatterplot, it’s hard to tell how strong the trend between Monthly Income and Age is. To measure that trend and predict how much money someone makes based on their age, we can use a linear regression.

We can create this predictive model from our data in R, for example with the lm() function.
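A minimal sketch of the model with base R’s `lm()`. The eight rows are made up (the real model is fit on all 1,470 employees), so the coefficients will differ from the ones quoted below; the point is to show where the Estimate, R-squared and p-value come from.

```r
# Made-up stand-in for the Age and MonthlyIncome columns.
hr <- data.frame(
  Age           = c(25, 32, 41, 29, 50, 36, 27, 45),
  MonthlyIncome = c(3100, 4800, 7200, 3900, 11000, 5600, 2800, 9100)
)

# Simple linear regression: MonthlyIncome = Intercept + Estimate * Age
model <- lm(MonthlyIncome ~ Age, data = hr)

summary(model)            # coefficient table, R-squared and overall p-value
coef(model)               # just the Intercept and the Age estimate
summary(model)$r.squared  # share of income variation explained by age
```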

In the results, the Estimate for Age is 256.57, which means that each additional year of age is associated with about $257 more in monthly income.

To estimate an employee’s monthly income from their age, we multiply the Age estimate (256.57) by their age (let’s say 29) and add the Intercept (-2970): 256.57 × 29 - 2970 ≈ $4,470.
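The same arithmetic in R, using the coefficients reported above (the intercept is the rounded -2970). `predict_income()` is a hypothetical helper for illustration; with a fitted `lm` object you would normally call `predict(model, newdata = data.frame(Age = 29))` instead of typing the numbers in.

```r
# Coefficients reported in the article for MonthlyIncome ~ Age
intercept <- -2970    # Intercept (rounded)
age_slope <- 256.57   # Estimate for Age

# Hypothetical helper: predicted monthly income for an employee of a given age
predict_income <- function(age) intercept + age_slope * age

predict_income(29)                       # 4470.53, i.e. about $4,470
predict_income(30) - predict_income(29)  # 256.57: one extra year adds the Estimate
```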

To see how well this model fits, we can also check the R-squared value in the results. In this case, it is 0.2479, meaning that age explains about 25% of the variation in our employees’ monthly income.
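A useful cross-check: for a regression with a single predictor, R-squared equals the square of the Pearson correlation between the two columns. Squaring the Age-MonthlyIncome correlation listed earlier (0.4978) gives about 0.2478, matching the reported 0.2479 up to rounding. The second half of the sketch verifies the identity on made-up numbers.

```r
# Using the article's numbers
r_age_income <- 0.4978
r_age_income^2   # about 0.2478, close to the reported R-squared of 0.2479

# Same identity on made-up data: R-squared of lm() vs. squared cor()
age    <- c(25, 32, 41, 29, 50, 36, 27, 45)
income <- c(3100, 4800, 7200, 3900, 11000, 5600, 2800, 9100)
fit <- lm(income ~ age)
all.equal(summary(fit)$r.squared, cor(age, income)^2)  # TRUE
```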

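Our p-value in this case is shown as 2.2e-16 (0.00000000000000022). R prints any p-value below this threshold as “< 2.2e-16”, so our p-value is at most this tiny number, which means the relationship between age and income is statistically significant. Significance doesn’t make the model precise, though: with an R-squared of 0.2479, most of the variation in income is still unexplained.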

Now, let’s go further and add a second predictor to create a multiple linear regression. In this case, we’ll add TotalWorkingYears to our previous model.
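Adding the second predictor is just a matter of extending the model formula with `+`. As before, this is a sketch on made-up rows, so the numbers won’t match the article’s.

```r
# Made-up stand-in for the three columns used in the model.
hr <- data.frame(
  Age               = c(25, 32, 41, 29, 50, 36, 27, 45),
  TotalWorkingYears = c(3, 9, 18, 6, 28, 12, 5, 20),
  MonthlyIncome     = c(3100, 4800, 7200, 3900, 11000, 5600, 2800, 9100)
)

simple_model   <- lm(MonthlyIncome ~ Age, data = hr)
multiple_model <- lm(MonthlyIncome ~ Age + TotalWorkingYears, data = hr)

summary(multiple_model)  # one Estimate per predictor plus the Intercept
coef(multiple_model)

# Adding a predictor can never lower R-squared on the same data,
# so compare the two fits side by side.
c(simple   = summary(simple_model)$r.squared,
  multiple = summary(multiple_model)$r.squared)
```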

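The output has the same structure as before, but this time our R-squared value is 0.5988, meaning that age and total working years together explain about 60% of the variation in our employees’ monthly income. This makes the model considerably more accurate than the previous one. Note, however, that the Age estimate is now slightly negative (-26.87): once TotalWorkingYears is in the model, most of the increase in income is attributed to experience rather than age.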

With these values, we can estimate the monthly income of, for example, an employee who is 29 years old and has 9 total working years:


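$$\text{MonthlyIncome} = \text{Intercept} + \text{Estimate}_{\text{Age}} \times \text{Age} + \text{Estimate}_{\text{TotalWorkingYears}} \times \text{TotalWorkingYears}$$

$$= 1978 + (-26.87 \times 29) + (489.13 \times 9) \approx \$5600$$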
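The same calculation in R, plugging in the article’s estimates. Multiplying the coefficient vector element-wise by the employee’s values (with a leading 1 for the intercept) and summing the result is exactly what the formula above does.

```r
# Estimates reported in the article for MonthlyIncome ~ Age + TotalWorkingYears
estimates <- c(Intercept = 1978, Age = -26.87, TotalWorkingYears = 489.13)

# Example employee: the leading 1 multiplies the intercept
employee <- c(Intercept = 1, Age = 29, TotalWorkingYears = 9)

sum(estimates * employee)  # 5600.94, i.e. about $5,600
```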


Findings

1. The columns that correlate the most are:

  • MonthlyIncome-TotalWorkingYears: 0.7728
  • Age-TotalWorkingYears: 0.6803
  • Age-MonthlyIncome: 0.4978

2. Employees who left the company were younger than the ones who stayed.

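3. Age played a significant role in how much money employees make per month: in the age-only model, each additional year of age is associated with about $257 more in monthly income, although age alone explains only about 25% of the variation.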

Written by Diego Manssur