HR Analytics in PostgreSQL

HR Analytics in PostgreSQL

This case study is contained within the Serious SQL course by Danny Ma .

Introduction

People analytics is the collection and application of talent data to improve critical talent and business outcomes according to Gartner. This project will be an attempt to understand organizational structure, trends, attrition, of the HR Analytica workforce.

Problem

I have been tasked by HR Anaytica to create reusable data sets to act as source data for two of their client HR analytic tools. I've been asked to create database views that HR Analytica can use for 2 key dashboards, reporting solutions, and and ad-hoc analytic requests.

Approach:

  1. Understanding the Business Case: The owner of HR Analytica has asked that datasets be created from the database in order to provide three forms of analysis: Current Analysis, Employee Churn, and Management analysis.
  2. Building A Measurement Plan: We will require 3 levels of insight based on around the following metrics.

  • Total Number of employees
  • Average company tenure in years
  • Average latest payrise percentage
  • Statistical Metrics for salary values that include the minimum, maximum, standard deviation, IQR, and the median for salary values.
  • The insights will be for 3 levels of the organization: company level, department level, and title level insights.

3. Data Exploration

  • Look at the base data to see if we can create tools from it. We have the help of an ERD to help us map out the connections between tables, but we can do a quick exploration to understand the nature of each table.

4. Data Cleaning/ Materialized View Creation

  • There was a data entry error from the intern for HR Analytica and it's our job to correct these mistakes in the date columns, and create a materialized view from which teams can draw that cleansed data from.

5. Analysis Plan: The following information must be provided for the employee level deep dive tool.

  • Employment history, ordered by effective data along with salary, department, manager, and title changes.
  • Calculate previous historic payrise % and changes.
  • Calculate previous position and department history in months with start and end dates.
  • Compare current employee salary, total company tenure, dept, position, and gender to average benchmarks for current position.

6. Use Cases:

In this part of the article, I'll examine some use cases for the deep dive employee views and some of the other data assets we created.

7. Recommendations

I'll provide insights and strategic recommendations for HR Analytica based on my findings from the use cases section.

3. Data Exploration:

  • Entity Relationship Diagram: The project pulls data from the employees schema which contains information at a varying level of granularity. We have title level views, employee information, salary information, department information, and department manager information stored within the database.

No alt text provided for this image

There are 6 datasets within the schema, so I'll have a look at each of them individually.

Employees.Employee Table

No alt text provided for this image
No alt text provided for this image

It looks like the employee.id column will be the key one in linking it to other tables to grab information which is what one would expect.

Employees.Department Table

No alt text provided for this image
No alt text provided for this image

The department.id column will be the link to the department_employee table and in turn the link to the rest of the dataset. This information is mainly a lookup of the department names.

Employee.Department_Employee Table

No alt text provided for this image
No alt text provided for this image

This table contains date information as well as both employee id and department id information. It'll serve as a link to the department table, the employee table, and the department manager table.

Employee.Department_Manager Table

No alt text provided for this image
No alt text provided for this image

This table contains historical information for each department's manager. It captures the changes in management over time.

No alt text provided for this image
No alt text provided for this image

Looks like department d005 has had two department managers over time. Next, let's have a look at some of the more descriptive tables for our analysis which will be the salary table and titles table.

Employees.Salary Table

No alt text provided for this image
No alt text provided for this image

Exploring this a bit and we can see that there is a many to one relationship in regards to employees and the number of salaries they have. Meaning that employees can have more than one salary over time. Finally, we'll have a look at the employees title table.

Employees.Title Table

No alt text provided for this image
No alt text provided for this image

We can see that there is a number of different titles. I need to check the nature of the relationship between employee id's and titles.

No alt text provided for this image
No alt text provided for this image

Similarly, each employee can have multiple titles. Indicating that there is many to one relationship between titles and employees. Lastly, I'll have a more in-depth look at the distribution of titles for employees.

No alt text provided for this image
No alt text provided for this image

From our counts above we can sere that near half of the employees in the database have held a single title throughout their time at the company.

4. Data Cleaning

Hr Analytica has notified us that there was an issue with data entry from one of the interns, where the start column from_date was actually 18 years after the actual start of the employee so that will need to be fixed. In order to aid the HR Analytica team, I'll be making these data cleaning adjustments in a materialized view with the same structure as the original indexes so that they can serve as the source for future analysis.

No alt text provided for this image
No alt text provided for this image
No alt text provided for this image
No alt text provided for this image

We know have a materialized view with the same schema format as the original table. We are ready to start constructing our master tables to perform our analysis.

5. Analytical Views

Now that we have these views from which we can grab the data, and the date information has been fixed, it's time to start planning the analysis. Remember, that we want a current employee snapshot view, dashboard aggregation views at the department level and title level, and finally the historical employee analysis view as well. Let's start with the first problem which is getting our current employee snapshot.

Current Employee Analysis

Analysis plan:

  1. Use lag window function on salary materialized view so that we can have a running previous salary with records up to '9999-01-01' our proxy for someone still at the company.
  2. Join previous salary and other information for dashboard analysis.
  3. Filter using a WHERE clause for current records.
  4. Need gender column for employee view.
  5. Use hire_date column to calculate tenure.
  6. Include from_date columns from title and department tables to calculate tenure.
  7. Use salary to calculate current average salary.
  8. Include department and title information for aggregation options.
  9. Implement statistical measures from before as they relate to salary.
  10. Combine all elements into single snapshot view.

No alt text provided for this image
No alt text provided for this image
No alt text provided for this image

After combining all of that information into a snapshot view table, I wanted to query it to see what the output looks like.

No alt text provided for this image
No alt text provided for this image

It looks our query worked. We have employee id information, gender, title, salary, and department information. We also have the percentage change in salary, employee company tenure, and department level tenure. Now that we have the data in a format that we can perform aggregations with, let's create some views for those aggregations.

No alt text provided for this image
No alt text provided for this image

Looking at the view above we can see information for the company as a whole. We have the employee counts and percentages by gender, the average company tenure, the average salary by gender, percentage change in salary, and some other other statistics like the max salary, the minimum salary, the interquartile range as it respects to gender, etc. One thing to note is the similarity in salary level for both genders, with a little than a thousand dollars separating average salary values. Additionally, the average salary_percentage_change indicates that there are strict procedures in place for salary bumps and that are standardized across the company. Next, I'll look into the department level views to see if I can get a little more granularity.

No alt text provided for this image

As you can see, we now have the gender breakdown split up by the various departments. I'll do the title level view next, before answer some questions these views to complete our analysis.

No alt text provided for this image
No alt text provided for this image

Next, I'll create some views that will be used as benchmarks for our analysis later on.

No alt text provided for this image

Finally, we'll construct a view for an employee deep dive. However, the script for that view is a little too large to be captured in a carbon snippet, so I'll include the link to the full script here. Within this article, I'll go on to show some of the views that we can take with our final scripting solution. We'll have a look at these views:

  • Department Level Results
  • Title Level Results
  • Salary Benchmark results by: company tenure, gender, title
  • Employee Deep Dive

Department Level Results

No alt text provided for this image

This view provides with a gender breakdown for each department that explores employee counts, employee percentage, salaries by gender, salary changes, basically all of the same calculations from the company level view.

Title Level Results

No alt text provided for this image
No alt text provided for this image

The output is just another way to view the data, but from the title level. There's still a breakdown by gender across all of the metrics that we've captured earlier.

Salary Benchmark Views

No alt text provided for this image
No alt text provided for this image

We can see that the salary trended up as employee gained more tenure within the company.

No alt text provided for this image
No alt text provided for this image

Salary benchmarks by department. It looks like the sales, marketing, and finance departments have the highest average salaries within them. Human Resources, Quality Management, and Customer Service have the lowest salaries associated with them.

Employee Deep Dive

No alt text provided for this image
No alt text provided for this image

6. Example Use-Cases

Current Employee Analysis

  1. HR Analytica wants to know the name of the employee with the highest salary?
  2. Are there multiple employees with the longest amount of time spent in their position?
  3. Which department has the most number of employees?
  4. How did women's salaries compare to the average salary for the largest department?

Employee Attrition

  1. How many employees have left HR Analytica?
  2. What percentage of employees were male? Female?
  3. Which department had the highest attrition level?
  4. Which title had the highest level of attrition?
  5. Which year did the most people leave the company?

Management Analysis

  1. How many managers are at the company currently?
  2. How many employees have ever been a manager?
  3. How long did it take for an employee to become a manager from their date of hire?


For the sake of length, I'll just answer questions related to current employees.

Current Employee Analysis Queries

No alt text provided for this image
No alt text provided for this image

Tokuyasu Pesch had the highest salary at $158,222 per year. WTF.

No alt text provided for this image
No alt text provided for this image

There are 3,505 employees tied with 20 years in their current role.

No alt text provided for this image
No alt text provided for this image

Looking at the department count we can see the Development department has the most employees. I want to see how many women in that department make more than the average salary for that department.

No alt text provided for this image
No alt text provided for this image

It looks only 11,367 women made more than the average salary count for a department of 61,386 people. I'm gong to see which department has the highest average salary for female employees as a next step down the road of granularity.

No alt text provided for this image
No alt text provided for this image

7. Recommendations/ Conclusion

Well based on the analysis of current employees these are some of the recommendations I have related to departments to work in, longevity at HR Analytica, and salary equality.

  • I'd recommend anyone that joins the company to look into the sales and marketing departments. These two departments had the highest salary regardless of gender.
  • The highest current salary at the company is $158,220.
  • The largest department, Development, has over 61,000 employees and of those employees 11,367 make less than average salary for that department. I'd recommend that HR Analytic dig deeper into this matter by exploring this statistic across the rest of its departments.
  • People tend to stay a long time as the highest average amount of tenure 20 years had over 3,000 employees and this was case down to 16 years.
  • I haven't quite finished my analysis as I want to explore attrition and conduct a management analysis in pt. 2 of this project. Thank for taking the time to read my article and I'd love any feedback from season HR Analysts or SQL pros.

You can see all of the code for the preparation part of the project here.

Simon Oganga, PMP?

Project Manager | Business Analyst | Data Analyst | Agile | AWS Cloud Practitioner

1 年

J

回复
Matt Makofske

Pharmacy Analytics Consultant @ Optum | Leading insights on drug utilization, cost, and pricing strategy

1 年

Thanks for making your methods so available and easy to follow. I remember when I was learning SQL, and though there are good sources for equations and troubleshooting, there weren't (and still aren't) that many like yours that talk through how to do it, why you did it that way, and make it relatable. Nice work and I hope many analysts of all levels learn from you!

回复
Shifra Isaacs

Data Scientist | Technical Writer | Support Engineer

1 年

Great article! I personally like to take the lowercase for string fields so I might have filtered on LOWER(gender) = 'f'

Sa Bui

Data Analytics | Program Evaluation | Conscious Parenting and Life Coach

1 年

Awesome

Sarah Rajani

'Data with Sarah' ? Data Analyst at Government of AB (Ministry of Justice) ? Sharing practical data tips, insights, and lessons learned

1 年

Great writeup, very detailed!

要查看或添加评论,请登录

Andrew Lujan的更多文章

社区洞察

其他会员也浏览了