HR Analytics in PostgreSQL
Andrew Lujan
Data Scientist & Business Intelligence Analyst | I help teams make smarter decisions through data storytelling
This case study is contained within the Serious SQL course by Danny Ma .
Introduction
People analytics is the collection and application of talent data to improve critical talent and business outcomes according to Gartner. This project will be an attempt to understand organizational structure, trends, attrition, of the HR Analytica workforce.
Problem
I have been tasked by HR Anaytica to create reusable data sets to act as source data for two of their client HR analytic tools. I've been asked to create database views that HR Analytica can use for 2 key dashboards, reporting solutions, and and ad-hoc analytic requests.
Approach:
3. Data Exploration
4. Data Cleaning/ Materialized View Creation
5. Analysis Plan: The following information must be provided for the employee level deep dive tool.
6. Use Cases:
In this part of the article, I'll examine some use cases for the deep dive employee views and some of the other data assets we created.
7. Recommendations
I'll provide insights and strategic recommendations for HR Analytica based on my findings from the use cases section.
3. Data Exploration:
There are 6 datasets within the schema, so I'll have a look at each of them individually.
Employees.Employee Table
It looks like the employee.id column will be the key one in linking it to other tables to grab information which is what one would expect.
Employees.Department Table
The department.id column will be the link to the department_employee table and in turn the link to the rest of the dataset. This information is mainly a lookup of the department names.
Employee.Department_Employee Table
This table contains date information as well as both employee id and department id information. It'll serve as a link to the department table, the employee table, and the department manager table.
Employee.Department_Manager Table
This table contains historical information for each department's manager. It captures the changes in management over time.
Looks like department d005 has had two department managers over time. Next, let's have a look at some of the more descriptive tables for our analysis which will be the salary table and titles table.
Employees.Salary Table
Exploring this a bit and we can see that there is a many to one relationship in regards to employees and the number of salaries they have. Meaning that employees can have more than one salary over time. Finally, we'll have a look at the employees title table.
Employees.Title Table
We can see that there is a number of different titles. I need to check the nature of the relationship between employee id's and titles.
Similarly, each employee can have multiple titles. Indicating that there is many to one relationship between titles and employees. Lastly, I'll have a more in-depth look at the distribution of titles for employees.
From our counts above we can sere that near half of the employees in the database have held a single title throughout their time at the company.
4. Data Cleaning
Hr Analytica has notified us that there was an issue with data entry from one of the interns, where the start column from_date was actually 18 years after the actual start of the employee so that will need to be fixed. In order to aid the HR Analytica team, I'll be making these data cleaning adjustments in a materialized view with the same structure as the original indexes so that they can serve as the source for future analysis.
We know have a materialized view with the same schema format as the original table. We are ready to start constructing our master tables to perform our analysis.
5. Analytical Views
Now that we have these views from which we can grab the data, and the date information has been fixed, it's time to start planning the analysis. Remember, that we want a current employee snapshot view, dashboard aggregation views at the department level and title level, and finally the historical employee analysis view as well. Let's start with the first problem which is getting our current employee snapshot.
领英推荐
Current Employee Analysis
Analysis plan:
After combining all of that information into a snapshot view table, I wanted to query it to see what the output looks like.
It looks our query worked. We have employee id information, gender, title, salary, and department information. We also have the percentage change in salary, employee company tenure, and department level tenure. Now that we have the data in a format that we can perform aggregations with, let's create some views for those aggregations.
Looking at the view above we can see information for the company as a whole. We have the employee counts and percentages by gender, the average company tenure, the average salary by gender, percentage change in salary, and some other other statistics like the max salary, the minimum salary, the interquartile range as it respects to gender, etc. One thing to note is the similarity in salary level for both genders, with a little than a thousand dollars separating average salary values. Additionally, the average salary_percentage_change indicates that there are strict procedures in place for salary bumps and that are standardized across the company. Next, I'll look into the department level views to see if I can get a little more granularity.
As you can see, we now have the gender breakdown split up by the various departments. I'll do the title level view next, before answer some questions these views to complete our analysis.
Next, I'll create some views that will be used as benchmarks for our analysis later on.
Finally, we'll construct a view for an employee deep dive. However, the script for that view is a little too large to be captured in a carbon snippet, so I'll include the link to the full script here. Within this article, I'll go on to show some of the views that we can take with our final scripting solution. We'll have a look at these views:
Department Level Results
This view provides with a gender breakdown for each department that explores employee counts, employee percentage, salaries by gender, salary changes, basically all of the same calculations from the company level view.
Title Level Results
The output is just another way to view the data, but from the title level. There's still a breakdown by gender across all of the metrics that we've captured earlier.
Salary Benchmark Views
We can see that the salary trended up as employee gained more tenure within the company.
Salary benchmarks by department. It looks like the sales, marketing, and finance departments have the highest average salaries within them. Human Resources, Quality Management, and Customer Service have the lowest salaries associated with them.
Employee Deep Dive
6. Example Use-Cases
Current Employee Analysis
Employee Attrition
Management Analysis
For the sake of length, I'll just answer questions related to current employees.
Current Employee Analysis Queries
Tokuyasu Pesch had the highest salary at $158,222 per year. WTF.
There are 3,505 employees tied with 20 years in their current role.
Looking at the department count we can see the Development department has the most employees. I want to see how many women in that department make more than the average salary for that department.
It looks only 11,367 women made more than the average salary count for a department of 61,386 people. I'm gong to see which department has the highest average salary for female employees as a next step down the road of granularity.
7. Recommendations/ Conclusion
Well based on the analysis of current employees these are some of the recommendations I have related to departments to work in, longevity at HR Analytica, and salary equality.
You can see all of the code for the preparation part of the project here.
Project Manager | Business Analyst | Data Analyst | Agile | AWS Cloud Practitioner
1 年J
Pharmacy Analytics Consultant @ Optum | Leading insights on drug utilization, cost, and pricing strategy
1 年Thanks for making your methods so available and easy to follow. I remember when I was learning SQL, and though there are good sources for equations and troubleshooting, there weren't (and still aren't) that many like yours that talk through how to do it, why you did it that way, and make it relatable. Nice work and I hope many analysts of all levels learn from you!
Data Scientist | Technical Writer | Support Engineer
1 年Great article! I personally like to take the lowercase for string fields so I might have filtered on LOWER(gender) = 'f'
Data Analytics | Program Evaluation | Conscious Parenting and Life Coach
1 年Awesome
'Data with Sarah' ? Data Analyst at Government of AB (Ministry of Justice) ? Sharing practical data tips, insights, and lessons learned
1 年Great writeup, very detailed!