How you measure attrition matters (HR: A demographic approach, Part I)
Ben Hanowell
Director of People Analytics Research, ADP Research. I study the decisions of employees and employers. My posts reflect my own thoughts.
For any business, labor is one of the greatest costs. Yet labor is also key to production. When employees leave, whether regrettably or un-regrettably, they take their job experience with them. Companies need to expend cost to recruit replacements, and those replacements are less productive while they ramp up, especially for high-skilled jobs that require a combination of technical skills, soft skills, and institutional knowledge.
For these reasons, employers need to understand the rate at which employees leave the company (otherwise known as attrition), how attrition changes over time, how attrition relates to both internal and external factors, and how attrition impacts the value of the workforce. Many companies measure attrition incorrectly. If you aren't measuring attrition correctly, you can't even begin to understand its dynamics, what impacts it, how it impacts your company's performance, or how your company can influence it.
Part I: How you measure attrition matters
In this post, I'll teach you how to measure attrition by first teaching you what attrition isn't. Then I'll teach you what attrition is. Then I'll show you how four common practices of measuring attrition lead companies to over- or under-estimate headcount costs. We'll see that what might seem like quibbling about attrition arithmetic is in fact worth millions of dollars for even a mid-sized company.
What the attrition rate isn't
If you're like most human resources departments, you probably define the attrition rate as the number of employees who leave out of the average employee headcount over a specific period of time. That is, you divide the number of employees who leave during a period of time by the average headcount during that period.
Getting at the numerator of an attrition rate (the number of employees who leave) is easy: just count the employees who leave during the period. Getting at the denominator (the average headcount) is where things get dicey because the headcount is constantly changing within the period. To deal with this problem, you probably measure average employee headcount in one of the following ways:
Later, we'll talk about which of these methods lead to the most biased estimates of average headcount, attrition, and related figures. Right now there's a more basic issue to address: All three of the methods above are only rough approximations of the denominator of a proper attrition rate. So the way you measure attrition is also a rough approximation.
In other words, and stay with me here, the attrition rate is at its core not the number of employees who leave out of the average headcount. So what is it then?
What the attrition rate is
I'll illustrate what attrition is with a simple example. Suppose that our company has five employees who worked at some point during the 2018 calendar year:
A quick count shows that 3 employees left the company during calendar year 2018, which we can put into the numerator of our attrition rate.
What about the denominator? If we use the mid-period headcount, we count 5 employees on 2 July 2018. If we use the period-start headcount, we count 3 employees on 1 January 2018. If we use the period-end headcount, we count 2 employees on 31 December 2018. If we average the period-start and period-end headcount, we get 2.5 employees. So depending on which headcount snapshot we use, our attrition rate could be anywhere from 60 percent (3 terminations ÷ 5 employees at mid-period) to 150 percent (3 terminations ÷ 2 employees at period end), with the period-start snapshot giving 100 percent and the start-end average giving 120 percent.
Which of these attrition rates is right? None of them. By taking the headcount snapshot on a particular date (or averaging two daily headcount snapshots) we're throwing out all of the information we have about the other days that employees worked in the year.
A proper rate is actually the number of events (in this case terminations) out of the total amount of time that the population of interest (in this case our employees) was at risk of experiencing that event (in this case actively employed). We call the amount of time that employees spend at risk of attrition the number of employee-periods. In the case of a calendar year, we call it employee-years. For quarters, we call it employee-quarters. For months, employee-months.
(attrition rate) = (# of employee terminations in the period) ÷ (# of employee-periods at risk)
To calculate employee-years for each of our employees, we count the number of days they spent working in 2018 and divide it by the number of days in 2018.
Add up all the employee-years and you get about 3.6 employee-years (if you added up the numbers above and didn't get about 3.6, it's just rounding error). So our 2018 attrition rate is:
(attrition rate) = (3 terminations) / (~3.6 employee-years) = (~84 percent)
This attrition rate is more correct because each employee is weighted in the denominator by the exact fraction of the year that they were at risk of attrition, to the nearest day. Even this attrition rate is approximate. If we knew the amount of time each employee worked to the nearest millisecond, we could calculate employee-years even more precisely.
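The calculation above can be sketched in a few lines of Python. The five hire and termination dates below are hypothetical: the original employee table is not reproduced here, so these dates were chosen only to be consistent with the counts quoted in the text (3 employees on 1 January, 5 on 2 July, 2 on 31 December, 3 terminations). The function name days_at_risk is likewise just an illustrative choice, and days are counted inclusively.

```python
from datetime import date

PERIOD_START = date(2018, 1, 1)
PERIOD_END = date(2018, 12, 31)

# Hypothetical (hire_date, termination_date) pairs; None means still employed.
employees = {
    "A": (date(2016, 3, 14), None),
    "B": (date(2017, 8, 1), None),
    "C": (date(2015, 6, 1), date(2018, 9, 30)),
    "D": (date(2018, 3, 1), date(2018, 8, 31)),
    "E": (date(2018, 6, 1), date(2018, 9, 24)),
}

def days_at_risk(hire, term, start, end):
    """Days employed within [start, end], counting both endpoints."""
    first = max(hire, start)
    last = min(term or end, end)
    return max((last - first).days + 1, 0)

period_days = (PERIOD_END - PERIOD_START).days + 1

# Denominator: each employee weighted by the fraction of the year worked.
employee_years = sum(
    days_at_risk(hire, term, PERIOD_START, PERIOD_END)
    for hire, term in employees.values()
) / period_days

# Numerator: terminations that fall inside the period.
terminations = sum(
    1 for _, term in employees.values()
    if term is not None and PERIOD_START <= term <= PERIOD_END
)

attrition_rate = terminations / employee_years
print(f"{terminations} terminations / {employee_years:.2f} employee-years "
      f"= {attrition_rate:.0%}")
```

With these made-up dates the denominator comes to about 3.57 employee-years, which is why the rate rounds to roughly 84 percent rather than the 83 percent you would get from dividing 3 by a rounded 3.6.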
Period versus cohort attrition rates
Up to now we've talked about period attrition rates, which are the rate of attrition during a given period of time, such as the 2018 calendar year. We've also calculated period attrition for all employees, but it is possible to calculate period attrition for a specific subset of employees by counting terminations and employee-years only for that employee subset. For example, you might calculate an attrition rate for employees by job family.
You can also calculate cohort attrition rates for a specific cohort of employees who were all hired or employed during the same period. Cohort attrition rates are a special case of a period attrition rate where you are counting the terminations and employee-years in a given period of interest (say calendar year 2018) for the subset of employees who were hired during the period of time that defines the cohort. For example, a cohort could be defined as all employees who were hired during the calendar year 2005. Another example cohort might be all employees who were employed on 31 December 2005.
As we'll see in this and future posts in the series, many human resources departments conflate period and cohort attrition rates, which becomes a problem when they decide how to approximate a proper attrition rate of any kind.
For example, business leaders often think they need to include headcount from the end of the previous period in their estimate of average headcount. Whenever I ask why, it often comes down to their mistaken notion that an attrition rate is the share of employees who leave the company rather than, as I have shown, the number of terminations out of the total time that employees were at risk of attrition in the period.
When analysts don't take the time to explain (or understand) the difference between period attrition rates, cohort attrition rates, and chance of termination, it leads to compromises in attrition rate methodology like the following story, which I've encountered twice in my career in industry:
Some analysts try to strike a compromise between estimating the period average headcount and satisfying their business leaders' (misguided) desire for including headcount from the end of the previous period. The way they've done this in both cases I witnessed is by estimating the annual average headcount from the trailing 13-month average of month-end headcount in December of the current year. We'll see that this method approximates actual average headcount well, but at the cost of losing data from the earliest periods in our company's history.
Why estimate attrition rates from headcount snapshots?
We've reviewed how to calculate employee-periods by summing up the individual number of employee-days worked during the period and then dividing the result by the length of the period. It turns out that this is equivalent to the average headcount for the period. If a single day is the smallest unit of time for which active employment can be measured, then the average daily headcount is equal to the average headcount for the period, and therefore equal to the number of employee-periods.
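A small Python check of this equivalence, as a sketch rather than a production query. The employment spells are hypothetical and are written as inclusive (first_day, last_day) day-of-year pairs within a 365-day period.

```python
PERIOD_DAYS = 365

# Hypothetical employment spells as inclusive (first_day, last_day) pairs.
spells = [(1, 365), (1, 365), (1, 273), (60, 243), (152, 267)]

# Route 1: sum each employee's days worked, then divide by the period length.
employee_days = sum(last - first + 1 for first, last in spells)
employee_periods = employee_days / PERIOD_DAYS

# Route 2: count who is active on each day, then average the daily headcount.
daily_headcount = [
    sum(first <= day <= last for first, last in spells)
    for day in range(1, PERIOD_DAYS + 1)
]
average_daily_headcount = sum(daily_headcount) / PERIOD_DAYS

# Both routes count the same employee-days, so they must agree.
assert employee_periods == average_daily_headcount
```

Both routes add up the same set of employee-days, once by employee and once by day, which is why the two numbers cannot differ.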
If your headcount is constant throughout the period, then the headcount on any given day is equal to the average daily headcount, thus the average headcount for the period. But usually, your headcount is constantly changing from day to day, so if you had to choose a single day in the period to approximate average headcount, what would it be?
Suppose your headcount is growing linearly over time. Each passing day, the headcount increases by the same amount. The picture below shows that you should choose the middle of the period because the amount by which you over-estimate period average headcount during the first half of the period is offset by the amount by which you under-estimate it in the second half of the period.
In this example, if you use period-end instead of mid-period headcount, you will over-estimate headcount (and under-estimate attrition). If you use period-start headcount, you will under-estimate headcount (and over-estimate attrition). The converse applies if your headcount is shrinking linearly.
Suppose instead that your headcount is growing exponentially. The picture below shows that the mid-period headcount under-estimates the period average headcount because the amount by which you over-estimate in the first half of the period is less than the amount you under-estimate in the second half of the period.
In the case of exponential headcount growth above, you'd need to find an alternative headcount date somewhere between the mid-period and period end for a better approximation. You can find that date if you know the headcount growth rate and pattern. In reality, you won't know that. Thankfully, if your period length is short enough relative to your headcount growth rate, your growth will be approximately linear, and the mid-period will again be a good enough approximation if you must pick a single headcount snapshot.
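To make the linear and exponential cases concrete, here is a minimal Python sketch that compares single-day snapshots against the average daily headcount. The starting headcount of 100 and the growth of roughly 20 percent over the year are assumptions chosen for illustration, not figures from the charts, and snapshot_errors is a hypothetical helper name.

```python
import math

def snapshot_errors(headcount, days=365):
    """Signed error of each snapshot versus the average daily headcount.

    `headcount` maps the elapsed fraction of the period (0 to 1) to headcount.
    """
    daily = [headcount(d / (days - 1)) for d in range(days)]
    exact = sum(daily) / days
    return {
        "start": daily[0] - exact,
        "mid": daily[days // 2] - exact,
        "end": daily[-1] - exact,
        "start_end_avg": (daily[0] + daily[-1]) / 2 - exact,
    }

# Assumed growth paths: 100 employees growing by about 20 percent in a year.
linear = snapshot_errors(lambda t: 100 + 20 * t)
exponential = snapshot_errors(lambda t: 100 * math.exp(0.2 * t))

for name, errors in (("linear", linear), ("exponential", exponential)):
    print(name, {k: round(v, 3) for k, v in errors.items()})
```

Under these assumptions the mid-period error is essentially zero for linear growth and slightly negative for exponential growth, while the period-start and period-end snapshots miss by around ten employees in opposite directions.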
What if your HR data system only gives you the headcount at the start and end of the period, but not the middle? In this case, you can approximate the mid-period headcount by taking the average of the period-start and period-end headcount.
What if your HR data system only gives you the headcount at the same time each month? In this case, you can approximate the average period headcount better by averaging all the monthly headcount snapshots within the period. Following our logic about mid-period approximation of average headcount, the best case scenario is when you have either the mid-month headcount or can average the month-start and month-end headcount.
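The monthly-snapshot idea can be sketched as follows. The daily headcount series is a hypothetical exponential growth path (100 employees growing at a 20 percent continuous annual rate), standing in for whatever your HR system would actually return, and the mid-month snapshot is taken at day index n // 2 of each month.

```python
import calendar
import math

YEAR = 2018
month_lengths = [calendar.monthrange(YEAR, m)[1] for m in range(1, 13)]
days_in_year = sum(month_lengths)

# Hypothetical daily headcount: exponential growth at 20 percent per year.
daily = [100 * math.exp(0.2 * d / days_in_year) for d in range(days_in_year)]
exact = sum(daily) / days_in_year

# Pull one snapshot per month at the start, middle, and end of each month.
starts, mids, ends = [], [], []
offset = 0
for n in month_lengths:
    starts.append(daily[offset])
    mids.append(daily[offset + n // 2])
    ends.append(daily[offset + n - 1])
    offset += n

def mean(values):
    return sum(values) / len(values)

# Signed error of each annual average of monthly snapshots.
errors = {
    "month_start": mean(starts) - exact,
    "mid_month": mean(mids) - exact,
    "month_end": mean(ends) - exact,
    "start_end_avg": mean([(s + e) / 2 for s, e in zip(starts, ends)]) - exact,
}
print({k: round(v, 3) for k, v in errors.items()})
```

For this assumed growth path, the month-start and month-end averages miss in opposite directions by roughly one employee, while the mid-month average and the average of month-start and month-end land much closer to the exact figure.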
But if your HR data system gives you the start and end dates for every single employee, why not just calculate the exact employee-periods and be done with it?
In this day and age, computational expense is no excuse. I have used standard database applications (SQL Server and Amazon Redshift) to calculate the average daily headcount for datasets that contained hundreds of thousands to hundreds of millions of employee-days over three years to a decade, and the database administrator yawned or just didn't even notice... even after I screwed up the query and re-ran a different version of it like 12 times. Later on in this series, we'll examine how even Excel can be used to calculate exact employee-days and employee-periods for sizable datasets.
Review so far:
What if I want to know the percent chance that employees leave?
Just because attrition rates don't measure the proportion of employees who leave during a year doesn't mean that proportion isn't important. Suppose you want to estimate the proportion of your end-of-year headcount that will leave during the next year. This means you need an estimate of the probability that an employee will leave within a year. So long as you are comfortable with assuming that the attrition rate is constant over the period and you have an annual attrition rate estimate that applies to the next year, you can use the following formula to transform any annual attrition rate into a percent chance of departure within a year:
(probability of employee attrition within a year) = 100 × (1 - exp[-a])
Above, "exp[]" is the exponential function and a is the annual attrition rate. If you wanted to calculate the chance of leaving within some fraction of a year f, the formula becomes:
(probability of employee attrition within f years) = 100 × (1 - exp[-af])
There are much more sophisticated methods of translating rates into probabilities in cases where the attrition rate is not constant over time, and we'll cover those in future lessons. For a quick estimate of the share of year-end headcount that will leave during the next year, what I described above is good enough.
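The two formulas above translate directly into Python. This is a minimal sketch that keeps the constant-rate assumption from the text; the function name chance_of_leaving is just an illustrative choice.

```python
import math

def chance_of_leaving(annual_rate, years=1.0):
    """Percent chance of leaving within `years`, assuming a constant rate.

    `annual_rate` is terminations per employee-year, e.g. 0.20 for 20 per 100.
    """
    return 100 * (1 - math.exp(-annual_rate * years))

one_year = chance_of_leaving(0.20)        # full year at a 20 percent rate
half_year = chance_of_leaving(0.20, 0.5)  # first six months only
print(f"within a year: {one_year:.1f}%, within half a year: {half_year:.1f}%")
```

Note that an annual rate of 0.20 gives a chance of about 18 percent, not 20 percent. The gap between the rate and the probability widens as the rate grows, which is one reason the two should not be used interchangeably.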
Is this just pedantic quibbling over arithmetic? (No, it's not.)
I don't blame you if you aren't yet convinced that you should measure period average headcount, thus attrition rates, using exact employee periods instead of headcount snapshots. Yet in the following section and in future installments in this series, I will show you that reducing attrition measurement error is worth a lot of money, in principle because headcount is so costly.
A simulation model shows how attrition methodology matters
In this section, I'll show you that:
To do this, I'll build models of 2006 to 2018 headcount for a fake company where we know the exact daily headcount under three different headcount growth patterns: linear, exponential, and logistic. I'll examine the error in annual average headcount approximations (and thus annual attrition rate approximation) using headcount snapshots from the start, middle, and end of the period, as well as the average of the start and end of the period, then comparing the snapshot method to the exact average headcount method.
Using the three headcount growth models, I'll estimate the signed error and signed percent error in average headcount projection, headcount cost projection, headcount cost as share of total budget, and the regrettable attrition rate. The model assumes a $100K per year salary, and that actual annual headcount costs are 70 percent of the total annual budget. We further assume that the annual attrition rate is a constant 20 out of 100 employee-years, which we simulate by sampling the number of terminations each day from a Poisson distribution with a rate parameter equal to 20 percent divided by the average number of days in a year. Finally, we assume that half of attrition is regrettable, in the sense that you wish that employee wouldn't have left.
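For readers who want to see the termination sampling in code, here is a rough Python sketch. It is not the code behind the charts. It assumes that each day's Poisson mean is the day's headcount times the per-employee daily rate, that the headcount path is fixed in advance (so departures are implicitly back-filled), and that headcount grows along a made-up linear path. The Poisson sampler uses Knuth's multiplication method because the Python standard library has no built-in Poisson generator.

```python
import math
import random

ANNUAL_RATE = 0.20      # 20 terminations per 100 employee-years (from the text)
DAYS_PER_YEAR = 365.25  # average number of days in a year

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def simulate_attrition(daily_headcount, rng):
    """Return (terminations, employee_years) for a given daily headcount path."""
    # Assumption: daily Poisson mean = headcount x per-employee daily rate.
    terminations = sum(
        sample_poisson(h * ANNUAL_RATE / DAYS_PER_YEAR, rng)
        for h in daily_headcount
    )
    employee_years = sum(daily_headcount) / DAYS_PER_YEAR
    return terminations, employee_years

rng = random.Random(2018)  # fixed seed so the sketch is repeatable
# Hypothetical linear path: 100 employees plus one more every 5 days, 13 years.
daily_headcount = [100 + d // 5 for d in range(13 * 365)]
terminations, employee_years = simulate_attrition(daily_headcount, rng)
simulated_rate = terminations / employee_years
regrettable_rate = 0.5 * simulated_rate  # half of attrition is regrettable
print(f"simulated rate: {simulated_rate:.3f}, regrettable: {regrettable_rate:.3f}")
```

Because the denominator is exact employee-years, the simulated rate should land close to the assumed 0.20, with the remaining gap due only to sampling noise.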
Annual headcount and regrettable attrition rate approximations
Below is the daily headcount for each of the three growth pattern models. The linear growth pattern is... well... linear. The exponential growth pattern shows the increase in day-to-day headcount rising over time. The logistic growth pattern shows the increase in headcount accelerating at first before slowing down as headcount reaches an assumed headcount capacity of 1,000 employees. The three scenarios have compound annual headcount growth rates of around 18 to 20 percent, which is realistic for a successful mid-level start-up.
Next, look at the annual average headcount approximations compared to exact employee-years (i.e., exact annual average headcount). Notice how the period-start headcount consistently under-estimates annual average headcount and the period-end headcount over-estimates it, whereas the mid-period headcount (and its approximation from the average of starting and ending headcount) is quite close to the exact average headcount under all three growth pattern scenarios.
Next, check out the signed errors of the annual average headcount approximations, defined as the difference between the approximate and the exact average headcount.
Notice how the errors are constant under linear growth because the same number of employees is added every day. In contrast, the errors expand under exponential growth because, with each passing day (thus each passing year), the within-period headcount growth curve becomes less and less like a straight line, and more and more employees are added. Mid-period headcount continues to perform better for the exponential growth case than the period-start and period-end methods. The pattern of average headcount approximation error under logistic growth shows how the errors will be greatest during periods when headcount is changing most rapidly.
Next, translate the headcount errors into headcount cost errors by multiplying the average headcount approximation by the annual salary of $100K. The results are striking.
For example, in the exponential growth pattern case, even if you could perfectly predict year-end headcount in 2015, you'd over-estimate total headcount cost by five million dollars. By 2018, you're over-estimating total headcount cost by almost twice that. Under logistic growth, your period-end headcount method over-estimates headcount cost by the widest margin while your headcount is changing the most rapidly. If you had used the mid-period approximation or, better yet, exact average headcount, your headcount cost projection errors would be negligible.
Now look at the headcount approximation errors as a percentage of total budget, where the actual headcount cost is assumed to be 70 percent of the total budget. Do you like over-projecting headcount budget share by over five percent? Then if your headcount is growing exponentially, you should totally use the year-end headcount approximation! I kid.
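The cost and budget-share arithmetic is simple enough to sketch in Python. The $100K salary and the 70 percent budget share come from the model assumptions stated earlier; the headcount figures in the example (an exact average of 500 and a year-end snapshot of 550) are hypothetical and are not read off the charts.

```python
SALARY = 100_000                  # annual salary per employee (from the text)
HEADCOUNT_SHARE_OF_BUDGET = 0.70  # headcount cost as share of total budget

def cost_error(approx_headcount, exact_headcount):
    """Signed headcount cost error in dollars."""
    return (approx_headcount - exact_headcount) * SALARY

def budget_share_error(approx_headcount, exact_headcount):
    """Signed cost error as a percentage of the total annual budget."""
    total_budget = exact_headcount * SALARY / HEADCOUNT_SHARE_OF_BUDGET
    return 100 * cost_error(approx_headcount, exact_headcount) / total_budget

# Hypothetical year: exact average headcount 500, year-end snapshot 550.
dollars = cost_error(550, 500)
share = budget_share_error(550, 500)
print(f"cost over-estimate: ${dollars:,.0f} ({share:.1f}% of total budget)")
```

In this made-up year, a 50-employee over-estimate of average headcount becomes a five million dollar over-estimate of headcount cost, or about 7 percent of the total budget.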
Next, look at the percent error in our estimate of regrettable attrition. Considering that regrettable attrition is a key indicator of your company's ability to satisfy and retain talent, I'm guessing you aren't interested in over- or under-estimating how well or how poorly you're doing by 10 percent under exponential headcount growth. I'm also guessing you don't want to be even more wrong about regrettable attrition during periods of rapid headcount change (see the logistic growth case), or during the early periods in your company's history when annual headcount increase is small relative to attrition risk (see the early years in both the linear and logistic growth cases).
Approximating annual average headcount with average monthly headcount
So far we've looked at headcount approximation errors when using headcount snapshots at different points in the year. Yet you can better approximate annual average headcount by averaging monthly headcount within the year. As the picture below shows, the annual averages of monthly headcount snapshots are much closer to the exact average headcount than their annual snapshot equivalents. Indeed, at this scale, all four approximations are indistinguishable from one another.
Looking at the signed error of the approximations, however, we see that the average of mid-month headcount performs the best, as expected.
About that trailing 13-month average of month-end headcount...
Recall that analysts sometimes compromise between business leaders' misunderstanding of attrition rates and the need to estimate average headcount by taking the trailing 13-month average of month-end headcount. Below, we compare the trailing 13-month average method to the mid-year headcount, showing that the trailing 13-month method is a better approximation. This is no surprise since it is averaging monthly headcount rather than relying on a single estimate from the middle of the year. The trailing 13-month estimate tends to underestimate headcount because it uses month-end headcount from the final month in the previous period.
When we compare the trailing 13-month method to trailing 12-month month-end headcount, we see that the trailing 13-month method is by far the better choice. Apparently, including the month-end headcount from the previous year counteracts the bias of using month-end headcount averages.
Lastly, we compare the trailing 13-month average of month-end headcount to the annual average mid-month headcount. Amazingly, the trailing 13-month average performs about the same as the annual mid-month average.
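Here is a rough Python sketch of the three month-end-based averages. It assumes 12 equal-length months, a hypothetical 20 percent continuous annual growth rate, and a starting headcount of 100, and it uses the average of month-start and month-end headcount as a stand-in for the mid-month snapshot. The sign of the trailing 13-month error depends on the growth pattern, so the sketch only compares the sizes of the errors.

```python
import math

GROWTH = 0.20  # hypothetical continuous annual growth rate

# Month-end headcount from December of the prior year (index 0) through
# December of the current year (index 12), assuming 12 equal-length months.
month_end = [100 * math.exp(GROWTH * i / 12) for i in range(13)]

# Exact average of 100 * exp(GROWTH * t) for t running from 0 to 1.
exact = 100 * (math.exp(GROWTH) - 1) / GROWTH

trailing_13 = sum(month_end) / 13
trailing_12 = sum(month_end[1:]) / 12
# Treat the prior month-end as the month-start, then average start and end.
start_end_avg = sum(
    (month_end[i - 1] + month_end[i]) / 2 for i in range(1, 13)
) / 12

errors = {
    "trailing_13": trailing_13 - exact,
    "trailing_12": trailing_12 - exact,
    "start_end_avg": start_end_avg - exact,
}
print({k: round(v, 3) for k, v in errors.items()})
```

One thing the sketch makes visible: the start-end average gives half weight to the two December month-end values and full weight to the eleven in between, while the trailing 13-month average gives all thirteen values equal weight. Those weightings are close to each other, which may be part of why the two estimates end up so similar under these assumptions.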
The difference between the two methods is in fact quite small. The methods are so similar for a curious reason, which I explain in steps:
Yet you may have noticed something is missing from each of the charts above. Specifically, the trailing 13-month method has no estimate for the first year in the dataset because there aren't enough months to compute a 13-month trailing average until January 2007 (the second year in the dataset).
This is a problem for at least two reasons:
These two issues limit not only our ability to estimate attrition rates for the companies and employee segments of interest, but also our ability to build models that combine data from multiple companies (even new ones) and employee segments within companies (again, even new ones) to predict future attrition risk. We could impute (that is, fill in) the missing years with some other estimate (such as the annual average of mid-month headcount), but then our measurement method and its biases are inconsistent, thus harder to analyze.
Maybe we should just stick with annual averages of mid-month headcount or, better yet, the gold standard of annual average daily headcount.
Take home messages
All of the code used to produce the charts in this article is in this GitHub Gist.