How you measure attrition matters (HR: A demographic approach, Part I)

For any business, labor is one of the greatest costs. Yet labor is also key to production. When employees leave, whether regrettably or un-regrettably, they take their job experience with them. Companies need to expend cost to recruit replacements, and those replacements are less productive while they ramp up, especially for high-skilled jobs that require a combination of technical skills, soft skills, and institutional knowledge.

For these reasons, employers need to understand the rate at which employees leave the company (otherwise known as attrition), how attrition changes over time, how attrition relates to both internal and external factors, and how attrition impacts the value of the workforce. Many companies measure attrition incorrectly. If you aren't measuring attrition correctly, you can't even begin to understand its dynamics, what impacts it, how it impacts your company's performance, or how your company can influence it.

Part I: How you measure attrition matters

In this post, I'll teach you how to measure attrition by first teaching you what attrition isn't. Then I'll teach you what attrition is. Then I'll show you how four common practices of measuring attrition lead companies to over- or under-estimate headcount costs. We'll see that what might seem like quibbling about attrition arithmetic is in fact worth millions of dollars for even a mid-sized company.

What the attrition rate isn't

If you're like most human resources departments, you probably define the attrition rate as the number of employees who leave out of the average employee headcount over a specific period of time. That is, you divide the number of employees who leave during a period of time by the average headcount during that period.

Getting at the numerator of an attrition rate (the number of employees who leave) is easy: just count the employees who leave during the period. Getting at the denominator (the average headcount) is where things get dicey because the headcount is constantly changing within the period. To deal with this problem, you probably measure average employee headcount in one of the following ways:

  1. Observe headcount in the middle of the period (for example, mid-year)
  2. Observe headcount at some other point (say, at the beginning or end of the period)
  3. Take the average of the headcount at the beginning and end of the period

Later, we'll talk about which of these methods lead to the most biased estimates of average headcount, attrition, and related figures. Right now there's a more basic issue to address: All three of the methods above are only rough approximations of the denominator of a proper attrition rate. So the way you measure attrition is also a rough approximation.

In other words, and stay with me here, the attrition rate is at its core not the number of employees who leave out of the average headcount. So what is it then?

What the attrition rate is

I'll illustrate what attrition is with a simple example. Suppose that our company has five employees who worked at some point during the 2018 calendar year:

  1. Amad was hired 1 January 2018 and still worked at the company as of 19 March 2019.
  2. Janice had been working at the company since it was founded in 1997, but moved on to another company on 4 July 2018.
  3. Brian was hired back in 2017 and left the company on 8 December 2018.
  4. Vivian was hired on 17 May 2018 and still worked at the company as of 19 March 2019.
  5. Azariannah was hired on 17 May 2018 but was poached by another company on 11 November 2018.

A quick count shows that 3 employees left the company during calendar year 2018, which we can put into the numerator of our attrition rate.

What about the denominator? If we use the mid-period headcount, we count 5 employees on 2 July 2018. If we use the period-start headcount, we count 3 employees on 1 January 2018. If we use the period-end headcount, we count 2 employees on 31 December 2018. If we average the period-start and period-end headcount, we get 2.5 employees. So depending on which headcount snapshot we use, our attrition rate could be anywhere from:

  • 60 percent attrition for the mid-period headcount
  • 100 percent attrition for the period-start headcount
  • 120 percent attrition for the average of period-start and period-end headcount (yes, you can have more terminations than headcount in a period, which is proof that an attrition rate is not the share of employees who leave the company)
  • 150 percent attrition for the period-end headcount

Which of these attrition rates is right? None of them. By taking the headcount snapshot on a particular date (or averaging two daily headcount snapshots) we're throwing out all of the information we have about the other days that employees worked in the year.
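Here is a minimal Python sketch of the snapshot arithmetic above, in case you want to check it yourself. The hire dates for Janice and Brian are placeholders I made up (the example only says they were hired before 2018), and I assume an employee still counts as active on their last day.

```python
from datetime import date

# (hire date, termination date); None means still employed.
# Hire dates for Janice and Brian are placeholders: any date before 2018 works.
EMPLOYEES = {
    "Amad": (date(2018, 1, 1), None),
    "Janice": (date(1997, 1, 1), date(2018, 7, 4)),
    "Brian": (date(2017, 1, 1), date(2018, 12, 8)),
    "Vivian": (date(2018, 5, 17), None),
    "Azariannah": (date(2018, 5, 17), date(2018, 11, 11)),
}


def headcount_on(day):
    """Count employees active on a given day (the last day worked counts as active)."""
    return sum(
        hire <= day and (term is None or day <= term)
        for hire, term in EMPLOYEES.values()
    )


start, end = date(2018, 1, 1), date(2018, 12, 31)

# Numerator: terminations that fall inside the period.
terminations = sum(
    term is not None and start <= term <= end for _, term in EMPLOYEES.values()
)

# Four snapshot-based denominators.
denominators = {
    "mid-period": headcount_on(date(2018, 7, 2)),
    "period-start": headcount_on(start),
    "start/end average": (headcount_on(start) + headcount_on(end)) / 2,
    "period-end": headcount_on(end),
}

snapshot_rates = {name: terminations / d for name, d in denominators.items()}
for name, rate in snapshot_rates.items():
    print(f"{name}: {rate:.0%}")
```

The same three terminations produce four different "attrition rates" depending only on which day you count heads.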

A proper rate is actually the number of events (in this case terminations) out of the total amount of time that the population of interest (in this case our employees) was at risk of experiencing that event (in this case actively employed). We call the amount of time that employees spend at risk of attrition the number of employee-periods. In the case of a calendar year, we call it employee-years. For quarter-years, we call it employee-quarters. For months, employee-months.

(attrition rate) = (# of employee terminations in the period) ÷ (# of employee-periods at risk)

To calculate employee-years for each of our employees, we count the number of days they spent working in 2018 and divide it by the number of days in 2018.

  1. Amad worked 365 days in 2018, so he contributes 1 employee-year
  2. Janice worked 185 days in 2018, or about 0.5 employee-years
  3. Brian worked 342 days in 2018, or about 0.9 employee-years
  4. Vivian worked 229 days in 2018, or about 0.6 employee-years
  5. Azariannah worked 179 days in 2018, or about 0.5 employee-years

Add up all the employee-years and you get about 3.6 employee-years (if you added up the numbers above and didn't get about 3.6, it's just rounding error). So our 2018 attrition rate is:

(attrition rate) = (3 terminations) / (~3.6 employee-years) = (~84 percent)

This attrition rate is more correct because each employee is weighted in the denominator by the exact fraction of the year that they were at risk of attrition, to the nearest day. Even this attrition rate is approximate. If we knew the amount of time each employee worked to the nearest millisecond, we could calculate employee-years even more precisely.
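The employee-years calculation above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and names are my own, each spell is already clipped to calendar year 2018, and the last day worked is counted as a day at risk.

```python
from datetime import date

PERIOD_START, PERIOD_END = date(2018, 1, 1), date(2018, 12, 31)
DAYS_IN_PERIOD = (PERIOD_END - PERIOD_START).days + 1  # 365

# (first day at risk in 2018, last day at risk in 2018, left during 2018?)
SPELLS = {
    "Amad": (date(2018, 1, 1), date(2018, 12, 31), False),
    "Janice": (date(2018, 1, 1), date(2018, 7, 4), True),
    "Brian": (date(2018, 1, 1), date(2018, 12, 8), True),
    "Vivian": (date(2018, 5, 17), date(2018, 12, 31), False),
    "Azariannah": (date(2018, 5, 17), date(2018, 11, 11), True),
}


def days_at_risk(first, last):
    """Inclusive count of days between two dates."""
    return (last - first).days + 1


employee_days = {
    name: days_at_risk(first, last) for name, (first, last, _) in SPELLS.items()
}
employee_years = sum(employee_days.values()) / DAYS_IN_PERIOD
terminations = sum(left for _, _, left in SPELLS.values())
attrition_rate = terminations / employee_years

print(employee_days)
print(f"employee-years: {employee_years:.2f}")
print(f"attrition rate: {attrition_rate:.0%}")
```

Working from unrounded employee-days (1,300 in total) avoids the rounding error you get from adding up the one-decimal figures in the list.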

Period versus cohort attrition rates

Up to now we've talked about period attrition rates, which are the rate of attrition during a given period of time, such as the 2018 calendar year. We've also calculated period attrition for all employees, but it is possible to calculate period attrition for a specific subset of employees by counting terminations and employee-years only for that employee subset. For example, you might calculate an attrition rate for employees by job family.

You can also calculate cohort attrition rates for a specific cohort of employees who were all hired or employed during the same period. Cohort attrition rates are a special case of a period attrition rate where you are counting the terminations and employee-years in a given period of interest (say calendar year 2018) for the subset of employees who were hired during the period of time that defines the cohort. For example, a cohort could be defined as all employees who were hired during the calendar year 2005. Another example cohort might be all employees who were employed on 31 December 2005.

As we'll see in this and future posts in the series, many human resources departments conflate period and cohort attrition rates, which becomes a problem when they decide how to approximate a proper attrition rate of any kind.

For example, business leaders often think they need to include headcount from the end of the previous period in their estimate of average headcount. Whenever I ask why, it often comes down to their mistaken notion that an attrition rate is the share of employees who leave the company rather than, as I have shown, the number of terminations out of the total time that employees were at risk of attrition in the period.

When analysts don't take the time to explain (or understand) the difference between period attrition rates, cohort attrition rates, and chance of termination, it leads to compromises in attrition rate methodology like the following story, which I've encountered twice in my career in industry:

Some analysts try to strike a compromise between estimating the period average headcount and satisfying their business leaders' (misguided) desire for including headcount from the end of the previous period. The way they've done this in both cases I witnessed is by estimating the annual average headcount from the trailing 13-month average of end-of-period headcount in December of the current year. We'll see that this method approximates actual average headcount well, but at the cost of losing data from the earliest periods in our company's history.

Why estimate attrition rates from headcount snapshots?

We've reviewed how to calculate employee-periods by summing up the individual number of employee-days worked during the period and then dividing the result by the length of the period. It turns out that this is equivalent to the average headcount for the period. If a single day is the smallest unit of time for which active employment can be measured, then the average daily headcount is equal to the average headcount for the period, and therefore equal to the number of employee-periods.
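To see the equivalence concretely, here is a short Python check using the five-employee example from earlier. It is a sketch with my own variable names: it counts heads on every day of 2018, averages those daily counts, and compares the result with employee-days divided by the length of the year.

```python
from datetime import date, timedelta

# (first day, last day) each employee worked within calendar year 2018
spells = [
    (date(2018, 1, 1), date(2018, 12, 31)),   # Amad
    (date(2018, 1, 1), date(2018, 7, 4)),     # Janice
    (date(2018, 1, 1), date(2018, 12, 8)),    # Brian
    (date(2018, 5, 17), date(2018, 12, 31)),  # Vivian
    (date(2018, 5, 17), date(2018, 11, 11)),  # Azariannah
]

days = [date(2018, 1, 1) + timedelta(days=i) for i in range(365)]

# Route 1: count heads on every day, then average the daily headcounts.
daily_headcount = [sum(first <= d <= last for first, last in spells) for d in days]
average_daily_headcount = sum(daily_headcount) / len(days)

# Route 2: add up each employee's days worked, then divide by the period length.
employee_years = sum((last - first).days + 1 for first, last in spells) / len(days)

print(average_daily_headcount, employee_years)
```

Both routes add up the same employee-days, just in a different order, so they give the same number.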

If your headcount is constant throughout the period, then the headcount on any given day is equal to the average daily headcount, thus the average headcount for the period. But usually, your headcount is constantly changing from day to day, so if you had to choose a single day in the period to approximate average headcount, what would it be?

Suppose your headcount is growing linearly over time. Each passing day, the headcount increases by the same amount. The picture below shows that you should choose the middle of the period because the amount by which you over-estimate period average headcount during the first half of the period is offset by the amount by which you under-estimate it in the second half of the period.

[Figure: linearly growing headcount within a period, with mid-period headcount compared to period average headcount]

In this example, if you use period-end instead of mid-period headcount, you will over-estimate headcount (and under-estimate attrition). If you use period-start headcount, you will under-estimate headcount (and over-estimate attrition). The converse applies if your headcount is shrinking linearly.

Suppose instead that your headcount is growing exponentially. The picture below shows that the mid-period headcount under-estimates the period average headcount because the amount by which you over-estimate in the first half of the period is less than the amount you under-estimate in the second half of the period.

[Figure: exponentially growing headcount within a period, with mid-period headcount compared to period average headcount]

In the case of exponential headcount growth above, you'd need to find an alternative headcount date somewhere between the mid-period and period end for a better approximation. You can find that date if you know the headcount growth rate and pattern. In reality, you won't know that. Thankfully, if your period length is short enough relative to your headcount growth rate, your growth will be approximately linear, and the mid-period will again be a good enough approximation if you must pick a single headcount snapshot.
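If you want to convince yourself of the linear and exponential cases, a small Python sketch like the following will do. The growth curves are hypothetical (100 employees growing by half an employee per day, and 100 employees doubling over the year), and headcount is treated as a continuous quantity to keep the arithmetic clean.

```python
import math

DAYS = 365
MID = DAYS // 2  # day index 182, the middle of days 0..364


def linear(t):
    """Hypothetical linear growth: 100 employees plus 0.5 per day."""
    return 100 + 0.5 * t


def exponential(t):
    """Hypothetical exponential growth: 100 employees doubling over the year."""
    return 100 * math.exp(math.log(2) * t / DAYS)


def summarize(headcount):
    daily = [headcount(t) for t in range(DAYS)]
    return {
        "start": daily[0],
        "mid": daily[MID],
        "end": daily[-1],
        "start/end average": (daily[0] + daily[-1]) / 2,
        "exact average": sum(daily) / DAYS,
    }


lin = summarize(linear)
exp = summarize(exponential)
for label, result in (("linear", lin), ("exponential", exp)):
    print(label, {k: round(v, 1) for k, v in result.items()})
```

Under linear growth the mid-period snapshot matches the exact average. Under exponential growth it falls a little short, while the period-end snapshot overshoots by far more.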

What if your HR data system only gives you the headcount at the start and end of the period, but not the middle? In this case, you can approximate the mid-period headcount by taking the average of the period-start and period-end headcount.

What if your HR data system only gives you the headcount at the same time each month? In this case, you can approximate the average period headcount better by averaging all the monthly headcount snapshots within the period. Following our logic about mid-period approximation of average headcount, the best case scenario is when you have either the mid-month headcount or can average the month-start and month-end headcount.
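As a rough illustration of monthly averaging, the Python sketch below compares month-start, month-end, and mid-month snapshot averages against the exact daily average. The headcount curve is hypothetical (linear growth from 100 employees), and "mid-month" is my own simplification: the day halfway through each month.

```python
import calendar

YEAR = 2018
month_lengths = [calendar.monthrange(YEAR, m)[1] for m in range(1, 13)]
DAYS = sum(month_lengths)  # 365

# Day index (0-based) on which each month starts.
month_starts = [sum(month_lengths[:m]) for m in range(12)]


def headcount(day_index):
    """Hypothetical headcount on day 0..364: 100 employees plus 0.5 per day."""
    return 100 + 0.5 * day_index


def mean(values):
    values = list(values)
    return sum(values) / len(values)


exact = mean(headcount(t) for t in range(DAYS))
start_avg = mean(headcount(s) for s in month_starts)
end_avg = mean(headcount(s + n - 1) for s, n in zip(month_starts, month_lengths))
mid_avg = mean(headcount(s + n // 2) for s, n in zip(month_starts, month_lengths))
start_end_avg = (start_avg + end_avg) / 2

for label, value in [
    ("exact daily average", exact),
    ("month-start average", start_avg),
    ("month-end average", end_avg),
    ("mid-month average", mid_avg),
    ("month-start/end average", start_end_avg),
]:
    print(f"{label}: {value:.2f}")
```

With growing headcount, month-start snapshots run low and month-end snapshots run high, while the mid-month average and the average of month-start and month-end land close to the exact figure.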

But if your HR data system gives you the start and end dates for every single employee, why not just calculate the exact employee-periods and be done with it?

In this day and age, computational expense is no excuse. I have used standard database applications (SQL Server and Amazon Redshift) to calculate the average daily headcount for datasets that contained hundreds of thousands to hundreds of millions of employee-days over three years to a decade, and the database administrator yawned or just didn't even notice... even after I screwed up the query and re-ran a different version of it like 12 times. Later on in this series, we'll examine how even Excel can be used to calculate exact employee-days and employee-periods for sizable datasets.

Review so far:

  1. The traditional methods for attrition rate calculation are relics of a time when it was harder to keep records of the exact start and end dates of individual employees, and so harder to calculate the exact average headcount.
  2. Attrition rates are not the proportion of employees who leave during the period or the chance that employees will leave in a period.
  3. Attrition rates are the number of employees who leave out of the number of employee-periods that employees worked.
  4. Headcount-snapshot-based attrition rate estimates are just approximations of that rate.
  5. If you must pick a day to count heads, the mid-period headcount or the average of period-start and period-end headcount are your best bet.
  6. If you don't have daily headcount, take the average monthly headcount instead, ideally the mid-month headcount or the average of month-start and month-end headcount.

What if I want to know the percent chance that employees leave?

Just because attrition rates don't measure the proportion of employees who leave during a year doesn't mean that proportion isn't important. Suppose you want to estimate the proportion of your end-of-year headcount that will leave during the next year. This means you need an estimate of the probability that an employee will leave within a year. So long as you are comfortable with assuming that the attrition rate is constant over the period and you have an annual attrition rate estimate that applies to the next year, you can use the following formula to transform any annual attrition rate into a percent chance of departure within a year:

(probability of employee attrition within a year) = 100 × (1 - exp[-a])

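Above, "exp[]" is the exponential function and a is the annual attrition rate expressed as terminations per employee-year (for example, 0.2 for 20 terminations per 100 employee-years). If you wanted to calculate the chance of leaving within some fraction of a year f, the formula becomes: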

(probability of employee attrition within f years) = 100 × (1 - exp[-af])

There are much more sophisticated methods of translating rates into probabilities in cases where the attrition rate is not constant over time, and we'll cover those in future lessons. For a quick estimate of the share of year-end headcount that will leave during the next year, what I described above is good enough.
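In Python, the conversion might look like the sketch below. The function name is my own, and it returns a proportion rather than a percentage, so multiply by 100 to match the formulas above. It assumes the attrition rate is constant over the horizon.

```python
import math


def attrition_probability(annual_rate, years=1.0):
    """Chance of leaving within `years`, assuming a constant attrition rate.

    annual_rate is terminations per employee-year (0.2 means 20 per 100).
    """
    return 1.0 - math.exp(-annual_rate * years)


one_year = attrition_probability(0.2)
one_quarter = attrition_probability(0.2, years=0.25)

print(f"within a year: {100 * one_year:.1f} percent")
print(f"within a quarter: {100 * one_quarter:.1f} percent")
```

Notice that an annual rate of 20 per 100 employee-years translates to a chance of leaving of about 18 percent, not 20 percent. The probability can never exceed 100 percent, even when the rate does.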

Is this just pedantic quibbling over arithmetic? (No, it's not.)

I don't blame you if you aren't yet convinced that you should measure period average headcount, thus attrition rates, using exact employee-periods instead of headcount snapshots. Yet in the following section and in future installments in this series, I will show you that reducing attrition measurement error is worth a lot of money, mainly because headcount is so costly.

A simulation model shows how attrition methodology matters

In this section, I'll show you that:

  1. Using year-start or year-end headcount instead of mid-year headcount (or the average of year-start and year-end headcount) can lead to errors in headcount cost projection in the millions to tens of millions of dollars for even mid-sized companies, even if you can predict future headcount snapshots with 100 percent accuracy.
  2. That same mistake can lead to errors in the tens of percentage points for estimates of headcount cost as share of total budget.
  3. In addition, using the wrong headcount snapshot can lead you to be off in your estimate of regrettable attrition by ten percent or more.
  4. These errors are worst during periods when your headcount is changing most rapidly, which is when your company is most vulnerable to making poor headcount decisions.
  5. If you have monthly headcount snapshots, averaging them to get at annual average headcount and attrition is better than taking an annual snapshot. The error in using month-start and month-end counts compared to mid-month counts shrinks compared to using annual counts, but does not disappear.
  6. Trailing 13-month average month-end headcount is a good approximation of annual average headcount, but at the cost of losing valuable data during the earliest periods of a company or employee segment's history.

To do this, I'll build models of 2006 to 2018 headcount for a fake company where we know the exact daily headcount under three different headcount growth patterns: linear, exponential, and logistic. I'll examine the error in annual average headcount approximations (and thus annual attrition rate approximations) using headcount snapshots from the start, middle, and end of the period, as well as the average of the start and end of the period, and then compare each snapshot method to the exact average headcount method.

Using the three headcount growth models, I'll estimate the signed error and signed percent error in average headcount projection, headcount cost projection, headcount cost as share of total budget, and the regrettable attrition rate. The model assumes a $100K per year salary, and that actual annual headcount costs are 70 percent of the total annual budget. We further assume that the annual attrition rate is a constant 20 terminations per 100 employee-years, which we simulate by sampling the number of terminations each day from a Poisson distribution with a rate parameter equal to 20 percent divided by the average number of days in a year. Finally, we assume that half of attrition is regrettable, in the sense that you wish that employee wouldn't have left.
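For readers who want a feel for the termination sampling, here is a standard-library Python sketch. It is not the code behind the charts in this article. It assumes the expected number of terminations on a given day is that day's headcount times 20 percent divided by 365.25, uses a hypothetical constant headcount of 1,000, and draws Poisson counts with a simple textbook sampler (Knuth's method) that is only suitable for small daily rates.

```python
import math
import random

ANNUAL_RATE = 0.20        # 20 terminations per 100 employee-years
DAYS_PER_YEAR = 365.25    # average number of days in a year
REGRETTABLE_SHARE = 0.5   # half of attrition is regrettable


def sample_poisson(lam, rng):
    """Draw one Poisson count with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(daily_headcount, seed=0):
    """Return (attrition rate, regrettable attrition rate) per employee-year."""
    rng = random.Random(seed)
    terminations = regrettable = 0
    for headcount in daily_headcount:
        # Assumption: daily expected terminations scale with that day's headcount.
        leavers = sample_poisson(headcount * ANNUAL_RATE / DAYS_PER_YEAR, rng)
        terminations += leavers
        regrettable += sum(rng.random() < REGRETTABLE_SHARE for _ in range(leavers))
    employee_years = sum(daily_headcount) / DAYS_PER_YEAR
    return terminations / employee_years, regrettable / employee_years


# Hypothetical input: a constant 1,000 employees for roughly ten years.
rate, regrettable_rate = simulate([1000] * 3650)
print(f"attrition rate: {rate:.3f}, regrettable: {regrettable_rate:.3f}")
```

Because the denominator is exact employee-years, the simulated rate should come out near 0.20 and the regrettable rate near 0.10, give or take sampling noise.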

Annual headcount and regrettable attrition rate approximations

Below is the daily headcount for each of the three growth pattern models. The linear growth pattern is... well... linear. The exponential growth pattern shows the increase in day-to-day headcount rising over time. The logistic growth pattern shows the increase in headcount accelerating at first before slowing down as headcount reaches an assumed headcount capacity of 1,000 employees. The three scenarios have compound annual headcount growth rates of around 18 to 20 percent, which is realistic for a successful mid-level start-up.

[Figure: daily headcount under the linear, exponential, and logistic growth models]

Next, look at the annual average headcount approximations compared to exact employee-years (i.e., exact annual average headcount). Notice how the period-start headcount consistently under-estimates annual average headcount and the period-end headcount over-estimates it, whereas the mid-period headcount (and its approximation from the average of starting and ending headcount) is quite close to the exact average headcount under all three growth pattern scenarios.

[Figure: annual average headcount approximations compared to exact employee-years under each growth model]

Next, check out the signed errors of the annual average headcount approximations, defined as the difference between the approximate and the exact average headcount.

[Figure: signed error of the annual average headcount approximations under each growth model]

Notice how the errors are constant under linear growth because the same number of employees is added every day. In contrast, the errors expand under exponential growth because, with each passing day (thus each passing year), the within-period headcount growth curve becomes less and less like a straight line, and more and more headcount is added. Mid-period headcount continues to perform better for the exponential growth case than the period-start and period-end methods. The pattern of average headcount approximation error under logistic growth shows how the errors will be greatest during periods when headcount is changing most rapidly.

Next, translate the headcount errors into headcount cost errors by multiplying the average headcount approximation by the annual salary of $100K. The results are striking.

[Figure: headcount cost error of each approximation under each growth model]

For example, in the exponential growth pattern case, even if you could perfectly predict year-end headcount in 2015, you'd over-estimate total headcount cost by five million dollars. By 2018, you're over-estimating total headcount cost by almost twice that. Under logistic growth, your period-end headcount method over-estimates headcount cost by the widest margin while your headcount is changing the most rapidly. If you had used the mid-period approximation or, better yet, exact average headcount, your headcount cost projection errors would be negligible.
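To get a sense of scale, here is a back-of-the-envelope Python sketch of the cost error. The numbers are hypothetical and are not taken from the simulation behind the charts: a company that grows exponentially from 500 to about 600 employees over one year, priced at the $100K salary assumed above.

```python
SALARY = 100_000   # annual salary assumed in the model
DAYS = 365
START_HEADCOUNT = 500
ANNUAL_GROWTH = 1.2  # hypothetical 20 percent growth over the year


def headcount(t):
    """Hypothetical exponential headcount on day t (0..364)."""
    return START_HEADCOUNT * ANNUAL_GROWTH ** (t / DAYS)


daily = [headcount(t) for t in range(DAYS)]
exact_average = sum(daily) / DAYS

snapshots = {
    "year-start": daily[0],
    "mid-year": daily[DAYS // 2],
    "year-end": daily[-1],
}

# Signed cost error: (approximate minus exact average headcount) times salary.
cost_errors = {
    name: (value - exact_average) * SALARY for name, value in snapshots.items()
}
for name, error in cost_errors.items():
    print(f"{name}: {error:+,.0f} dollars")
```

Even in this small example, the year-end snapshot overstates annual headcount cost by millions of dollars, while the mid-year snapshot is off by tens of thousands.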

Now look at the headcount approximation errors as a percentage of total budget, where the actual headcount cost is assumed to be 70 percent of the total budget. Do you like over-projecting headcount budget share by over five percent? Then if your headcount is growing exponentially, you should totally use the year-end headcount approximation! I kid.

[Figure: headcount cost error as a percentage of total budget under each growth model]

Next, look at the percent error in our estimate of regrettable attrition. Considering that regrettable attrition is a key indicator of your company's ability to satisfy and retain talent, I'm guessing you aren't interested in over- or under-estimating how well or how poorly you're doing by 10 percent under exponential headcount growth. I'm also guessing you don't want to be even more wrong about regrettable attrition during periods of rapid headcount change (see the logistic growth case), or during the early periods in your company's history when annual headcount increase is small relative to attrition risk (see the early years in both the linear and logistic growth cases).

[Figure: percent error in the regrettable attrition rate estimate under each growth model]

Approximating annual average headcount with average monthly headcount

So far we've looked at headcount approximation errors when using headcount snapshots at different points in the year. Yet you can better approximate annual average headcount by averaging monthly headcount within the year. As the picture below shows, the monthly averages based on month-start and month-end headcount are much closer to the exact average headcount than their annual snapshot equivalents. Indeed, at this scale, all four approximations are indistinguishable from one another.

[Figure: monthly-average approximations of annual average headcount compared to exact average headcount]

Looking at the signed error of the approximations, however, we see that the average of mid-month headcount performs the best, as expected.

[Figure: signed error of the monthly-average approximations of annual average headcount]

About that trailing 13-month average of month-end headcount...

Recall that analysts sometimes compromise between business leaders' misunderstanding of attrition rates and the need to estimate average headcount by taking the trailing 13-month average of month-end headcount. Below, we compare the trailing 13-month average method to the mid-year headcount, showing that the trailing 13-month method is a better approximation. This is no surprise since it is averaging monthly headcount rather than relying on a single estimate from the middle of the year. The trailing 13-month estimate tends to underestimate headcount because it uses month-end headcount from the final month in the previous period.

[Figure: trailing 13-month average of month-end headcount compared to mid-year headcount]

When we compare the trailing 13-month method to trailing 12-month month-end headcount, we see that the trailing 13-month method is by far the better choice. Apparently, including the month-end headcount from the previous year counteracts the bias of using month-end headcount averages.

[Figure: trailing 13-month average compared to trailing 12-month average of month-end headcount]

Lastly, we compare the trailing 13-month average of month-end headcount to the annual average mid-month headcount. Amazingly, the trailing 13-month average performs about the same as the annual mid-month average.

[Figure: trailing 13-month average of month-end headcount compared to annual average of mid-month headcount]

The difference between the two methods is in fact quite small. The methods are so similar for a curious reason, which I explain in steps:

  1. The month-end headcount from the final month of the previous period is very similar to the month-start headcount from the first month of the current period.
  2. For the same reason, the month-end headcount of the previous month is very similar to the month-start headcount of the current month.
  3. This means that the trailing 13-month average month-end headcount is very much like averaging across the month-start and month-end headcount for the even months of the current period along with the month-start headcount from the first month of the period.
  4. The calculation described in step 3 is very similar to taking the average of month-start and month-end headcount each month as the approximation of the mid-month headcount.
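The steps above can be checked with a small Python sketch. The headcount curve is hypothetical (linear growth from 100 employees on 1 January 2017), and I use the 16th of each month as a stand-in for mid-month.

```python
import calendar
from datetime import date

ORIGIN = date(2017, 1, 1)
YEAR = 2018


def headcount(day):
    """Hypothetical linear growth: 100 employees at ORIGIN plus 0.5 per day."""
    return 100 + 0.5 * (day - ORIGIN).days


def month_end(year, month):
    return date(year, month, calendar.monthrange(year, month)[1])


def mean(values):
    values = list(values)
    return sum(values) / len(values)


first, last = date(YEAR, 1, 1), date(YEAR, 12, 31)
all_days = [date.fromordinal(o) for o in range(first.toordinal(), last.toordinal() + 1)]
exact = mean(headcount(d) for d in all_days)

month_ends = [month_end(YEAR, m) for m in range(1, 13)]
trailing_12 = mean(headcount(d) for d in month_ends)
# The 13th snapshot is the month-end headcount from December of the previous year.
trailing_13 = mean(headcount(d) for d in [month_end(YEAR - 1, 12)] + month_ends)
mid_month = mean(headcount(date(YEAR, m, 16)) for m in range(1, 13))

for label, value in [
    ("exact daily average", exact),
    ("trailing 12-month month-end", trailing_12),
    ("trailing 13-month month-end", trailing_13),
    ("mid-month average", mid_month),
]:
    print(f"{label}: {value:.2f} (error {value - exact:+.2f})")
```

In this toy case the extra snapshot from the previous December pulls the month-end average back down to just under the exact figure, which is why the 13-month version beats the 12-month version and lands near the mid-month average.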

Yet you may have noticed something is missing from each of the charts above. Specifically, the trailing 13-month method has no estimate for the first year in the dataset because there aren't enough months to compute a 13-month trailing average until January 2007 (the second year in the dataset).

This is a problem for at least two reasons:

  1. New companies can't compute attrition rates for their first year. That means companies of any age can't compute attrition rates for the earliest periods in their history.
  2. Different departments or other groupings of employees within a company can't compute attrition rates for their earliest periods, either.

These two issues limit not only our ability to estimate attrition rates for the companies and employee segments of interest, but also our ability to build models that combine data from multiple companies (even new ones) and employee segments within companies (again, even new ones) to predict future attrition risk. We could impute (that is, fill in) the missing years with some other estimate (such as the annual average of mid-month headcount), but then our measurement method and its biases are inconsistent, thus harder to analyze.

Maybe we should just stick with annual averages of mid-month headcount or, better yet, the gold standard of annual average daily headcount.

Take home messages

  1. Use the exact period average headcount whenever possible, such as by averaging daily headcount within the period.
  2. If you can't take the daily average headcount, take the average headcount for the most granular temporal resolution available within the period, such as monthly headcount.
  3. If you must rely on a single snapshot, the mid-period count is the most reliable estimate of period average headcount.
  4. Cohort attrition rates are a very special case of period attrition rates, not vice versa.
  5. You can estimate the probability of termination from the attrition rate using a simple formula.
  6. There is a lot more to learn about attrition measurement, so stay tuned!

All of the code used to produce the charts in this article is in this GitHub Gist.

Mark Permann

Data Analytics & Visualization Specialist | Tableau Developer | Engineer of CEO-Ready Business Intelligence @NBCU, $4M/Yr Profit Science @AmEx and So What? KPIs for UBI

3 years ago

I confess I have not digested your essay thoroughly - and that's only because you have extensively documented the deeper thinking this topic deserves. I came to it believing that employee-periods is the right denominator for a rate calculation, as you state - and I will surely return for your "why it matters" when I need to defend that with others.

