You need more than the "raw data" (HR: A demographic approach, Part II)

In this series of posts called "HR: A demographic approach", I'll describe and correct some of the common pitfalls of human resources data analysis. You'll learn that human resources departments should start measuring outcomes such as recruitment and attrition more like how demographers and epidemiologists measure birth, death, survival, immigration, marriage, and other population change rates. You'll also learn about the difference between period and cohort rates, and how companies often confuse the two.

In this second part of the series, I cover another of my pet peeves: when I hear people say they just need the "raw data" on attrition to make decisions.

In Part I of this series, I taught you how to measure an attrition rate properly. Then you went and calculated as many attrition rates as you possibly could, I'm sure. Well, this post will disappoint you, because from it you will learn that:

  1. Observed attrition imperfectly measures the true attrition risk that you can't observe
  2. Raw attrition rates under- or over-estimate attrition risk for small headcount
  3. Building dashboards that report raw attrition rates creates the illusion of learning about attrition and how to deal with it

By learning these things, you will hopefully become convinced that simply asking your nearest business analyst for a spreadsheet of attrition rates broken out by job family and location isn't the best way to learn about how to deal with attrition risk at your company. It's probably one of the worst ways.

Observed attrition rates imperfectly measure the true attrition risk you can't observe

Most meetings about attrition concentrate so much on the numbers that we often forget that attrition is the result of a complex social process.

Consider regrettable attrition due to employees leaving the company. Events occur that cause some employees to start thinking about leaving the company for one reason or another. Then more things happen that cause employees to actually start looking for another job. The attributes of those individual employees combine with the status of the broader labor market to determine whether or not they get screening interviews, then on-sites, then offers they're willing to accept. Finally, they make their decision to leave... or not.

The outcome is uncertain. For that reason, if they could do it all over again in some alternate reality where just a few things happened differently, maybe your employees would make different choices. Even if everything that matters most to attrition decisions went exactly the same, they may still have made different decisions because humans are so fickle. And if those employees made different decisions, you would observe different attrition rates.

The attrition rates you observe are only a window into an underlying social and cognitive process so complex and multi-faceted that you cannot possibly hope to observe it directly. You are left to estimate what that underlying, unobservable attrition risk is (and what it might be next year, and what causes it) through the analysis of data.

Even if you can measure the attrition rate from a full census of all employees every single day of the year, the attrition rate you measure is subject to error. Some of that error is called sampling error. It arises from the fact that the reality we all live in has sampled only the employees who ended up in your roster through the complex social process of recruitment. Reality also sampled only the decisions that those employees eventually made about whether or not to leave the company. As we'll see, the smaller your headcount, the greater the sampling error.

This many-worlds perspective of attrition risk is way spookier and more fascinating than just dividing the number of terminations by the average headcount and calling it a day.

It's also more correct.

The smaller the headcount, the greater the attrition rate sampling error

Suppose a tech company founded in Seattle has technical account managers spread across nine markets. Headcount is higher in larger markets and in markets added earlier in the company's history, leading to the mean annual headcount distribution below.

[Figure: distribution of mean annual headcount across the nine markets]

Suppose that the true mean attrition risk across markets (which, remember, we cannot observe directly) is 25 out of every 100 employee-years, and that the true attrition risk for a given market is drawn at random from a distribution around that mean with a variance of five (in this simulation we will use a gamma distribution). In a given year, that risk is realized as a count of terminations drawn from a probability distribution whose mean is the market-level true attrition risk multiplied by the market's employee-years (in this simulation we will use a Poisson distribution).
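To make that generative model concrete, here is a minimal sketch in Python with NumPy. The gamma shape and scale below are illustrative assumptions chosen to hit the stated mean of 0.25 terminations per employee-year (the exact parameterization behind the post's charts is in the linked code), and the large-market headcount is hypothetical:

    import numpy as np

    rng = np.random.default_rng(42)

    # Mean annual headcount (employee-years) per market. St. Paul and Denver
    # come from the post; "Big Market" is a hypothetical stand-in.
    employee_years = {"St. Paul": 2, "Denver": 22, "Big Market": 120}

    mean_risk = 0.25            # 25 terminations per 100 employee-years
    shape = 2.0                 # illustrative; sets the spread across markets
    scale = mean_risk / shape   # gamma mean = shape * scale = 0.25

    for market, ey in employee_years.items():
        true_risk = rng.gamma(shape, scale)          # market-level true risk
        terminations = rng.poisson(true_risk * ey)   # realized terminations
        observed_rate = terminations / ey            # what your dashboard shows
        print(f"{market}: true risk {true_risk:.2f}, "
              f"observed rate {observed_rate:.2f}")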

Now suppose that we could repeat this simulation 10,000 times (including the sampling of market-level true attrition risk) and calculate for each market the mean absolute percent error of the observed attrition rate compared to the true attrition risk. The picture below shows that the smaller the mean annual headcount in a market, the larger the error. In our smallest market (St. Paul with two employee-years), our observed attrition rate is off by over 125 percent on average. Even in a mid-sized market like Denver (22 employee-years), our estimate of true attrition risk is off by nearly 50 percent.

[Figure: mean absolute percent error of the observed attrition rate versus true attrition risk, by market mean annual headcount]
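Here is a sketch of that repeated-simulation step, with the same illustrative gamma parameters as above (note that the market-level true risk is re-drawn in every replicate, as described):

    import numpy as np

    rng = np.random.default_rng(0)
    n_sims = 10_000
    mean_risk, shape = 0.25, 2.0
    scale = mean_risk / shape

    def mape(ey):
        """Mean absolute percent error of observed rate vs. true risk."""
        true_risk = rng.gamma(shape, scale, size=n_sims)  # one draw per replicate
        observed = rng.poisson(true_risk * ey) / ey       # observed attrition rate
        return np.mean(np.abs(observed - true_risk) / true_risk) * 100

    for market, ey in [("St. Paul", 2), ("Denver", 22)]:
        print(f"{market} ({ey} employee-years): MAPE ~ {mape(ey):.0f}%")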

The reason for this result is sampling error. The smaller your sample, the more likely you are to observe extreme values. When the attrition risk is sufficiently smaller than one termination per employee-year (as is the case for most companies and in this example), the most likely extreme value is zero attrition, as the picture below shows.

[Figure: distribution of observed attrition rates by market, with zero attrition as the most likely extreme value]

To put this in perspective, in the smallest market of St. Paul (two employee-years), we would have observed an attrition rate of zero in about 75 percent of cases when the true attrition risk was in fact somewhere between 10 percent and 34 percent (the theoretical interquartile range of the true attrition risk distribution).

Even in the mid-sized market of Denver (22 employee-years), there's about a 10 percent chance we'd see zero terminations even though the mean true attrition risk in this market across all simulations is, as expected, about 25 out of every 100 employee-years. If you measured zero terminations in Denver in the current year and formed a prediction that next year's attrition rate would therefore be far below 10 percent, then you'd be dead wrong almost 80 percent of the time. Almost a third of the time, the next year's attrition rate would be greater than the true company-level attrition risk!
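You can sanity-check the chance of observing zero terminations with the same machinery. Conditional on a true risk, the Poisson probability of zero terminations is exp(-risk × employee-years); averaging that over draws of the true risk gives the overall probability. A sketch, again under the illustrative gamma parameters (so the numbers will differ somewhat from the post's):

    import numpy as np

    rng = np.random.default_rng(1)
    shape, scale = 2.0, 0.125    # illustrative gamma for the true risk
    true_risk = rng.gamma(shape, scale, size=100_000)

    for market, ey in [("St. Paul", 2), ("Denver", 22)]:
        # P(zero terminations | risk) = exp(-risk * exposure), then average
        p_zero = np.exp(-true_risk * ey).mean()
        print(f"{market}: P(zero terminations) ~ {p_zero:.2f}")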

So if you like being dead wrong even with mid-sized headcount about both the current and future risk of losing employees, then by all means keep asking your business analyst for the "raw data". If you want to do better, though, I suggest you learn some demography and some probability theory, or hire someone who already knows about that kind of stuff and learn how to talk with them about it. If you are at a major tech company that employs oodles of people with advanced degrees in quantitative disciplines (including the social sciences), you may already have hired that person, and you keep asking them for raw data, and that keeps driving them insane (and possibly to leave your company).

The more you disaggregate attrition rates into smaller and smaller groups of employees, the smaller the headcount in each group, the greater your sampling error, and the more likely you are to get spurious results about how the attrition rate is so low among data scientists named Steve in Santa Fe, yet so high among applied scientists named Darla who are experts in computer vision.
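The mechanics of that trap are easy to demonstrate: carve a population with one and the same true attrition risk into ever-smaller groups, and the spread of observed group-level rates balloons even though nothing real varies. A hypothetical sketch:

    import numpy as np

    rng = np.random.default_rng(7)
    true_risk = 0.25          # one true risk for everyone; no real variation
    total_ey = 1_000          # total employee-years, split evenly into groups

    for n_groups in (1, 10, 100, 500):
        ey = total_ey / n_groups
        rates = rng.poisson(true_risk * ey, size=n_groups) / ey
        print(f"{n_groups:>3} groups: observed rates span "
              f"{rates.min():.2f} to {rates.max():.2f}")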

By the way, everything that has been said to this point about attrition rates applies to any other kind of rate, including promotion rates, internal transfer rates, regrettable vs. un-regrettable attrition rates, job requisition close rates, sexual harassment rates (more human resources departments should compute those), and so on.

Dashboards and the illusion of learning

Don't get me wrong. It's nice to be able to quickly look at tables or charts of some of the key performance indicators for your company. Today, many of us look at these tables and charts in "dashboards", usually created and hosted in some "business intelligence platform", or just a spreadsheet. Unfortunately, too many of us think that by gazing at these tables and charts and having (presumably) substantive arguments about their implications during staff meetings (LOL), we will magically make better decisions.

Here is a quote from a book you really must read, How to Measure Anything:

A study of experts in horse racing found that as [the experts] were given more data about horses, their confidence in their prediction about the outcomes of races improved. Those who were given some data performed better than those who were given none. But as the amount of data they were given increased, actual performance began to level off and even degrade. However, their confidence in their predictions continued to go up even after the information load was making the predictions worse. -- from How to Measure Anything by Douglas Hubbard

That thing with the horse racing experts (which the book shows is a problem among experts in general) is probably happening in a lot of human resource departments right now, especially in large companies where the number and complexity of business intelligence dashboards (sometimes multiple dashboards built by multiple teams displaying slightly different versions of the exact same data) is exploding.

We need less raw data and more understanding about how attrition is changing (or not), how it varies (or doesn't), and how uncertain we are about those findings. We need fewer breakouts and drill-downs of attrition rates and more estimates of how the specific decisions we make influence attrition, and how influencing attrition affects company performance.

What to do?

If attrition rates are error-prone windows into unobservable attrition risk, why compute them at all? In the next part of the series, we'll look at how we can use probability theory, statistics, and (groan) machine learning (but not artificial intelligence because it doesn't exist) to analyze attrition risk with greater rigor.

All of the code that produced the charts and numbers in this post is here.
