People analytics and event history analysis

Or: How I learned to stop measuring time to fill and love cumulative incidence functions

This article is an adaptation of a discussion I led with the Seattle People Analytics Forum hosted by the Seattle People Analytics Network at Nordstrom headquarters on 7 October 2019. The slides I used during that discussion are here, and the source code for the slides is here. Thanks to the organizers for the opportunity, and to the participants for their comments and questions.

Does your recruiting organization use the average time to fill a job requisition as a measure of recruiting performance? Does your talent management organization use the average time to promotion as a measure of upward mobility in the company? What about the average time to employee termination? I'm guessing that multiple organizations at your company use it to measure employee retention.

It turns out that many companies miscalculate these time-to-event metrics, which makes them more likely to make poor business decisions. In light of recent lawsuits stemming from bad metrics, maybe you aren't surprised anymore by the idea that businesses miscalculate. So let's play the controversy game at higher stakes.

I'll let Morpheus lay it out for you:

Morpheus tells it like it is about time-to-event metrics.

No matter how good your arithmetic is, and no matter how clean and well-documented your databases are, many of the average time-to-event metrics that your company uses are an attempt to estimate a quantity that is fundamentally unknowable. For that reason, the metrics are utterly meaningless.

But before we poop all over cherished people metrics...

We use these metrics for a reason. Let's respect what came before by trying to understand why we use them now.

  1. Easy to understand - The average time to fill a job requisition conveys a message that is easy to understand: "About how long will it take until this req gets filled?"
  2. Easy to calculate - The easiest way to calculate the average time to an event is to... take the average of the time it took for that event to occur. That's pretty easy.
  3. It's what people want to know - This is the most common piece of feedback I get whenever I bring it up at work. The average time to fill a req is what hiring managers want to know so they can get a ballpark estimate of how long it will take to get a butt in the seat. The average time to promotion is what job candidates want to know so they can decide if there is enough upside to joining your team.

Yet let's think more deeply about these claims.

  • Easy to understand. Easy to calculate. Should we do things just because they're easy? No, not if it makes us more likely to make poor business decisions.
  • It's what people want to know. Is it? We'll see that when people ask for the average time it takes for an event to occur, they often have a more specific question in mind. Yet they ask for the time-to-event metric because they believe their more specific question can't be answered with the data available. Ironically, it's the time-to-event proxy metric that addresses an unanswerable question.

Here's what we're gonna do

  1. First, I'll explain how companies miscalculate time-to-event metrics, assuming for the moment that those metrics aren't utterly meaningless.
  2. Then I'll relax that assumption, explain why the time-to-event metrics are often unknowable, and describe what we can do instead of measuring the average time until an event occurs.

How companies miscalculate the average time until an event occurs

Let's use the example of the average time to fill job requisitions. The most common problem I see is when companies use only filled requisitions to calculate time to fill. To illustrate why this is a problem, I'll use a toy example.

Suppose it's the end of 2018, and there were three job requisitions open at any point during that year. Below is a table of the requisitions with their open dates, fill dates, and time to fill. Fill dates and time to fill are left blank if the requisition hasn't yet been filled, as is the case for Requisition ID #3.

Three fake requisitions for a fake company.

Let's throw out the un-filled requisition, because we don't know the time to fill for that one, right? Here's the new dataset.

Two fake requisitions from a fake company.

To calculate the average time to fill, we just average the time to fill across these two requisitions, in which case the average time to fill is just two months. Assuming that the rate at which requisitions were filled was constant during 2018 (the second of two problematic assumptions we're making), the monthly fill rate is the inverse of average time to fill. In our case, then, half of a job requisition is filled for every full month that a requisition is open. To get the annual fill rate, we multiply the monthly fill rate by 12 months, which tells us that we fill six job requisitions for every full year that a requisition is open.

According to this analysis, our recruiting performance is AWESOME.

Aren't we forgetting someone? Requisition ID #3, although not yet filled, has been open for 14 months, and 12 of those months were in 2018.

[Table: the three requisitions with the number of months each was open during 2018; Requisition ID #3 was open for 12 of them.]

We can use this transformed version of the original dataset to compute the actual monthly fill rate following the definition of a proper demographic rate, which in this case is:

monthly fill rate = (number of requisitions filled in 2018) / (total number of months that requisitions were open during 2018)

The number of requisitions filled in 2018 is two, and the total number of months that requisitions were open during 2018 is 16, making the monthly fill rate 2/16 = 0.125. Again assuming that the fill rate was constant in 2018, the average time to fill requisitions is the inverse of the monthly fill rate, which is 1/0.125 = 8 months.
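If you want to check the arithmetic yourself, here is a minimal Python sketch of both calculations. Only the 12 months for Requisition ID #3, the 16-month total, and the two-month naive average come from the example above. The split of the remaining four months into one month and three months is a made-up assumption, as is the idea that both filled requisitions were opened and filled inside 2018.

```python
# Months each requisition was open during 2018, and whether it was filled in 2018.
# ASSUMPTION: the 1-month / 3-month split for requisitions 1 and 2 is hypothetical;
# the example only fixes their average (2 months) and the 16-month total.
requisitions = [
    {"id": 1, "months_open_2018": 1, "filled_2018": True},
    {"id": 2, "months_open_2018": 3, "filled_2018": True},
    {"id": 3, "months_open_2018": 12, "filled_2018": False},
]

filled = [r for r in requisitions if r["filled_2018"]]

# Naive approach: average time to fill using filled requisitions only.
naive_mean_months = sum(r["months_open_2018"] for r in filled) / len(filled)

# Occurrence/exposure rate: fills divided by all requisition-months of exposure,
# including the months contributed by the still-open requisition.
fills = len(filled)
exposure_months = sum(r["months_open_2018"] for r in requisitions)
monthly_fill_rate = fills / exposure_months

# Under a constant fill rate, average time to fill is the inverse of the rate.
rate_based_mean_months = 1 / monthly_fill_rate

print("naive average time to fill:", naive_mean_months)
print("monthly fill rate:", monthly_fill_rate)
print("rate-based average time to fill:", rate_based_mean_months)
```

The only difference between the two answers is whether the 12 unfilled months of Requisition ID #3 count toward the denominator.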

Let's review:

  • The average time to fill job requisitions is 8 months, not two months.
  • Closing a job requisition takes 4x longer than we are telling hiring managers.
  • No wonder they're pissed.

In general, if you use only cases where an event occurred to estimate the average time until that event occurs, you will underestimate the average time to event, and you will overestimate the rate at which the event occurs. So don't do that.
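To see the bias at a larger scale than three requisitions, here is a small simulation sketch. Everything in it is hypothetical: a true average time to fill of 8 months, a constant fill rate, and a 12-month window after which we stop watching each requisition. It is one way to illustrate the claim above, not a description of any real recruiting data.

```python
import random

random.seed(2019)

TRUE_MEAN_MONTHS = 8.0   # hypothetical true average time to fill
WINDOW_MONTHS = 12.0     # how long we watch each requisition before reporting
N_REQS = 20_000          # number of simulated requisitions

# Draw a true time to fill for each requisition at a constant monthly rate.
true_times = [random.expovariate(1 / TRUE_MEAN_MONTHS) for _ in range(N_REQS)]

# We only observe fills that happen inside the window; the rest are right-censored.
filled_times = [t for t in true_times if t <= WINDOW_MONTHS]
months_open = [min(t, WINDOW_MONTHS) for t in true_times]

# Naive metric: average over filled requisitions only.
naive_mean = sum(filled_times) / len(filled_times)

# Occurrence/exposure: fills divided by all observed requisition-months.
fill_rate = len(filled_times) / sum(months_open)
rate_based_mean = 1 / fill_rate

print(f"naive average time to fill: {naive_mean:.2f} months")
print(f"rate-based average time to fill: {rate_based_mean:.2f} months")
```

The naive average lands well below the true 8 months because the slow requisitions are exactly the ones that get thrown out, while the rate-based figure stays close to 8.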

Introducing event history analysis

At this point, analytically-savvy readers be like:

[Reaction image]

For the rest of you scratching your heads, survival analysis is a useful way to analyze time-to-event data where not all of the events have yet occurred, a phenomenon known as right-censoring. One benefit of survival analysis is that you aren't limited to information from cases where the event occurred. In addition, you can relax the assumption that the rate of event occurrence is constant over time.

To illustrate by example, suppose we have data on the date that employees were hired and, if applicable, their date of termination. Using this data, we can estimate the survival function, which gives the probability that an employee will remain at the company beyond a specified period of time. To make this concept more concrete, here's a picture of a survival function. Each point along the curve gives the fraction of employees remaining at a time point just beyond each month since hire.

[Figure: a survival function, showing the fraction of employees remaining by months since hire.]
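One common way to estimate a survival function from data like this is the Kaplan-Meier estimator. The article doesn't say which estimator produced the curve, so treat the following as a minimal sketch under that assumption. The function name and the eight tenure records are made up for illustration.

```python
def kaplan_meier(durations, terminated):
    """Return [(t, S(t))] at each distinct termination time.

    durations:  months from hire to termination or to the end of observation
    terminated: True if the employee terminated, False if right-censored
    """
    event_times = sorted({t for t, e in zip(durations, terminated) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        # Everyone still employed and under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Terminations that happen exactly at time t.
        events = sum(1 for d, e in zip(durations, terminated) if d == t and e)
        surv *= 1 - events / at_risk
        curve.append((t, surv))
    return curve


# Hypothetical data: four terminations and four employees still employed.
durations = [2, 3, 3, 5, 8, 8, 12, 12]
terminated = [True, True, False, True, True, False, False, False]

km_curve = kaplan_meier(durations, terminated)
for month, s in km_curve:
    print(f"month {month}: estimated fraction remaining = {s:.3f}")
```

Employees who are still employed stay in the at-risk count until we stop observing them, which is how the censored cases contribute information instead of being thrown away.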

It turns out that the average time until an employee is terminated is equal to the area under the survival curve, as illustrated below.

[Figure: the survival curve with the area under it shaded.]
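Here is a sketch of that area calculation for a step-shaped survival curve. The curve values are hypothetical. One caveat worth labeling: if the curve hasn't dropped to zero by the end of follow-up, the area up to a chosen horizon is the average tenure within that horizon (often called the restricted mean), not the full average time to termination.

```python
def area_under_survival(curve, horizon):
    """Area under a step survival curve from time 0 up to `horizon`.

    curve: [(t, S(t))] sorted by t; S is 1.0 before the first step and
           holds each value until the next step.
    """
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in curve:
        if t >= horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    # Last rectangle runs from the final step before the horizon to the horizon.
    area += prev_s * (horizon - prev_t)
    return area


# Hypothetical survival estimates: (months since hire, fraction remaining).
survival_curve = [(2, 0.875), (3, 0.75), (5, 0.6), (8, 0.45)]

mean_tenure_12 = area_under_survival(survival_curve, horizon=12)
print(f"average months employed within the first 12 months: {mean_tenure_12:.3f}")
```

With these made-up numbers the area comes to 7.975 months out of a possible 12.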

Another quantity of interest in survival analysis is the cumulative incidence function, which gives the probability that an employee will be terminated by a specified period of time. Below is the cumulative incidence function associated with the survival curve we showed before.

[Figure: the cumulative incidence function of termination by months since hire.]

In general, the cumulative incidence function gives the probability that an event will have occurred by the time a specified period has passed. For this reason, we can use it to address important questions in people analytics about the chance that something will happen by a target date. For example:

  • Hiring managers might ask, "What's the chance that I'll fill this job opening in time to finish a project that requires a new hire?"
  • Finance managers might ask, "What's the chance that we'll meet our headcount goals by mid-year?"
  • Job candidates might ask, "What's the chance I'll get promoted within a year?"

In other words, cumulative incidence functions provide information that people actually want to know. In all of these cases, the average time to event is a quantity that someone would use to guess at the answer to their real question. I know this because these are examples of questions people have asked just before they requested average time-to-event metrics from me.

The cumulative incidence function might sound familiar if your organization reports metrics like the percentage of requisitions that will be filled by a range of arbitrarily-defined time points (e.g., a week, two weeks, 30 days, 60 days, a year; I call these N-day metrics for their focus on an event occurring within N days). The strength of estimating the cumulative incidence function is that you can estimate the chance the event will occur by any time point, not just the ones you happened to calculate. In addition, unlike the N-day metrics, you wouldn't have to build a separate prediction model for each time horizon. Instead, you can build statistical models that estimate how the entire cumulative incidence function varies across person-time, calendar-time, space, and other predictors.

Comparing cumulative incidence functions to estimate impact

Suppose that in one business market, we took some action we'll call X to increase retention (and thus decrease attrition and the height of the cumulative incidence function). Below, we compare the cumulative incidence function in that business market to a similar business market where we did nothing. Given that the cumulative incidence function for doing X is lower than if we did nothing, our experiment was a success. But what does that mean for the business? What do you tell the board?

[Figure: cumulative incidence functions of termination for the market where we did X and the market where we did nothing.]

If we take the area between the two cumulative incidence functions, it tells us the number of months of labor per employee that we lose if we don't do X. In this case, we lose a year and two months of labor per employee by not doing X. You could aggregate this figure across projected new hires to estimate turnover costs.

[Figure: the area between the two cumulative incidence functions, shaded.]
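The reason this works: when cumulative incidence is the complement of survival, the area between two cumulative incidence curves equals the difference in average months employed over the same horizon. Below is a rough sketch of the calculation on a monthly grid. The six monthly values and the headcount are invented for illustration, and they are not the numbers behind the figure.

```python
# Hypothetical cumulative incidence of termination over months 1 through 6.
cif_did_nothing = [0.05, 0.10, 0.16, 0.22, 0.27, 0.31]
cif_did_x = [0.03, 0.06, 0.10, 0.14, 0.17, 0.20]

MONTH_WIDTH = 1.0  # each grid value is treated as holding for one month

# Area between the curves, approximated as a sum of one-month rectangles.
months_lost_per_employee = sum(
    (nothing - x) * MONTH_WIDTH
    for nothing, x in zip(cif_did_nothing, cif_did_x)
)

# Aggregate across a hypothetical number of projected new hires.
projected_hires = 200
total_months_lost = months_lost_per_employee * projected_hires

print(f"months of labor lost per employee without X: {months_lost_per_employee:.2f}")
print(f"months of labor lost across {projected_hires} hires: {total_months_lost:.0f}")
```

Multiply the total by a monthly cost of vacancy or replacement and you have a figure you can put in front of the board.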

The cumulative incidence function C(t) is easy to calculate as the complement of the survival function S(t):

C(t) = 1 - S(t)

That is, unless there are competing risks.
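For the simple case without competing risks, here is a sketch of reading the chance of an event by any target day off a survival estimate. The survival steps below are hypothetical, and the function name is mine.

```python
import bisect

# Hypothetical survival estimates for open requisitions:
# (days since opening, fraction still unfilled).
survival_steps = [(10, 0.90), (25, 0.70), (40, 0.55), (75, 0.30)]


def cumulative_incidence(day, steps):
    """C(t) = 1 - S(t), where S is 1.0 before the first step and
    holds each value until the next step."""
    times = [t for t, _ in steps]
    i = bisect.bisect_right(times, day)
    s = 1.0 if i == 0 else steps[i - 1][1]
    return 1.0 - s


# Any target day works, not just a fixed menu of N-day metrics.
for target_day in (7, 30, 45, 60, 90):
    chance = cumulative_incidence(target_day, survival_steps)
    print(f"chance of filling within {target_day} days: {chance:.2f}")
```

One estimated curve answers the 7-day, 30-day, and 45-day questions at once, which is the advantage over maintaining a separate metric per horizon.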

What are competing risks?

Suppose you not only want to know about termination in general, but you want to measure the risks of voluntary vs. involuntary termination, or regrettable vs. un-regrettable termination. If an employee terminates voluntarily, then (barring an analysis that extends to consider returning employees) that same employee cannot be terminated involuntarily, and the same for regrettable vs. un-regrettable termination.

For another example, whenever we do analysis of time to first promotion, we must always consider the competing risk of termination. In large companies where internal transfers are frequent and even encouraged, we need to also consider the competing risk of first internal transfer.

Time to event is usually unknowable when there are competing risks

To illustrate why, I'll use the example that compares the risk of first promotion to the competing risk of termination. If I get terminated before my first promotion, you will never know the time it would have taken me to get promoted. You could guess how long it would have taken by looking at how long other people took to get promoted assuming I'm just like them. Yet because I was terminated before I was promoted, I'm probably not like them.

In general, if the reasons that two competing events occur are not independent, then the average time to either event is unidentified, meaning it cannot be estimated from data. Competing risks are dependent when one event is less likely to occur for reasons that make the other event likely to occur, or when both events are likely to occur for similar reasons.

Because average time to event is unidentified under dependent competing risks, the metric is meaningless.

Okay. I know what you're thinking.

Cause-specific cumulative incidence functions to the rescue

Recall that cumulative incidence functions give the chance that an event will happen within a specific period of time, addressing the questions that often drive people to ask for the average time to that event. Cause-specific cumulative incidence functions are basically the same thing, but they condition the occurrence of the event at time t on having persisted through all competing events up to that time. By conditioning on persistence through all competing events, the cause-specific cumulative incidence function makes no assumptions about the independence of competing risks.

For example, the picture below shows that the cause-specific cumulative incidence of voluntary termination at 100 months is just under 20%. For involuntary termination, the cumulative incidence by the same time point is just under 30%. These figures can be directly compared.

[Figure: cause-specific cumulative incidence functions for voluntary and involuntary termination by months since hire.]

Cause-specific cumulative incidence functions add up to the total cumulative incidence of any event. This makes it possible to compare the risks of competing events in a natural way. The picture below shows the cumulative incidence functions for voluntary and involuntary termination stacked atop one another. The area of each colored band represents the number of months of labor per employee that are lost to a given type of termination. Yes, I know I switched the colors of voluntary and involuntary termination from the last plot. So sue me.

[Figure: cumulative incidence functions for voluntary and involuntary termination, stacked atop one another.]
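A standard nonparametric way to estimate cause-specific cumulative incidence is the Aalen-Johansen estimator. I'm assuming that approach here for illustration; the article doesn't name the estimator behind the plots. The sketch below uses invented tenure records and also checks the adding-up property described above.

```python
def cause_specific_cif(durations, causes):
    """Aalen-Johansen style estimate of cumulative incidence by cause.

    durations: months from hire to termination or end of observation
    causes:    termination type as a string, or None if right-censored
    Returns ({cause: [(t, CIF(t))]}, [(t, S(t))]) where S(t) is the
    probability of no termination of any type by time t.
    """
    labels = sorted({c for c in causes if c is not None})
    event_times = sorted({t for t, c in zip(durations, causes) if c is not None})
    cif = {c: 0.0 for c in labels}
    cif_curves = {c: [] for c in labels}
    surv = 1.0  # chance of having persisted through all event types so far
    surv_curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        total_events = 0
        for c in labels:
            n_c = sum(1 for d, k in zip(durations, causes) if d == t and k == c)
            # Chance of persisting to t, times chance of this cause at t.
            cif[c] += surv * n_c / at_risk
            cif_curves[c].append((t, cif[c]))
            total_events += n_c
        surv *= 1 - total_events / at_risk
        surv_curve.append((t, surv))
    return cif_curves, surv_curve


# Hypothetical data: two voluntary, two involuntary, four still employed.
durations = [2, 3, 3, 5, 8, 8, 12, 12]
causes = ["voluntary", "involuntary", None, "voluntary",
          "involuntary", None, None, None]

cif_curves, surv_curve = cause_specific_cif(durations, causes)
voluntary_final = cif_curves["voluntary"][-1][1]
involuntary_final = cif_curves["involuntary"][-1][1]
any_termination_final = 1 - surv_curve[-1][1]

print(f"voluntary: {voluntary_final:.3f}")
print(f"involuntary: {involuntary_final:.3f}")
print(f"any termination: {any_termination_final:.3f}")
```

The two cause-specific figures sum to the cumulative incidence of any termination, which is why the bands can be stacked.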

Limitations of cause-specific cumulative incidence functions

Beware when formulating a competing risks analysis for the following reasons:

  1. Events must be mutually exclusive - If the way you define events means they could both happen at the same time, then the cause-specific cumulative incidence functions will not sum up to the total cumulative incidence. Often, the solution is to redefine events so they are mutually exclusive, or to define the intersection of events as another competing risk. Otherwise, you will need to extend beyond basic competing risks models.
  2. The event must eventually happen - Most event history analysis assumes that the event will happen at some point. Yet in some cases that might not be true. For example, some job requisitions remain open even though they will never get filled because the recruiters forgot to cancel them. In this case, the estimates of cause-specific incidence for filling and cancelling need to adjust for the unknown probability that the requisition will never be closed. Cure models are one extension of event history analysis that can handle this situation.
  3. The event cannot repeat - If the event can repeat (e.g., when you're interested in all promotions or internal transfers, not just the first ones), again the cause-specific cumulative incidence functions do not add up to the total cumulative incidence. Yet cumulative count functions are an extension of cumulative incidence functions to handle recurring events.
  4. Subjects can't transition back and forth between states - Basic models assume that employees or requisitions never move back and forth between states, yet job requisitions can go on hold and then come off hold. In many cases, your estimate of the probability of filling a job requisition needs to account for transitions between active and inactive states. Multi-state models are a useful extension of event history analysis for such cases.

So here's my question

Given that the average time to fill is often miscalculated and even meaningless, how could you leverage cumulative incidence functions to address the business questions your organization has about how often events happen and how long it takes them to occur?

And here's another question

Have I piqued your interest? Would you be interested in learning more about how your organization could use event history analysis for people analytics? Let's talk. Or maybe... let's read. I'm thinking about writing a book about event history analysis for people analytics. Would you read it?
