Why #PeopleAnalytics should NOT be using regression to predict team outcomes

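Did you know that regression (or any GLM) assumes that the observations of Y (the outcome or dependent variable) are independent of each other once the predictors are accounted for? If they're not, your standard errors are typically too small, so effects look more significant than they really are, and your predictions can be inaccurate.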

For example, say you're trying to find out what drives performance, engagement or retention in your teams; in other words, your dependent variable is performance, engagement or retention.

Now ask yourself a question: are the engagement scores of employees within a team independent of each other? Of course not: people in a team influence each other. For example, if Lao is unhappy about some aspect of work and disengaged, there’s a reasonable chance that other members of that team will feel the same and therefore their scores are not independent as required by GLM regression.

Or consider another example: if a team leader's behaviour affects one team member’s performance, chances are this behaviour is influencing performance across the whole team. Therefore the team members’ ratings are not independent, and you are violating a key regression assumption, which can result in inaccurate predictions and poor recommendations to the company.
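To make the leader example concrete, here is a minimal simulation sketch using only the Python standard library. Everything in it is an assumption chosen for illustration (30 teams of 10, the size of the shared team influence, 400 repetitions); it is not data from a real project. Leader behaviour has no true effect on performance, yet a regression that ignores teams should flag it as significant far more often than the nominal 5%.

```python
import random
from statistics import mean

def naive_t(x, y):
    """t statistic for the slope of a simple OLS regression that ignores teams."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return slope / se

def false_positive_rate(n_teams=30, team_size=10, team_sd=0.65, reps=400, seed=1):
    """Share of simulated datasets where a zero effect looks significant at 5%."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x, y = [], []
        for _ in range(n_teams):
            leader = rng.gauss(0, 1)        # leader behaviour, shared by the whole team
            shared = rng.gauss(0, team_sd)  # unmeasured influence shared by the team
            for _ in range(team_size):
                x.append(leader)
                y.append(shared + rng.gauss(0, 1))  # true slope is zero
        if abs(naive_t(x, y)) > 1.96:
            hits += 1
    return hits / reps

clustered = false_positive_rate()               # team members influence each other
independent = false_positive_rate(team_sd=0.0)  # no shared team influence
print(f"false positives, clustered teams:   {clustered:.2f}")
print(f"false positives, independent teams: {independent:.2f}")
```

With no shared team influence the rate should sit near 5%; with it, the same regression should raise many more false alarms, which is exactly the risk described above.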

I discovered this vital fact when researching the drivers of romantic relationship satisfaction in couples (a team with n=2) and quickly learned that if one partner is dissatisfied with the relationship, there's a good chance the other will be dissatisfied as well. Thus the scores are not independent and regression will deliver inaccurate predictions. How did I fix it?

The solution is to use a technique called Multilevel Modelling (MLM), which doesn't assume that outcome observations are independent; instead, it models the variation between teams explicitly. It's certainly the best way to get accurate results and make good recommendations if you're working with team data. In fact, I won't do any #PeopleAnalytics team data project without it.
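As a rough illustration of what MLM does, here is a sketch of the simplest multilevel model: an intercept-only random-intercept model, score = overall mean + team effect + individual noise. It is estimated here by the method of moments from a one-way ANOVA, which assumes equal team sizes; this is a simplified stand-in for the likelihood-based fitting that dedicated mixed-model software performs, not a replacement for it. The scores and the function name `random_intercept_fit` are hypothetical.

```python
from statistics import mean

def random_intercept_fit(teams):
    """Fit score = grand mean + team effect + noise by method of moments.

    Assumes every team has the same number of members and that the
    scores are not all identical.
    """
    k = len(teams[0])
    n_teams = len(teams)
    grand = mean(x for t in teams for x in t)
    team_means = [mean(t) for t in teams]
    # Between-team and within-team mean squares from a one-way ANOVA
    msb = k * sum((m - grand) ** 2 for m in team_means) / (n_teams - 1)
    msw = sum((x - m) ** 2 for t, m in zip(teams, team_means) for x in t) / (n_teams * (k - 1))
    var_within = msw
    var_between = max(0.0, (msb - msw) / k)
    # Intraclass correlation: share of variance that sits at team level
    icc = var_between / (var_between + var_within)
    # Partial pooling: each team mean is pulled towards the grand mean
    weight = var_between / (var_between + var_within / k)
    team_estimates = [grand + weight * (m - grand) for m in team_means]
    return {"grand_mean": grand, "var_between": var_between,
            "var_within": var_within, "icc": icc,
            "team_estimates": team_estimates}

# Hypothetical engagement scores (1-5 scale) for three teams of four people
scores = [[4, 5, 4, 5], [2, 2, 3, 2], [3, 4, 3, 3]]
fit = random_intercept_fit(scores)
print(round(fit["icc"], 2))  # about 0.81
print([round(e, 2) for e in fit["team_estimates"]])
```

An intraclass correlation well above zero, as in this toy data, is a sign that team members' scores are not independent. For real projects with predictors and unequal team sizes, a mixed-model library is the better tool.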

MLM has been in the news a lot in the past few years because political analysts used it to successfully predict Trump's election wins and Brexit. Before this, they'd been using regression and getting their forecasts wrong.

If you want to avoid the errors made by political forecasters and make accurate predictions and valid recommendations about what drives team performance in your organisation, move away from GLM regression to MLM.

More information available on request.

Bernie Caessens

Lead scientist @ Poolstok

4 years ago

Key observation! There are ways to deal with this in GLM, but you ALWAYS lose information or increase your data need. See here for a discussion: https://stats.idre.ucla.edu/other/mult-pkg/introduction-to-linear-mixed-models/. Also, when you CAN use GLM, use the proper family and link (binomial, Poisson, logit, and so on), and look at your model details, residuals, and so forth. Excellent point Max!!

Simon Haines

Partner – Head of Talent Analytics

4 years ago

Great analysis as ever Max. Great illustration of why a causal model (as per Simply Get Results toolkit if you’ll forgive the plug) adds an important counterweight to assumptions based only on statistical significance. The combination of causal analysis and correlation should serve to increase confidence.

Absolutely agreed, Max. Thank you for bringing this important issue to the people analytics community! Teams, and networks of teams, are extremely common in organizations. They're how most 21st century work gets done. As well, dyads like leaders and followers are more than just roles - they're connections that influence both people over time. We don't consider these enough in our analyses or the design of our workplace decisions and processes. Your relationships research sounds so interesting! In my MLM class, the dyad models were always hardest for me to wrap my head around. Somehow teams were much clearer! : )

Hagan Risner

People Analytics | SoFi

4 years ago

Given the hierarchical nature of organizations (e.g., teams within business units within regions...etc.) shouldn’t this be the default approach to most organizational problems requiring a regression-based model for the analysis portion of a problem-solving endeavor? More importantly, how would one efficiently decide between the effectiveness of using a single-level GLM, or a fixed-effects GLM, and using a multi-level approach? Given the time constraints of most business pressures indicated by senior leaders, isn’t there a premium placed on time-to-insights? So, how does one justify to leadership the additional time (and assumptions) it would take to interpret multi-level model results and to identify the proper distributions from which the characteristics of each cluster arise?
