#PeopleAnalytics Data versus Models: Which is more important?

Introduction

A couple of weeks ago, I argued that GLM regression leads to poor workforce decisions when working with team outcome data because GLM assumes that outcome observations (performance, retention and engagement scores, for example) are independent of one another. Since observations from members of the same team are seldom independent, many workforce decisions based on GLM regression models are invalid.

Multilevel modelling (MLM), on the other hand, explicitly models the dependence within clusters rather than assuming it away, and should therefore be the tool of choice when analysing clustered data (teams, regions, business units, etc.).
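To make the contrast concrete, here is a minimal sketch in Python using statsmodels. The data set and column names (engagement, tenure, team) are hypothetical illustrations, not a prescription; the point is simply that the ordinary regression call treats every employee as an independent observation, while the multilevel call declares the team clustering explicitly.

```python
# Minimal sketch: ordinary regression vs a multilevel (mixed-effects) model.
# The file and column names (engagement, tenure, team) are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("engagement_survey.csv")  # one row per employee, nested in teams

# Ordinary least squares: treats every employee as an independent observation,
# even though colleagues on the same team tend to give correlated responses.
ols_fit = smf.ols("engagement ~ tenure", data=df).fit()

# Multilevel model: a random intercept per team explicitly models the
# within-team correlation instead of assuming it away.
mlm_fit = smf.mixedlm("engagement ~ tenure", data=df, groups="team").fit()

print(ols_fit.summary())
print(mlm_fit.summary())
```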

During the ensuing discussions, one analyst noted that in their experience, improving data quality offered a more valuable return than using a statistical model such as MLM. Is this a reasonable claim? 

People Analytics decisions = Data + Model

Most worthwhile People Analytics projects (as opposed to, say, relatively low-value descriptive dashboards and visualizations) involve feeding workforce data into a statistical model and using the resulting output to make better decisions about people processes and organizational levers (see Figure 1).

Figure 1: Decision quality = Data + Model



Figure 1 makes it abundantly clear that the quality of People Analytics decisions is a function of two things: 1) input data quality AND 2) statistical model quality. 

To put it another way, perfect data in a poor statistical model will not result in high-quality decision-making; nor will poor quality data in a perfect statistical model (fondly referred to as the GIGO principle: Garbage In, Garbage Out).

Debating the importance of data versus model, therefore, is a bit like debating which piece is most important in a jigsaw puzzle: the answer is that you won’t win any prizes unless you have all the pieces in the right place.

Why then do some analysts still resist embracing new statistical models? At this point, you may optionally want to skip to the Appendix at the end of this article entitled A Brief History of Resistance to Models.


Reasons for resisting new models in People Analytics

There are probably a number of reasons why some People Analytics practitioners resist new models, but here are the two that I encounter most often:

Model Resistance Reason 1: Living in a ‘Data Groundhog Day’

For those of you that read the optional appendix, you'll know that People Analytics is not without its own ‘Churches’ offering doctrines to the ‘faithful’:

Back in 2012, Josh Bersin outlined what he called a Four-Level Maturity Model for HR Analytics (see Figure 2).

Figure 2: Bersin’s Four-Level Maturity Model for HR Analytics


Bersin’s advice back in the day was that People Analytics functions should invest their first few years collating and preparing their data for what he called “Level 1 Reactive – Operational Reporting”. These are basically flat reports, dashboards and visualizations. 

The benefit of this early investment in data preparation was that by the time these functions were ready for the statistical modelling phases (Levels 3 & 4), their data would be in great shape, allowing them to avoid GIGO.

The required data preparation skills would have included, for example (a short sketch of a few of these steps follows the list):

  • Data integration
  • Exploratory data analysis
  • Missing data analysis
  • Data transformation
  • Outlier analysis
  • Imbalanced data management
  • Dashboards
  • Simple reporting
  • Visualization
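Purely as an illustration, the sketch below runs a handful of these preparation checks on a hypothetical HR extract; the file name, column names and thresholds are assumptions rather than recommendations.

```python
# Minimal sketch of a few data preparation checks on a hypothetical HR extract.
import pandas as pd

hr = pd.read_csv("hr_extract.csv")

# Exploratory data analysis: quick profile of every column.
print(hr.describe(include="all"))

# Missing data analysis: proportion of missing values per column.
print(hr.isna().mean().sort_values(ascending=False))

# Outlier analysis: flag tenure values more than 3 standard deviations from the mean.
z_scores = (hr["tenure"] - hr["tenure"].mean()) / hr["tenure"].std()
print(hr[z_scores.abs() > 3])

# Imbalanced data check: how skewed is the attrition label?
print(hr["left_company"].value_counts(normalize=True))
```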

While many practitioners acquired these data preparation skills, this is where their statistical education sadly ceased: many never went on to acquire the statistical modelling skills required for Bersin's Levels 3 & 4.

One reason for not acquiring these skills, perhaps, is that statistical modelling skills are conceptually much harder to acquire and usually require formal statistical education. Typically, statistical modelling skills include (a brief sketch of assumption checking and model selection follows the list):

  • Experimental research designs 
  • Statistical model selection
  • Appropriate test statistic selection
  • One or two-tailed test selection
  • Model assumption checking 
  • Model feature selection
  • Interaction hypotheses
  • Output interpretation
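Again purely as an illustration, the sketch below touches on two items from this list, assumption checking and model selection, using the same hypothetical engagement data as before. The intraclass correlation (ICC) quantifies how strongly responses cluster within teams, which is exactly the dependence that ordinary GLM regression assumes away.

```python
# Minimal sketch of assumption checking and model selection on hypothetical data.
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("engagement_survey.csv")

ols_fit = smf.ols("engagement ~ tenure", data=df).fit()

# Assumption checking: a small Shapiro-Wilk p-value suggests the residuals
# are not normally distributed.
print(stats.shapiro(ols_fit.resid))

# Fit the multilevel model with full maximum likelihood (reml=False) so that
# its AIC is comparable with the OLS model's AIC.
mlm_fit = smf.mixedlm("engagement ~ tenure", data=df, groups="team").fit(reml=False)

# Assumption checking: the intraclass correlation estimates how strongly
# responses cluster within teams; a non-trivial ICC means the independence
# assumption behind ordinary regression is violated.
team_var = mlm_fit.cov_re.iloc[0, 0]   # between-team variance
resid_var = mlm_fit.scale              # within-team (residual) variance
print(f"ICC = {team_var / (team_var + resid_var):.2f}")

# Model selection: the lower AIC indicates the better trade-off of fit and complexity.
print("OLS AIC:", ols_fit.aic, " MLM AIC:", mlm_fit.aic)
```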

And the above is just a partial list because practitioners will also need to acquire additional skills for the specific model chosen.

Relatively few People Analytics practitioners, therefore, went on to gain these skills, and today most of their time is spent on comparatively low-value, data-oriented activities such as data warehouse development, dashboards and visualisations.

Unfortunately for these data-oriented People Analytics functions, however, no amount of data preparation, reporting and visualisation can compensate for the lack of valuable insights that statistical modelling would have offered.

One could say that these People Analytics functions are caught in a perpetual ‘Data Preparation Groundhog Day’.

(Dave Millner, the HR Curator, says I should point out that Bersin has rolled away the old model and that a new one has arisen)


Model Resistance Reason 2: Continuous learning is a lot of work

Although some People Analytics practitioners stopped at data preparation, a small group did acquire the skills required for Bersin’s Levels 3 & 4 statistical modelling. In reality, however, most of this group already possessed these statistical modelling skills when they were hired, having previously acquired them at university (because statistical modelling skills are difficult to obtain on the job).

Why then do some people with statistical modelling skills still resist new models? Quite simply because statistical modelling skills require continuous updating. To put it another way: the statistical model that gave you a competitive edge three years ago ceased to be a differentiator the moment your competitors started using it as well. To remain competitive, you’ve got to keep updating your People Analytics toolkit with techniques like MLM, machine learning and AI.

And that’s the catch: many statistical modellers simply lack the time or energy to keep their modelling skills current. When new models become available, they resist embracing them simply because they have never properly learned and understood them.

Conclusion

To remain competitive, companies must improve the quality of decisions about their most important resource: their people. 

Competitive workforce decision-making requires both better data and better statistical modelling than your competitors. While most organizations do a reasonable job with data preparation, many are falling behind in the use of the latest statistical models. 

To address this, we recommend that People Analytics leaders encourage continuous learning and updating of their teams’ statistical modelling skills.



Optional Appendix: A Brief History of Resistance to Models

Thomas Kuhn would argue that People Analytics is certainly not the only profession which prefers hanging onto old (possibly faulty) models rather than investing in new improved versions. Consider the following brief historical interlude:

The geocentric model

In the 16th century, most models of our solar system were geocentric: that is, they had the earth in the centre with the sun, moon and planets revolving around it (Figure A). Having earth in the centre aligned well with the Church doctrines of the day, which argued that human beings must surely lie at the centre of all Creation.

Figures A & B: The Geocentric and Heliocentric Models


Courtesy Wikimedia Foundation and mrhamel.com

This geocentric model, however, had one minor flaw: it was completely wrong. 

For example, not only did it seldom predict the correct positions of the planets, but it also struggled to explain why planets periodically appear to move backwards across the sky (retrograde motion).

These inaccuracies led to poor agricultural decision-making in much the same way that GLM regression leads to poor workforce decisions in team environments.

To compensate for these errors, mathematicians bolted ever more elaborate machinery (epicycles upon epicycles) onto the geocentric model in the hope of improving its accuracy: sadly, none of these fixes was entirely successful.


Birth of a new solar system model

In 1543, Copernicus formally proposed replacing the faulty geocentric model with the heliocentric model still in use today.

The heliocentric model places the sun rather than the earth at the centre of the solar system with all planets revolving around it (Figure B). Not only did the heliocentric model address the limitations of the geocentric model, but it was much simpler too. 

Reaction to the heliocentric model

You’d have thought everyone would be delighted with a new, simpler, more accurate model for agricultural decision-making (in much the same way that MLM improves workforce decision-making), but you’d be dead wrong.

Instead, there was an enormous outcry from traditionalists whose livelihoods and projects were heavily invested in the old geocentric model - never mind that it was inaccurate. One can almost hear the traditionalist’s argument: “Developing better input data for a geocentric model would offer more valuable returns than trying to perfect a heliocentric model”.

That wasn’t the end of it: so unhappy was the Church that human beings no longer occupied pride of place at the centre of the “universe” that when Galileo endorsed the Copernican heliocentric model, he was convicted of heresy in 1633 and sentenced to imprisonment, soon commuted to house arrest for the rest of his life.

The moral of this story is that since time immemorial, there have always been people who resist change and People Analytics is no exception.

Vladimir Dimitroff

Experienced senior executive and management consultant

4y

"The Egg, of course!", said the Chicken ?? Which means: Data. With enough data even a most primitive linear extrapolation (or a reactive operational report at the lowest Bersin maturity level) is better decision support than nothing. On the other hand, even a most brilliant model is utterly useless without data? This is obviously tongue-in-cheek and I am not disputing in any way the article (great piece, btw, Max - congrats! ??) - but the grain of seriousness in the joke is that some ingredients of the perfect mix may have a bit of a priority. Not so much importance priority, as chronological priority - more equal among equals on the timeline. I'd invest time and talent to get my data right (governance-wise, acquisition- , organisation- , quality- and conditioning-wise) before I begin to start commencing any attempts to model anything. Rubbish data = rubbish models ia another old proverb to remember.

Ben Hanowell

Director of People Analytics Research, ADP Research. I study the decisions of employees and employers. My posts reflect my own thoughts.

4y

Bless you

Tomeka Hill-Thomas, PhD

Global HR Executive | Business Strategist | Data Scientist | People Analytics | PhD Labor Economist | Speaker | Author

4y

Great article!

Bernie Caessens

Lead scientist @ Poolstok

4y

First, I agree with you Max that we should aim for sound methodology and high-quality data to arrive at better quality decisions and thus actually achieve the ROI of people analytics. However, I keep mulling over the practitioner question: would we rather take decisions on a well-performed GLM, interpreted in the correct way, given that we may have made erroneous (or oversimplified) assumptions, or drop the analytical part altogether because of the statistical complexity? Wouldn't it be better to continuously improve on our models, but at least start with an understandable and well-documented approach, gradually gaining more insight because of improvements in data and models? Your analogy with the heliocentric model, in my view, doesn't really hold in this case. The latter deals with a model, not a method. For people analytics, the heliocentric model analogy would be more closely related to cognitive/behavioural models that try to explain people's behaviour. Here, I agree that using wrong models and their measurement should be avoided (e.g. MBTI measurements of personality) at all times. With this in mind, using a 'good' model, one can get some part of the way using the family of GLM approaches, given its limits :-)
