Shapley Values: Game Theory in People Analytics
Scientific disciplines often benefit from seemingly unrelated discoveries in other disciplines. I’ve previously written about how economists Fischer Black and Myron Scholes used a physics heat equation to construct their famous option-pricing formula, long a staple of the compensation analyst’s toolkit. People analytics borrows heavily from other fields, including anthropology, psychology, economics, statistics, and data science. As an economist, I’ve always favored game theory, formalized by von Neumann and Morgenstern in their groundbreaking 1944 treatise, Theory of Games and Economic Behavior.
Game theory is often referred to as the science of strategy. “Everything begins with strategy” is a mantra I recite to students in my Stanford Continuing Studies people analytics course. This is why I was drawn to the use of Shapley values, a cooperative game theory concept, in explaining attrition – an almost obligatory bedrock analysis for all people analytics teams – by Watercooler, a global, Israel-based behavioral people analytics startup I have the privilege of advising alongside other experts and thought leaders such as Simon Sinek.
Game theory has two major branches. In non-cooperative game theory, players compete against each other and cannot or do not coordinate their activities or enter binding agreements with one another. A Nash Equilibrium, popularized by the 2001 film A Beautiful Mind about the life of mathematician John Nash, is one possible solution to a non-cooperative game. In cooperative game theory, on the other hand, players have the option to coordinate and make enforceable contracts with one another. They can negotiate and form coalitions to achieve a winning strategy against opposing players and coalitions.
Lloyd Shapley, an American mathematical economist and leading game theorist, won the 2012 Nobel Memorial Prize in Economic Sciences for his contributions to cooperative game theory. Among those contributions is the Shapley value, a solution concept for cooperative games that he introduced in 1951. Data science has adopted Shapley values, and many statistical software platforms offer ready-made routines to estimate them. In Python, for example, the SHAP library (short for SHapley Additive exPlanations) was introduced by Scott Lundberg and Su-In Lee in their 2017 paper, A Unified Approach to Interpreting Model Predictions.
SHAP is a game-theoretic approach to explaining the output – i.e., the predictions – of machine learning models. Positive SHAP values push a model’s prediction higher, while negative SHAP values push it lower. In Interpreting Machine Learning Models with SHAP, Christoph Molnar suggests SHAP is the Swiss army knife of machine learning interpretability. Others have suggested that SHAP is the most powerful method for explaining how machine learning models make predictions.
What is the connection between Shapley values and machine learning? It is all about explaining what each feature contributes to a model’s prediction. In linear models, each feature’s contribution is simply the feature value times the weight of the feature. However, in non-linear models, which are more reflective of reality, interdependencies and interactions among the features make it difficult to ascribe any particular feature’s contribution to a prediction. The interdependency and interactivity among features are akin to the coordination and cooperation among players in a cooperative game! Aditya Bhattacharya, an explainable AI researcher, provides a wonderful numerical explanation of the connection.
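To make the linear case concrete, here is a minimal numerical sketch (the weights and data are made up for illustration and appear nowhere else in this article): for a linear model, a feature’s contribution to a single prediction is its weight times the difference between the feature’s value and the feature’s average value, and those contributions add up exactly to the gap between that prediction and the average prediction.

```python
import numpy as np

# Toy linear model with made-up weights and data (purely illustrative)
w = np.array([0.4, -1.2, 2.0])              # feature weights
b = 0.5                                     # intercept
X = np.array([[1.0, 3.0, 0.0],
              [2.0, 1.0, 1.0],
              [0.0, 2.0, 2.0]])             # a tiny reference dataset
x = np.array([2.0, 1.0, 1.0])               # the instance to explain

base_value = X.mean(axis=0) @ w + b         # average prediction over the data (0.5)
prediction = x @ w + b                      # prediction for this instance (2.1)

# For a linear model with independent features, feature j's Shapley
# contribution is simply w[j] * (x[j] - average of feature j).
contributions = w * (x - X.mean(axis=0))    # [0.4, 1.2, 0.0]

print(prediction)                           # 2.1
print(base_value + contributions.sum())     # 2.1 -- the contributions explain the gap exactly
```

Non-linear models have no such closed form, which is where the full Shapley machinery comes in.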
In a cooperative game, players form coalitions and cooperate to achieve an objective – say, a profit. This is the total payout from the game. Each player earns a share of the total payout. The Shapley value assigns an individual payout to each player according to their contribution to the total payout. While not a trivial calculation, it is essential to identifying what a fair individual payout should be; a coalition would be hard to engineer if players could not agree on a fair division of the spoils. SHAP deconstructs a machine learning model’s prediction into a sum of contributions from each of the model’s features. Note that a feature’s Shapley value is not simply the difference in prediction when that feature is removed from the model.
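A toy version of that calculation, with a hypothetical three-player game and made-up payouts: the Shapley value averages each player’s marginal contribution over every order in which the coalition could have formed.

```python
from itertools import permutations

# Hypothetical 3-player cooperative game: v maps each coalition to its payout
v = {
    frozenset(): 0,
    frozenset("A"): 10, frozenset("B"): 20, frozenset("C"): 30,
    frozenset("AB"): 40, frozenset("AC"): 50, frozenset("BC"): 60,
    frozenset("ABC"): 90,
}
players = ["A", "B", "C"]

# Shapley value: average marginal contribution over all orderings of the players
orders = list(permutations(players))
shapley = {p: 0.0 for p in players}
for order in orders:
    coalition = set()
    for p in order:
        before = v[frozenset(coalition)]
        coalition.add(p)
        shapley[p] += (v[frozenset(coalition)] - before) / len(orders)

print(shapley)                  # {'A': 20.0, 'B': 30.0, 'C': 40.0}
print(sum(shapley.values()))    # 90.0 -- individual payouts sum to the grand coalition's payout
```

SHAP plays the same game with a model’s features: each feature is a “player,” the “payout” is the prediction’s deviation from the average prediction, and the Shapley value is that feature’s fair share of it.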
SHAP applies to both regressor and classifier machine learning models. In either case, it is applied after the model is trained and helps interpret the predictions – valuable when advanced machine learning models seem like a “black box” and interpretability is an increasingly important issue in the application and adoption of machine learning and artificial intelligence generally. Let’s look at a familiar people analytics example – explaining the drivers of attrition via a classifier model (someone either leaves or stays) – to see how Shapley values assist in interpreting the model’s predictions.
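As a minimal sketch of that workflow (synthetic data and invented feature names, not Watercooler’s model or data described below), one could train an off-the-shelf classifier and hand it to the SHAP library afterwards:

```python
# Minimal sketch: a classifier trained on synthetic "attrition" data, explained with SHAP.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({                                   # invented feature names
    "compensation_ratio": rng.normal(1.0, 0.2, n),
    "off_hours_work_pct": rng.uniform(0.0, 0.5, n),
    "weekly_one_on_ones": rng.integers(0, 3, n),
    "tenure_years": rng.uniform(0.0, 10.0, n),
})
# Synthetic label loosely tied to the features, just so there is something to learn
score = -2.0 * X["compensation_ratio"] + 4.0 * X["off_hours_work_pct"] - 0.5 * X["weekly_one_on_ones"]
y = (score + rng.normal(0.0, 0.5, n) > score.median()).astype(int)

model = GradientBoostingClassifier().fit(X, y)       # train first...

explainer = shap.TreeExplainer(model)                # ...then explain afterwards
shap_values = explainer.shap_values(X)               # one row of feature contributions per employee
print(shap_values[0])                                # positive values push predicted attrition risk up
```

Each row of shap_values decomposes one employee’s predicted risk into per-feature pushes above or below the dataset’s average risk.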
Watercooler developed a machine learning model employing HRIS and employee digital-footprint data on over 150 distinct variables for an organization over a three-year period to predict attrition and surface its drivers. The HRIS data included typical information on organizational structure, demographics, tenure, time-off utilization, compensation, performance, and other variables for retained and departed employees. Behavioral information from employees’ electronic-device digital footprint included metadata on Slack messages, emails, meetings, and GitHub activity.
The schematic below is a force plot showing the SHAP values of two employees – employee (a) and employee (b) – both with relatively high attrition risk (the boldface number) but for completely different reasons. Aidan Cooper helpfully explains various SHAP plotting alternatives, including waterfall and beeswarm plots. In our force plot, the red sections represent the factors that push attrition risk higher, and the blue sections are the mitigating factors pushing attrition risk lower. The base value is the average attrition risk in the dataset and is, therefore, the same in both plots. Variables with a larger SHAP value (i.e., more impact) have larger arrows. The values have been hidden for simplicity.
Employee (a) has a high likelihood (0.86) of attrition. The attrition drivers are relatively low compensation, a high incidence of off-hours work, long working hours, and a high count of interrupted time off. The mitigating factors are the employee’s tenure, age, and sufficient one-on-ones with their manager. This pattern reflects potential burnout.
Employee (b) has a high likelihood (0.82) of resignation but for different reasons. The resignation drivers are insufficient one-on-ones with their manager, infrequent manager interactions, low tenure (they are a recent hire), and age. Mitigating the risk (and lowering the likelihood of attrition) are the employee’s working hours and low incidence of interrupted time off. This pattern reflects potential failed onboarding.
The force plot offers an easily interpretable visualization of which factors are at play in different situations. On the Watercooler website, you can read a more detailed version of the example in a short paper entitled Is AI the missing link between Engagement and Retention, written by Or Yair, co-founder and CTO.
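For readers who want to produce this kind of visual on their own data, the SHAP library ships the force plot directly. Continuing the illustrative sketch above (and not recreating Watercooler’s plots), a single employee’s prediction could be rendered like this:

```python
# Continuing the sketch above: a force plot for one (synthetic) employee's prediction.
# Red arrows push the predicted attrition risk above the base value; blue arrows pull it below.
i = 0
shap.force_plot(
    explainer.expected_value,    # the base value (may be an array for some model types)
    shap_values[i],              # that employee's per-feature contributions
    X.iloc[i],                   # the feature values shown under the arrows
    matplotlib=True,             # render as a static matplotlib figure
)
```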
What are the takeaways? First, be open to ideas from other disciplines and borrow concepts, equations, and results to advance your specific people analytics projects. Second, don’t forget the immense benefits of good old machine learning as everyone’s attention turns to generative artificial intelligence, casting a shadow on discriminative artificial intelligence. Third, explore Shapley values – in whatever form your statistical package of choice offers them – to open the black box of complex machine learning models and shed light on the contribution of each feature to a prediction. Explaining AI models, especially via compelling visualizations, builds comfort with, and confidence in, people analytics among decision-makers in HR and the business, which drives adoption and improves talent, customer, and operational outcomes.
(This article was written without the aid of generative artificial intelligence).