Definitely not the last word...
Acknowledgements: With kind thanks to Linda Raftree, Rick Davies and Kim Forss for their encouragement, support, feedback and critique.
This article is the final part (nine) of a series exploring the links between evaluation and technologies. It takes a look at some of the important issues not covered elsewhere in the series, addresses some of the interesting questions and feedback from readers, and details some of the further resources that have been recommended.
“Ninety percent of problems have already been solved in some other field. You just have to find them.” Tony McCaffrey, New Scientist, 2015.
Thanks to Rick Davies for sharing McCaffrey’s wisdom above, and to Mike Klein, Ian C. Davies and Greet Peersman for their thoughts, recommendations and resources.
This series aimed to produce something accessible to evaluators and social scientists who have not yet ventured towards data science. The world is always changing around us, but what makes the advances in technology different is that, as evaluators, we are not simply consumers enjoying the benefits of easier online purchases or more sophisticated mobile phones. Digital technology is concerned with data, and so are we. So we should ask ourselves: where do we sit within the new technology paradigms? We could shun them,[139] or we can ‘upgrade’ ourselves usefully. I suggest that we educate ourselves and get involved.
We should be neither over-optimistic that any new innovation is going to solve things, nor dismissive of them all. There are real advantages and real risks, and we need to be able to discern both. This is one of the reasons that ethical issues were a recurrent theme in the series. “Big data is made from people,” as Michael N. Karim and colleagues remind us. In correspondence, Greet Peersman highlighted the 'leave no one behind' agenda and the 'do no harm' principle. It is not only big data, of course; who benefits from or is harmed by data, and how it is used, are obvious questions to ask. Greet brought to my attention the Indigenous Data Sovereignty movement, and in particular the Maiam nayri Wingara Aboriginal and Torres Strait Islander Data Sovereignty Collective: https://www.maiamnayriwingara.org
How data is processed is also of paramount relevance to evaluators. We need to be able to evaluate technology down to the algorithmic level, but also to work from the other direction upwards, in terms of what considerations need to be made (and what skills we might need to gain). The practice of evaluation is being influenced by our tech and data colleagues, so it is about engaging with the ethics and philosophy, and with what that practically means. These technologies offer some really useful tools, but given concerns around the distancing effect of technology, we need to attend to the entire programmatic pipeline as tech is embraced.
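To make "evaluating down to the algorithmic level" a little more concrete, here is a minimal sketch of one such check: comparing a predictive tool's false-positive rates across groups. Everything in it is a hypothetical assumption for illustration, including the record structure, the group labels and the choice of metric; a real audit would draw on the tool's actual evaluation data and a much richer battery of checks.

```python
# A minimal sketch of one "algorithmic level" check an evaluator might run:
# comparing a predictive tool's false-positive rates across groups.
# All data, field names and group labels here are hypothetical.

from collections import defaultdict

# Hypothetical records: each holds the model's binary prediction,
# the observed outcome, and a demographic group label.
records = [
    {"group": "A", "predicted": 1, "actual": 0},
    {"group": "A", "predicted": 0, "actual": 0},
    {"group": "B", "predicted": 1, "actual": 0},
    {"group": "B", "predicted": 1, "actual": 1},
    # ... in practice, thousands of evaluation records
]

def false_positive_rates(rows):
    """False-positive rate per group: flagged cases among actual negatives."""
    flagged = defaultdict(int)    # predicted positive, actually negative
    negatives = defaultdict(int)  # all actual negatives
    for r in rows:
        if r["actual"] == 0:
            negatives[r["group"]] += 1
            if r["predicted"] == 1:
                flagged[r["group"]] += 1
    return {g: flagged[g] / negatives[g] for g in negatives if negatives[g]}

for group, fpr in sorted(false_positive_rates(records).items()):
    print(f"Group {group}: false-positive rate {fpr:.2f}")
```

A large gap between groups on a metric like this is not proof of bias, but it is exactly the kind of question an evaluator can put to the developers of a tool: why does the model err more often, and in whose favour, for some groups than others?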
And this is not unfamiliar territory for evaluators. Our whole approach is based on looking at what an organisation said it would do, what it actually did, and what that means. It is not so mysterious to incorporate technology and algorithms into this process.
On predicting risk: the week that the predictive case study was published, there was further publicity about the West Midlands police force mentioned in the article. The to and fro between the force and its ethics committee highlighted the risks of entrenching bias.[140] The child maltreatment case study raises the question: can an algorithm only predict what the humans would have predicted anyway? On reflection, there is the risk of the methodology of problematisation not transforming situations as intended, but rather entrenching them. What would be the effect of coming at it from the opposite direction, i.e. asking what indicators correlate with an absence of child maltreatment? That would be possible, as would searching for positive deviance: cases where maltreatment is absent despite all context indicators suggesting high risk. A sketch of what such a search might look like appears below.
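As a sketch only: the positive-deviance search amounts to filtering for cases where the predicted risk is high but the outcome did not occur, and then following those cases up qualitatively. The field names, the example figures and the 0.8 threshold below are all hypothetical assumptions, not anything drawn from the case study itself.

```python
# A minimal sketch of searching for "positive deviance": cases where the
# observed outcome is absent despite a high predicted risk score.
# Field names, values and the threshold are hypothetical assumptions.

cases = [
    {"case_id": 101, "risk_score": 0.92, "maltreatment_observed": False},
    {"case_id": 102, "risk_score": 0.95, "maltreatment_observed": True},
    {"case_id": 103, "risk_score": 0.35, "maltreatment_observed": False},
    # ... a real analysis would draw on the full administrative dataset
]

HIGH_RISK_THRESHOLD = 0.8  # arbitrary cut-off, for illustration only

# Positive deviants: high predicted risk, but no maltreatment observed.
# These are the cases that invite qualitative follow-up: what protective
# factors are present that the model's indicators do not capture?
positive_deviants = [
    c for c in cases
    if c["risk_score"] >= HIGH_RISK_THRESHOLD and not c["maltreatment_observed"]
]

for c in positive_deviants:
    print(f"Case {c['case_id']}: risk {c['risk_score']:.2f}, "
          f"no maltreatment observed")
```

The interesting work starts after the filter: the list it produces is not an answer but a sampling frame for asking why the model's indicators over-predicted in those households.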
In considering frontier technologies I have looked through the broad lens of evaluation to consider some key principles of how we undertake inquiries. But there is enormous scope for taking a closer look through some of the lenses that we use within evaluation. Just as Emily Keddell's analysis of the predictive risk model identified the risk of gendering child maltreatment, we can equally apply both comprehensive gender and human rights lenses. The foundations of human rights are autonomy, liberty and dignity. From a human rights-based perspective, we must ensure not only that we avoid impinging on these foundations, but that what we do strives to enhance them.
Conclusions and Some Recommendations
In part one, I encourage colleagues to engage with the philosophy, psychology and ethics of developing and deploying algorithms, as a mechanism for understanding how to evaluate their design, implementation and impact. Through the lens and focus of principles and values, we need to engage in this learning, discussion and practice, offering our own perspectives to our tech colleagues, so that we can explore effectively together.
In part two, I re-engage with one of the fundamental challenges we face as developers: how do we best help and serve, and specifically how do (any of) our technologies best serve those we seek to co-develop with? Acknowledging the perennial problem of distance, some of our frontier technologies seek to overcome its multifaceted problems, and this may well lead to dramatic results that undermine the paradigm from which the development began. The nature of disruption is key here.
In part three, I highlight the tricky problem of what our data consists of and how we publish it. It seems obvious that some of the long-held 'traditions' of evaluation reporting will change, or will need to change. Such established practices are likely to change for us and around us whether we engage or not, so I firmly support actively working on those changes.
In part four, I highlight how our assumptions can and do scupper our best efforts. The need to engage carefully with multifaceted workflows, technologies and problems generates complexities that can see us losing sight of our goals. I remind myself that frontier tech will, and does, suffer the same pitfalls.
In part five, I provide a range of useful resources, thinking and processes that are potentially helpful for us in engaging our thinking and practice with regard to ethics and therefore values.
In part six, I concentrate on a relatively early predictive model case study that continues to have relevance in public policy initiatives. I look at the critiques, highlight some of the questions that I have for that (and similar) initiatives, and question some fundamental assumptions of the intervention.
In part seven, I attempt an overview of what I can see from my perspective. And I have tried to offer the appropriate contextualisation. Hopefully this adds another perspective that is useful in cutting through the hype and jargon of the fields.
Part eight offers some reflection on the rescuing and redeployment of RCTs. It is still early days, so we will see if the initial successes and enthusiasm build. But in any case, in highlighting a re-tooled approach for the evaluator's skillset, I both enthuse about the adaptation and evolution, and caution against hype. The philosophically realist perspective takes a tool held up as the gold standard and looks at what is practically possible and "good enough".
Suggested Further Resources
Sociologie des outils de gestion (Sociology of Management Tools) by Ève Chiapello and Patrick Gilbert. Polish up your French language skills (if you need to), to view the broad question of tools and techniques through a social analysis lens, with relevance to considerations of algorithms.
Robert Gatenby is a mathematical oncologist using game theory to analyse cancer progression and the impact of different strategies for treating it. https://cancerworld.net/cutting-edge/beating-cancer-at-its-own-game/
The Book of Why: The New Science of Cause and Effect, Prof. Judea Pearl and Dana Mackenzie. From the NY Times review: While it is “easy to understand why some people would see data mining as the finish rather than the first step,” Professor Pearl reminds us that the most important questions will always require “us, as well as future machines,” to engage in the “work of having to consider and articulate substantive assumptions about how the world operates.”
Why Greatness Cannot Be Planned: The Myth of the Objective, Kenneth O. Stanley and Joel Lehman. A surprising scientific discovery in artificial intelligence leads Stanley and Lehman to declare that the relentless measuring of progress and our obsession with objectives have gone too far, and that we can instead embrace serendipitous discovery and playful creativity.
Thanks for reading!
You can explore this series of articles using the following links...
~ Part One ~ Computation? Evaluate it!
~ Part Two ~ Distance Still Matters
~ Part Three ~ Doing things the way it's always been done, but better (Qualified)
~ Part Four ~ Doing things the way it's always been done, but better (Quantified)
~ Part Five ~ Ethics, psychology and bias
~ Part Six ~ A Predictive Case Study
~ Part Eight ~ Nimble, Some Comments and an Uncanny Graph
About the author
Jo Kaybryn is an international development consultant, currently directing evaluation frameworks, evaluation quality assurance services and leading evaluations for UN agencies and INGOs. “All thoughts presented are my opinion only. Mentions of colleagues past and present should be taken as recommendations, as I have always gained much from working with them. No one should take investment advice or suggestions in this series: none is intended, the series aims to present reflections, and add to the conversations in and around evaluation and frontier technology.”
References
[140] Ethics committee raises alarm over 'predictive policing' tool