It’s About the Decisions, Not the Data
Yesterday I had a great talk on data governance with Gartner analyst Nate Novosel and two esteemed colleagues at Erasmus University Rotterdam. One of Nate's slides read: "It's About the Decisions, Not the Data". And I thought, I really have to share some more personal thoughts on this.
It is much like horse riding (equestrianism). Beginners usually blame the horse. But in most cases when things do not go as expected, it is the person, not the horse. The same goes for 'scientific integrity': this is code for 'integrity' in a professional context, but it serves as a euphemism; we focus on the 'science' part instead of on personal ethics and morals.
And then there is this framing of 'data driven'. We work 'data driven'. And 'the data tells us'. Well, in the same way: it is not about the data; it is about the questions to which the data provide context, and the decisions made based on that.
This is an important point for many reasons, but allow me to give two.
Dataism
Focusing on mere data, following the 'data driven' frame, results in the idea that the world can be reduced to data, and behaviour to patterns that can be distilled into algorithms. We can then use these data and algorithms to make fast decisions for us on a large scale.
But what we see is that a human is more than the data we can collect about that person. The way people behave can to some extent be predicted, nudged and influenced, but it is far more complex, and less pre-determined, than we generally assume.
With dataism, decisions are made about people, sometimes without them knowing it, based on 'what the computer says'. One of the most striking examples I have heard comes from Lokke Moerel, when she referred to this 2017 article from The Guardian: 'New AI can guess whether you're gay or straight from a photograph'. Based on a Stanford University study, an algorithm deduced the sexuality of people on a dating site with up to 91% accuracy, raising tricky ethical questions.
A modern-day variation on the biological determinism associated with Cesare Lombroso (born Ezechia Marco Lombroso): the idea that a person's predisposition, to criminality or mental illness, was determinable from his or her appearance.
In the Stanford article, 'Deep neural networks are more accurate than humans at detecting sexual orientation from facial images', the authors conclude (p. 251) that:
gay men had narrower jaws and longer noses, while lesbians had larger jaws
What most people will not read are the authors' notes, in which they state (page 1):
Our work is limited in many ways: We only looked at white people who self-reported to be gay or straight.
To conclude: all these relevant nuances disappear behind the black box of the computer when the computer merely says "NO"; for instance, when deciding whether a person should be allowed to enter a country in which being gay is punishable by law.
There are two problems here, aside from the problematic attitude towards gay people. First, you do not get the chance to challenge an assessment made about you, based on data collected about you without your knowledge, by some algorithm. Second, the algorithm can be wrong for many reasons: if you do not fall in the category 'white people', the quality of the algorithm's outcome is simply unknown.
To conclude: the data by themselves do not tell us that much. Perhaps the problem of dataism in the context of government can be captured in a piece of false reasoning (my framing).
My rhetorical question would be: how can citizens trust a mistrusting government that will, in the end, pass unjust and unfair judgements on its citizens, given that the 'data and algorithm' approach does not do justice to a reality that is not so easily captured in data and algorithms?
For the same point in another context: see how decision making in healthcare is still predominantly defined by the 'white male standard', and how this affects the diagnosis and treatment of women, in this 2019 article in The Guardian: 'The female problem: how male bias in medical trials ruined women's health'. The data tell us, at best, something reliable about a specific section of reality (white males).
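To make this concrete: a headline figure such as 'up to 91% accuracy' says nothing about whom the model works for. Here is a minimal sketch in Python, with entirely invented numbers, of why evaluating per group matters:

```python
# Invented numbers, purely to illustrate the argument: an impressive
# overall accuracy can hide complete failure on an underrepresented group.

predictions = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]   # model output
labels      = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]   # ground truth
groups      = ["A"] * 8 + ["B"] * 2             # e.g. majority vs. minority

def accuracy(preds, labs):
    return sum(p == l for p, l in zip(preds, labs)) / len(labs)

print(f"overall: {accuracy(predictions, labels):.0%}")      # 80%
for g in sorted(set(groups)):
    members = [i for i, grp in enumerate(groups) if grp == g]
    acc = accuracy([predictions[i] for i in members],
                   [labels[i] for i in members])
    print(f"group {g}: {acc:.0%}")                           # A: 100%, B: 0%
```

In this toy example the respectable overall score of 80% masks the fact that the model is perfect for group A and useless for group B: exactly the kind of nuance that disappears behind the black box.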
The Question preceding the Data
I notice that for most problem solving, people look at the data first. We have all these sensors, cameras, data; what can we do with it? But I would ask: do we understand the problem well enough to know which data will actually help us solve it?
Is this data available? Is the quality of this data guaranteed, and how? Is the context in which the data was collected still relevant to our problem? Do we need to collect new data? Who should we ask to get that data? Can we buy the data somewhere, and can we trust that data? Are we talking about data on attitudes and behaviour, data collected during people's interactions with systems (telemetry), auto-generated metadata, or data enriched by experts and/or specialists (for instance a diagnosis)? What kinds of data are we talking about?
But more importantly, again: do we understand the problem well enough to know which data will actually help us solve it? Is there a bias in our understanding of the problem? Is there a framing of the issue itself? Are we in a hierarchical position that makes it hard to understand the problem from different stakeholders' points of view? Who benefits from solving the problem: the citizens or the city (as in the case of 'smart cities', where we now rather speak of 'Citizen Empowerment in the Smart City')?
Is the question we want to solve a fair and just question? If we settle on the question and start investing in collecting relevant, high-quality data, are we willing to really change something based on the new knowledge we acquire? In other words, is there a political readiness to actually change something, for instance when addressing the big societal challenges and/or the Sustainable Development Goals (SDGs)? Or are we just going through the motions, because everyone else is?
To me, the question preceding all the data-related work is: how do you see your role, and which responsibility do you take, and should you take, given that role, to add value: to a business, or to society, for instance? How do you view your responsibility and accountability, and do justice to them, with an ethical sense of the context in which you can make a difference? And how do you do this responsibly, and in a sustainable way? This is the kind of innovation I would love to contribute to, from my role: one in which data protection, for instance, is an enabler, not a hindrance. But we need to have a talk about the data first, instead of considering data as the starting point of everything. Because: It's About the Decisions, Not the Data.
The data is not driving us; we are in the driver's seat, and we can use an accountability wheel to steer the business we are committed to.
See, for instance, the Accountability Wheel by the Centre for Information Policy Leadership (CIPL) in this report, to which I am proud to have contributed.
Epilogue
Just to be sure: I am an advocate of innovation and of the responsible use of data. I see many examples where good use of high-quality data and algorithms is beneficial, for instance in well-defined conceptual domains where context and meaning are clear. Take the use of (big) data in health, which improves diagnosis, treatment, patients' experience of healthcare, and logistic processes, to name a few.
But to me the 'data dimension' is only one of the dimensions to be considered. I adhere to the so-called 'Five Safes' framework:
Safe data: data is treated to protect any confidentiality concerns.
Safe projects: research projects are approved by data owners for the public good.
Safe people: researchers are trained and authorised to use data safely.
Safe settings: a SecureLab environment prevents unauthorised use.
Safe outputs: screened and approved outputs that are non-disclosive.
And I specifically work with the Privacy Analytics adaptation of the framework, focused on risk-based anonymization of data.
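To give a flavour of what 'risk-based' means here: one classic re-identification risk measure is the size of the smallest equivalence class, i.e. the number of records that share the same combination of quasi-identifiers (the k of k-anonymity). A minimal sketch of my own, not an excerpt from any Privacy Analytics tooling:

```python
# Minimal k-anonymity check: my own illustration of a risk-based metric,
# not code from any Privacy Analytics product. Records are hypothetical.
from collections import Counter

records = [
    {"age_band": "30-39", "postcode": "3011", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "3011", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "3012", "diagnosis": "flu"},
]
quasi_identifiers = ("age_band", "postcode")

# Count how many records share each quasi-identifier combination.
classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

k = min(classes.values())  # the smallest equivalence class drives the risk
print(f"k = {k}; worst-case re-identification risk = {1 / k:.0%}")
```

Here the third record is unique on its quasi-identifiers (k = 1), so its worst-case re-identification risk is 100%: a risk-based approach would generalise or suppress such records before release.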
I have been playing a bit with the idea of using the Fides Language: a proposed model for a human-readable 'taxonomy' of privacy-related data types, behaviors, and usages. It could help solve the context-and-meaning puzzle, and make it possible to scale up and automate data protection decisions, following GDPR (or HIPAA) logic.
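To give an impression of the idea, here is a hypothetical Python sketch of such a taxonomy. The class and field names are my own paraphrase of the Fides concepts (data categories, data uses, data subjects), not the official Fides schema:

```python
# Hypothetical sketch, loosely inspired by the Fides taxonomy idea:
# declare, per system, what data is processed, about whom, and for what use.
# Names are my own paraphrase, not the official Fides schema.
from dataclasses import dataclass

@dataclass(frozen=True)
class PrivacyDeclaration:
    data_categories: tuple[str, ...]  # e.g. "user.contact.email"
    data_use: str                     # e.g. "improve.service"
    data_subjects: tuple[str, ...]    # e.g. "customer"

analytics_system = PrivacyDeclaration(
    data_categories=("user.contact.email", "user.behavior"),
    data_use="improve.service",
    data_subjects=("customer",),
)

# Once declarations are machine-readable, a policy check can be automated.
FORBIDDEN_USES = {"advertising.third_party"}  # illustrative policy rule

def allowed(decl: PrivacyDeclaration) -> bool:
    """Toy policy gate: block declarations with a forbidden data use."""
    return decl.data_use not in FORBIDDEN_USES

print(allowed(analytics_system))  # True under this toy policy
```

The point is not the code but the principle: once the purpose and categories of data processing are machine-readable, decisions such as purpose-limitation checks can be automated and scaled, instead of being made case by case.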
If anyone is willing to share practical experiences with applying the Fides Language, please connect with me and share your thoughts!