Data ethics manifesto

Advice on how to rebuild trust in data

“Ethics: moral principles that control or influence a person's behaviour” – Oxford English Dictionary

Rarely has the misuse, misappropriation and misinterpretation of data had such an impact on our society as during the Covid crisis. From loss of life and livelihoods to the possibility of a worldwide depression, there has been a need to make calm and rational decisions. But what was unprecedented was the reliance on data, data modelling and data analysts in supporting these decisions.

Sadly, the experience has fallen short, at least in the UK. Contradictory and competing analyses were made on incomplete data sets, with opaque assumptions shrouded in secrecy. Whilst the politicians emphasised that their decisions were based on science, it was not explained what the science was or how it was validated.

The result was that, despite having the most advanced health system in the world, the UK found itself with the second highest number of Covid deaths.

The related disciplines of data science, modelling and analytics now find themselves in a similar position to the one the medical discipline found itself in after the MMR scandal, where poorly made decisions, hubris, and the absence of proper peer review and visibility caused an erosion of trust.

To recover, data ethics needs to be at the heart of data strategy.

What are ethics and why are they needed

Ethics are the moral principles that a person must follow, irrespective of the place or time. Behaving ethically involves doing the right thing at the right time. Business ethics focus on the moral principles that those in business must follow in their respective field of business, medical ethics focus on those principles that those in medicine must follow, and so on.

Ethics is about doing the right thing, but first you need to determine your standards of what “good” is and then apply those standards to reduce the impact of doing the wrong thing.

As ethics can be subjective, it is important to make sure those with the appropriate knowledge are involved. It is also important that there is no deliberate or unconscious bias involved. In certain areas it is more important to have higher standards than in others.

In 1998 Dr Andrew Wakefield claimed evidence linking the MMR vaccine to autism. This baseless claim led many parents to withhold the vaccine and caused anxiety for most parents. In 2010 he had his medical licence removed, having been deemed to have “abused his position of trust” and acted “dishonestly and irresponsibly”. Among the allegations levelled against him were the use of unethical medical techniques and a conflict of interest in the findings.

Since then there has been a greater emphasis in medicine on transparency and on peer scrutiny prior to publication. Additionally, medicine has adopted higher standards of ethics.

The process that should have been followed here was nothing new. The “scientific process” has been developed over hundreds of years since the Renaissance to ensure that conclusions arising from experimental data and observations are rigorously peer reviewed and that any bias of the experimenter is removed. Whilst the scientific process is not inherently ethical, it does share the characteristics of peer review and scepticism that an ethical process must follow.

So, what is this to do with data?

This is a topic I wrote about in detail in “Bad Science: the end of the data scientist”, but to summarise: during the Covid crisis, the UK government placed unprecedented reliance on a small group of data scientists, led by Neil Ferguson. Whilst the government claimed to be “led by science”, the modelling approach used was flawed, as it was based on a large set of assumptions, and the modelling parameters were easily open to abuse to match the results expected by the politicians. Dominic Cummings, the UK’s top data-scientist, played a leadership role, despite having been determined by Parliament to have been involved in unethical practices with data previously.

The consequence of this was a Schrödinger’s cat-like situation where both extremely negative and extremely benign outcomes could co-exist simultaneously. This clearly ludicrous situation gave ammunition to both pro- and anti-lockdown advocates (although not to the extreme seen in the USA).

In the majority, if not all, of the analysis I have seen relating to Covid, the data scientist has "owned" the entire process. I can't think of any other "scientific" discipline that does this. No matter how good the data scientist is, if they are not engaging with others then the scientific process is not being followed.

Data science, as used here, was no more scientific than astrology in the Middle Ages. A complete lack of scrutiny and common sense led to very damaging outcomes. A comment of support I received on one of my posts summed it up better than I could:

“Lack of ethics, common sense, humility and systemic view, - ostentation of random graphs and numbers (or, more in general, misuse of inconsistent and incomplete information), - fear used as a weapon (with the connivance of the media), - general incompetence, ... What can be wrong?”

Ethics within the data context

On social media I have heard many times that data tells the truth. Actually, this is not correct. Data does not have an opinion: it is just data. It neither tells the truth nor lies. That is where the analysis comes in, and the bias of the author is part of the process, consciously or unconsciously.

And even assuming the analysts are honest brokers, in the majority of institutions I have worked in, even mature data sets contain bad data (either missing or wrong), so we should not implicitly trust what we see. In fact we have a moral obligation to do the opposite:

  • We should be sceptical and distrustful of the outcome (particularly if it’s one we want to see) until we are confident otherwise.

This emphasis on due process is at the heart of data ethics to ensure:

  • Data is sourced in an ethical way, decisions are made in an ethical way with the right people involved to ensure that happens, and conclusions drawn from any such analysis consider the wider ethical impact on the audience they are presented to (i.e. could these conclusions make people operate in an unethical manner?)
  • There is a process to ensure bias (unconscious or otherwise) and assumptions (covering Machine Learning, modelling and data quality) are identified and challenged. The impact of bias and assumptions should form a very visible part of how the data is presented to decision makers
  • A risk-based principle is adopted throughout – that is, the higher the level of impact, the higher the level of diligence required
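
As a minimal sketch of the risk-based principle, the idea of "higher impact, higher diligence" could be written down in Python as below. The three impact tiers, the check names and the function `outstanding_checks` are all hypothetical, chosen purely for illustration; an organisation would define its own.

```python
from enum import IntEnum


class Impact(IntEnum):
    """Hypothetical impact tiers; a real policy would define its own."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Illustrative diligence steps only. Each tier includes everything
# required by the tier below it, plus something extra.
REQUIRED_CHECKS = {
    Impact.LOW: ["assumptions documented"],
    Impact.MEDIUM: [
        "assumptions documented",
        "bias review by a second analyst",
    ],
    Impact.HIGH: [
        "assumptions documented",
        "bias review by a second analyst",
        "independent subject matter expert sign-off",
    ],
}


def outstanding_checks(impact, completed):
    """Return the checks required at this impact level that are not yet done."""
    return [check for check in REQUIRED_CHECKS[impact] if check not in completed]


# A high-impact analysis with only the first check done still has work to do.
print(outstanding_checks(Impact.HIGH, {"assumptions documented"}))
```

The point of the structure is simply that the amount of scrutiny is decided by the impact of the decision, not by the person who produced the analysis.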

Therefore there is a moral obligation on organisations to set up a process to cover the above. But firstly we need to define what “morality” means in this context:

  • What do morality and ethics mean to the organisation, and how are they quantified and measured?
  • Does the process we have in place deliver the best moral outcome?
  • Who are the people involved in deciding this, and are they qualified to decide if that moral outcome is met, and authorised to reject cases that do not meet it?
  • How do we determine the impact of moral decisions? How is it quantified?

Are data scientists, quants or modellers the right people to decide on ethics?

If it is important that your organisation operates morally, the short answer is no.

Those who decide should be those who have the subject matter expertise to determine whether the data satisfies the criteria set by the ethical standards – essentially following a peer review similar to that of the scientific process. In the case of a process that gives some medical outcome, subject matter experts would be doctors or clinical researchers with expertise in the field. In the case of financial risk, this would include Chief Risk Officers or their delegates (depending on the impact). In the case of environmental impact, these should include experts in the environmental field. And so on.

Seniority of experts should be based on impact – something that should be evaluated as part of the process.

These experts don’t necessarily need to be the people who “run” the process (i.e. produce the analytics). However, the governance should be independent from those who run the process and those who decide, to ensure “arm’s-length” decisions and the avoidance of conflicts of interest. Those who run the governance process are empowered to ensure the process is being followed and conforms to the ethics standards set by policy. Any breaches should be escalated upwards.

Fitting data ethics into existing functions

I’m not proposing a brand-new bureaucracy here. Within most organisations there are likely already functions to ensure some control of data, and many have the following in place:

  • Data enablement: The process to ensure accurate and timely data is available to those who need it in a format useful to them
  • Data governance: collection of processes, roles, policies and metrics to ensure effective and accurate use of information so an organisation can achieve its goals.

Notice there is not anything about ethics in the above – you can have well governed data that is producing bad ethical outcomes in a timely manner.

So data ethics is just about inserting “ethics” into these functions.

Where your organisation practises advanced levels of data enablement, such as allowing more open access to data, care must be taken to ensure that those entitled to this data still operate in an ethical manner.

Do not underestimate how easy it is to misuse the power of graphs

Out of the entire data analytics process, one single artefact – the graph – has an unprecedented level of power in shaping an outcome – a decision can be made purely on the direction and shape of a line. So out of all the analysis, this is the one that needs to be done right. If one artefact could be made more “honest”, then this is the one.

I will provide two examples to emphasise this.

(i) How not to present data

Rather than embarrass anyone, I created a parody on LinkedIn to highlight this. Whilst it is jokey, I hope you can see the point I am making: unattributed data, in the hands of those with a vested interest, can be made to present any outcome desired.

[Image: parody graph, not reproduced here]

(ii) A more ethical way to present data

[Image: graph produced by the BBC, not reproduced here]

Whilst this graph is still conveying a message (and still making a point) it provides equal weight to what might change and allows the reader to make an informed choice on how this is to be interpreted. What is significant is that it is not in the small print somewhere but is made completely transparent and open so as not to mislead.

This was produced by the BBC. As well as being generally dependable, they are legally mandated not to show bias and hence make an effort in how they present information. This further reinforces the trust people have in them.

(iii) A final thought on graphs

If you are a decision-maker you have an ethical obligation to make an informed decision. For decisions of the highest impact it is not acceptable to base this only on a graph. You must also understand where the data is from, how it produced the graph (to a level where you are satisfied due process was followed), and anything that could alter the outcome.

Some principles of data ethics

This final section summarises some items that could go into your data ethics policy. It is not exhaustive, and it is up to you how much, or how little you use.

(i) Scope of data ethics

Data taken outside of a wider context is meaningless; therefore the scope of any data ethics policy must be broad enough to cover:

  • Data sourcing 
  • Data transformation and business rules (including data quality)
  • Analytics and visual analytics (such as graphs)
  • Reporting
  • Models and scenarios, and any mathematical modelling and algorithms relating to them
  • Machine learning and AI relating to this process
  • Assumptions, shortfalls and caveats

(ii) Ethically sensitive data domains

Areas where ethics is particularly needed (again not an exhaustive list):

  • Anything relating to personal identity (Gender; Race; Religion etc)
  • Anything relating to medical matters (Medical advice, Medical data held about an individual)
  • Social policy
  • Physical wellbeing (such as lifestyle advice) or financial wellbeing (such as financial advice)
  • Political science (Political affiliations, Socio-economic bandings, Political profiling)

(iii) Checklist for your data ethics policy

It is up to the organisation to determine the level of rigour of data ethics, and again this depends largely on whether the data is ethically sensitive. Once a list has been agreed, it should inform (and be included in) any ethical data policy you create.

  1. Has all data been sourced from ethical places? Can you identify whether data assets, external sources of data and other data feeding decisions are inherently sensitive?
  2. Have all variables and assumptions been tested, and have all outcomes been clearly presented to decision makers and consumers of the findings (including members of the public if presented openly)?
  3. Are those making decisions qualified in the domain impacted by the decisions (Medics, Environmental scientists, Risk Experts etc)?
  4. In advance of the study, was there a desired outcome, and are any variables (even inadvertently) susceptible to producing such an outcome?
  5. Are those making decisions able to independently verify the way the data was sourced and modelled, and if not, are the consequences understood?
  6. Are the conclusions being made in an ethical manner? And is the impact of the conclusions on broader society also ethical?
  7. Is the ultimate decision maker independent from those putting together the data and model?
  8. Are all weaknesses of the data transparent?
  9. Can the data scientists explain their findings in detail (including assumptions) to those making a decision, and are those making a decision able to understand them? Or are those making the decision just looking at another graph?
  10. Can conflicts of interest be identified regarding any conclusions of the analysis, and if so, how are these measured against the individuals participating in the analysis?
  11. Are there any conclusions that those conducting the study would like to see in advance of the study, and can assumptions be altered to deliver the outcome sought?
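
For those who want to track a checklist like this rather than just read it, here is a minimal Python sketch. It assumes each question is recorded as satisfactory, unsatisfactory or not yet reviewed, and that a review only passes when every item is satisfactory. The class `ChecklistItem`, the function `review_status` and the example entries are hypothetical and are not part of any standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistItem:
    """One question from the ethics checklist (hypothetical structure)."""
    question: str
    # True = reviewed and satisfactory, False = reviewed and not satisfactory,
    # None = not yet reviewed. "Satisfactory" is used rather than yes/no
    # because for some questions a "yes" is the bad answer.
    satisfactory: Optional[bool] = None
    evidence: str = ""


def review_status(items: List[ChecklistItem]) -> dict:
    """Summarise a review: it passes only if every item is satisfactory."""
    failed = [i.question for i in items if i.satisfactory is False]
    unanswered = [i.question for i in items if i.satisfactory is None]
    return {
        "passed": not failed and not unanswered,
        "failed": failed,
        "unanswered": unanswered,
    }


# Illustrative entries only; the evidence text is made up for the example.
items = [
    ChecklistItem("Has all data been sourced from ethical places?", True, "source audit"),
    ChecklistItem("Is the ultimate decision maker independent?", False),
    ChecklistItem("Are all weaknesses of the data transparent?"),
]
status = review_status(items)
print(status["passed"], status["failed"], status["unanswered"])
```

Treating an unreviewed item as a blocker, rather than as a pass, is a deliberate choice here: silence should not count as approval.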

© Deryck Brailsford May 2020

About the Author:

Deryck Brailsford is a leading world authority in the new discipline of data ethics. He has over 25 years in the field of data strategy and change leadership and has consulted for many organisations including KPMG and Morgan Stanley.

If you would like a free initial audit to check that you are using data correctly to make ethical decisions then please contact him on +44 7710 435227 or [email protected]

Knut Haakon Flottorp

Director (CEO), Earth Cell Norway AS past: Norsk Data a/s

4 years ago

Please redraw the picture, in data, there is no god or central institution that draws everyone to it. That is the users and society pull into a socialistic state, of central power and corporate governance. In technology, there is no god, there should be many. That we have so few is just because of socialistic tendencies that are about. A "manifesto" should refuse all central governance, and dispute all attempts to control the individuals.
