Data Science Bias; Lying Computers

Ashwini Mathur

Executive Director, Onesto Consulting

Published: January 27, 2019

I read an absolutely brilliant thriller, I Am Pilgrim by Terry Hayes, in which one of the characters reads an analytics report and comments, "Computers don't lie, but liars can compute". Before I put down my thoughts on that comment: do read the book. It is fantastic, and Terry Hayes knows how to tell a story. He wrote the screenplay for two of my favorite films, Mad Max 2 and Dead Calm, both thrillers well above the norm.

I am a data scientist. No, correct that: I am a statistician, and I am starting to come to terms with what Data Science is as compared to Statistical Science. Once I have a better handle on that, I will write about it, but today I want to think about the comment "Computers don't lie, but liars can compute". I want to adapt it a bit and move the focus from "a lie" to "bias". My adapted comment would be:

"Computers are not biased, but biased people can compute"

This statement deserves serious thought in the context of data science. In my current understanding, a good data scientist needs, besides being a good statistician: (1) the ability to think and act seamlessly across business problems, analytical problems, analytical solutions and business solutions; (2) the ability to develop and implement smart algorithms and programs; (3) the ability to assimilate data from varied sources; and (4) skill in data visualization and storytelling. Based on this, you can see that a data scientist is exposed to multiple sources of bias. A statistician is formally taught about biases and how to avoid or minimize them, but a data scientist has to worry about other sources of bias, not only statistical bias. This risk is important to acknowledge, and that is why you can say that computers are unbiased, but biased people who can compute (including yourself) are problematic. Below are some sources of bias that should be addressed so that the effect of "biased people computing" can be minimized.

Bias due to ...

Using the wrong statistical technique
Using a programming environment that is not fit for purpose, or not having access to the best tool for the job
Answering the "wrong version" of the business problem and implementing a "sub-optimal business solution" despite solving the associated analytical problem correctly
Assimilating data that was collected for a purpose other than the business problem at hand; assimilating too much data; leaving out data sources because of constraints, which could be business related or due to personal bias
Showing selective data that makes the underlying story either too clear or too obscure; telling the associated story without the transparency needed to make it sound unbiased
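
To make the last bucket (showing selective data) concrete, here is a minimal Python sketch. The monthly figures, the variable names and the `percent_change` helper are all hypothetical and invented purely for illustration; they do not come from any real dataset. The point is only that the same series can support two very different stories depending on which window is shown.

```python
# Hypothetical monthly sales figures (January to December), invented for illustration.
monthly_sales = [110, 105, 100, 96, 93, 91, 90, 93, 97, 102, 106, 109]


def percent_change(values):
    """Percent change from the first to the last value in the window."""
    return 100.0 * (values[-1] - values[0]) / values[0]


# The full picture: January to December.
full_year = percent_change(monthly_sales)

# The selective picture: start the chart at the July low point.
since_july = percent_change(monthly_sales[6:])

print(f"Full year:  {full_year:+.1f}%")
print(f"Since July: {since_july:+.1f}%")
```

Both numbers are computed correctly from the same data. The bias lies in choosing which window to present, and in not telling the audience that a choice was made.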

As you can see, these are very broad buckets of potential bias. On top of that, the bias can come from you, from someone higher up in the organization, from various constraints, from a gap in technical knowledge, or from outright lying by the people involved. That is why it is important to think carefully about "biased people can compute".

A simple way to do that, at least as a thought process, is to act like a sceptic of the highest order and question everything, but to do so without becoming a cynic.

Happy Unbiasing, as it will make data science all the more powerful!

Fanny Flores

Master’s in Business Administration | Project Data Manager | Global Business Solutions at Novartis

5 years ago

Magnificent!

Prasanthi Sanjeevi

Associate Director Compliance and Training at Novotech Clinical Research India Private Limited

5 years ago

Wonderful, Sir. I have noticed each of these biases while learning about and applying RWD/RWE. You have put it precisely.

Pillai Goonaseelan (Colin)

Founder at CP+ Associates GmbH (Switzerland) and CEO at Pharmacometrics Africa NPC (South Africa)

5 years ago

Thoughtful post : “Computers don't lie, but liars can compute”.

Karthinathan T.

Founder and CEO of MeDaStats LLC | Statistician

5 years ago

Fantastic post, Ashwini!

