Data Science Bias; Lying Computers

I read an absolutely brilliant thriller by Terry Hayes called I Am Pilgrim, in which one of the characters, reading an analytics report, comments, "Computers don't lie, but liars can compute." Before I put down my thoughts on that comment: do read the book. It is fantastic, and Terry Hayes knows how to tell a story. He wrote the screenplays for two of my favorite films, Mad Max 2 and Dead Calm, both of them thrillers way above the norm.

I am a data scientist. No, correct that: I am a statistician, and I am starting to come to terms with what Data Science is as compared to Statistical Science. Once I have a better handle on that, I will write about it, but today I want to think about the comment "Computers don't lie, but liars can compute." I want to adapt it a bit and move the focus from "a lie" to "bias". My adapted comment would be:

"Computers are not biased, but biased people can compute"

This statement deserves serious thought in the context of data science. In my current understanding, being a good data scientist requires, besides being a good statistician: (1) the ability to think and act seamlessly across business problems, analytical problems, analytical solutions and business solutions; (2) developing and implementing smart algorithms and programs; (3) assimilating data from varied sources; and (4) data visualization and storytelling. A data scientist is therefore exposed to, and at risk from, multiple sources of bias. As a statistician, you are formally taught about bias and how to avoid and minimize it, but a data scientist has to worry about other sources of bias as well, not only statistical bias. This risk is important to acknowledge, and that is why you can say that computers are unbiased but biased people who can compute (including yourself) are a problem. Below are some sources of bias that should be addressed so that the effect of "biased people computing" is minimized.

Bias due to:

  1. Use of the wrong statistical technique
  2. Using a programming environment that is not fit for purpose, or not having access to the best tool for the purpose
  3. Answering the "wrong version" of the business problem and implementing a "sub-optimal business solution", despite solving the associated analytical problem correctly
  4. Assimilating data that was collected for a purpose other than the business problem at hand; assimilating too much data; leaving out data sources because of constraints, whether business-related or due to personal bias
  5. Showing selective data that makes the underlying story either too clear or too obtuse; telling the associated story without the transparency needed for it to come across as unbiased
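To make buckets 1 and 5 a little more concrete, here is a minimal Python sketch of Simpson's paradox, one classic way an unbiased computer can be made to tell a biased story. Every division below is done exactly (with Python's `fractions.Fraction`), yet the "better" option flips depending on whether the analyst keeps the data stratified or pools it. The counts, the option names "A" and "B", and the "mild"/"severe" strata are hypothetical, chosen only to show the effect; the helper functions are likewise illustrative, not taken from any particular library.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts for two options, split by a
# stratifying variable such as case severity. Illustrative numbers only.
COUNTS = {
    "A": {"mild": (81, 87), "severe": (192, 263)},
    "B": {"mild": (234, 270), "severe": (55, 80)},
}


def stratum_rates(counts):
    """Exact success rate of each option within each stratum."""
    return {
        option: {s: Fraction(k, n) for s, (k, n) in strata.items()}
        for option, strata in counts.items()
    }


def pooled_rates(counts):
    """Exact success rate of each option after collapsing (pooling) the strata."""
    pooled = {}
    for option, strata in counts.items():
        successes = sum(k for k, _ in strata.values())
        trials = sum(n for _, n in strata.values())
        pooled[option] = Fraction(successes, trials)
    return pooled


def winner(rates):
    """Return the option with the highest rate in an {option: rate} mapping."""
    return max(rates, key=rates.get)


if __name__ == "__main__":
    by_stratum = stratum_rates(COUNTS)
    for stratum in ("mild", "severe"):
        rates = {opt: by_stratum[opt][stratum] for opt in COUNTS}
        summary = ", ".join(f"{opt}={float(r):.1%}" for opt, r in rates.items())
        print(f"{stratum}: {summary} -> {winner(rates)} looks better")

    pooled = pooled_rates(COUNTS)
    summary = ", ".join(f"{opt}={float(r):.1%}" for opt, r in pooled.items())
    print(f"pooled: {summary} -> {winner(pooled)} looks better")
```

Run as a script, it reports that A has the higher success rate within each stratum, while B has the higher rate once the strata are pooled. Neither calculation is wrong; the bias lies in which one is chosen and shown.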

As you can see, these are very broad buckets of potential bias. Add to that the complication that bias can come from you, from someone higher up in the organization, from various constraints, from a gap in technical knowledge, or from outright lying by the people involved, and it becomes important to think carefully about "biased people can compute".

A simple way to do that, at least as a thought process, is to act like a sceptic of the highest order and question everything, but to do it without becoming a cynic or a doubter.

Happy unbiasing, as it will make data science all the more powerful!

Ashwini Mathur