#100things - Learn how data happens in your area
Map of cholera cases in Soho, London, 1854. Source: Wikimedia Commons.

Recently, I was talking to a friend who works as a statistician in manufacturing. He explained how his direct involvement in the manufacturing process helped him understand where biases and limitations in the data may originate. For example, he told me about running an experiment that involved a high-volume device. Designing the experiment was especially challenging because, for every run, the device had to be filled completely with very costly ingredients. The sample size was therefore heavily limited by cost constraints. It ended up being one of his most expensive experiments in manufacturing.

I had similar experiences in my career as a statistician when I learned about the burden of certain data entry forms. At the time, we were moving from paper forms (yes, people were entering all the data from paper into the clinical databases) to electronic forms that captured the data directly at the investigator's site. Experiencing the tedious process and the slow system responses myself made me rethink how much data we should collect, and in which way, to make it as easy as possible for the investigator. That way, we hoped to reduce the amount of missing data.

Even John Snow - a data scientist of sorts over 150 years ago - investigated how the data was collected and which factors might affect the data collection. His famous map of cholera deaths in a part of London in 1854 shows the deaths clustering around a water pump, the likely culprit in the spread of the disease. One building close to the pump stands out, however: why was nobody dying there? He found out that the building housed a brewery with its own water supply (and of course beer supply). This saved the people there from taking water from the infected pump.

As a data scientist or statistician, make being a data detective part of your job as well. Learn how your data happens.

  • Under what circumstances do people enter the data? 
  • What could be factors affecting the data collection and cleaning process? 
  • What incentives do people have to collect the data?

Reach out to your business partners close to the data collection site. Where is the brewery in your data?

Reference: John Snow: A Legacy of Disease Detectives

https://blogs.cdc.gov/publichealthmatters/2017/03/a-legacy-of-disease-detectives/

Patrick Lim

Helping biotech and pharmaceutical companies deliver high quality clinical trials outputs and quantitative researcher with over 20 years experience.

3 years ago

Yes, this is the difference between data scientists/computer programmers and statisticians. We, as statisticians, are taught about populations, experimental design, etc. The others are taught to use machine learning on the collected data. 100 million records are useless if they still miss the top 10 or 20% of your target demographics. Bias 101.
