Analyze real data, data scientists!

Data is always messy, and analyzing it requires smart choices. That’s why AI will never replace human market researchers. In both academia and industry, smart consumers of market research therefore demand transparency about the exact steps researchers took with the data. The peer review process in top journals does a decent, though imperfect, job of catching the choices that would change the actionable insights; a Nature article claimed it would have prevented the Theranos fraud.

What is definitely NOT a good way of dealing with data messiness is to just make up data. In academia, we have seen our share of scandals. In his memoir “Faking Science: A True Story of Academic Fraud”, former psychology professor Diederik Stapel explains how he got annoyed with data that did not support his brilliant ideas at the 95% statistical confidence level, and started fabricating data to fit those ideas. In his words, he “became impatient, overambitious, reckless. I wanted to go faster and better and higher and smarter, all the time.” Fortunately, academia has gotten much better at catching and reporting fraud.

How about faking data in industry? This week’s lawsuit from JPMorgan Chase against Frank is likely just the tip of the iceberg. According to the filing, Frank’s founder and CEO Charlie Javice lied “about Frank’s success, Frank’s size, and the depth of Frank’s market penetration in order to induce JPMC to purchase Frank for $175 million”. Specifically, Javice used “synthetic data” techniques to create a list of 4.265 million “students” who did not actually exist. When a Frank engineer declined to create that list, Javice “turned to a data science professor at a New York City area college who advertised his ‘creative solutions’ to data problems.” Based on a list of 293,192 actual students who had started or submitted a FAFSA application through Frank, Javice directed the Data Science Professor to use “synthetic data” techniques to create 4.265 million customer names, email addresses, birthdays, and other personal information. Interestingly, the lawsuit has access to emails between Javice and the Data Science Professor that show they understood what they were doing. The Data Science Professor wrote:

(1) “[f]or names, our plan was to sample first name and last name independently and then ensure none of the sampled names are real”, and

(2) “I can’t seem to find addresses in my raw files . . . . Should I attempt to fabricate them?”

Moreover, when reviewing the synthetic data, the Data Science Professor noted that many entries confusingly had customers living, attending high school, and attending college in the same town and state, and concluded that the list “would look fishy to [him] if [he] were to audit it.”
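
The red flag the Data Science Professor describes can be turned into a simple audit check. Below is a minimal Python sketch, assuming hypothetical field names (`home_town`, `high_school_town`, `college_town`) and an arbitrary 50% threshold. Neither comes from the lawsuit, and the three toy records are invented purely to exercise the function.

```python
from typing import Iterable, Mapping

# Hypothetical field names; a real customer list would use its own schema.
LOCATION_FIELDS = ("home_town", "high_school_town", "college_town")


def same_town_share(records: Iterable[Mapping[str, str]]) -> float:
    """Return the fraction of records whose location fields are all identical."""
    total = 0
    identical = 0
    for record in records:
        total += 1
        # Normalize case and whitespace so trivial differences do not hide a match.
        towns = {record[field].strip().lower() for field in LOCATION_FIELDS}
        if len(towns) == 1:
            identical += 1
    return identical / total if total else 0.0


def looks_fishy(records: Iterable[Mapping[str, str]], threshold: float = 0.5) -> bool:
    """Flag a list when the same-town share exceeds an (assumed) threshold."""
    return same_town_share(records) > threshold


# Toy records, invented for illustration only.
toy_customers = [
    {"home_town": "Springfield", "high_school_town": "Springfield", "college_town": "Springfield"},
    {"home_town": "Riverton", "high_school_town": "riverton", "college_town": "Riverton "},
    {"home_town": "Lakeside", "high_school_town": "Lakeside", "college_town": "Hillview"},
]

share = same_town_share(toy_customers)
print(f"{share:.0%} of records share one town; fishy: {looks_fishy(toy_customers)}")
```

A high share on its own does not prove fabrication. A sensible threshold depends on the population, so an auditor would likely compare the share against a list known to be real before drawing conclusions.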


Beyond such million-dollar monetary consequences, faking data can cost even more, including failed harvests and famine if you, e.g., “changed centrally held figures for a key metric such as soil fertility that many arable farmers use to organize their planting schedules”. It can also cost lives in medical trials, such as when Dr. Werner Bezwoda faked results showing that high-dose chemotherapy was successful in the treatment of women with high-risk breast cancer. Other researchers build on such research and can waste years, sometimes setting the field back by decades. Such frauds also give people an excuse not to take actual science seriously. For instance, a January 11th study debunks the misconception that COVID trials cut clinical corners.


The bottom line? Stay vigilant as a consumer of scientific studies and market research: ask the tough questions and demand answers. Stay patient as a researcher: it is the dynamic dialogue between theory and empirics that drives science forward.

Raphael Fitoussi

Communications Officer | Community Manager | Graphic Designer | Available for permanent positions & freelance assignments

3 months ago

Hey Koen! It's crucial to shed light on the dark side of data science and the industry issues you're addressing really hit home. How do you reckon we can ensure more integrity and transparency in both academic and industry sectors? The messy nature of data definitely requires sharp minds to tame it, and your insights highlight the human touch that AI can't replace. Looking forward to more of your insights into the evolving landscape and ways we can push boundaries while avoiding the traps of fraud!

Byron Sharp

Research Professor (Marketing Science), Director Ehrenberg-Bass Institute, Adelaide University of South Australia.

2 years ago

Never trust a single study, especially a single set of data. Use it as an interesting starting point.

Luc Wathieu

Professor @ Georgetown | Behavioral Economics | Consumer Empowerment | Product Management | Customer Analytics

2 years ago

Nice overview!

Dr. Augustine Fou

FouAnalytics - "see Fou yourself" with better analytics

2 years ago

interesting details. thank you for posting.
