Mr. Wolf p-hacked and fooled the team and management | Learn about AB-Testing and p-value

Shaurya Uppal

Data Scientist | MS CS, Georgia Tech | AI, Python, SQL, GenAI | Inventor of Ads Personalization RecSys Patent | Makro | InMobi (Glance) | 1mg | Fi

Published: June 20, 2021

After reading about the HBO intern who triggered a test email to a lot of users, I was reminded of an intern who worked with me some time back. Let's call him Mr. Wolf Gupta (identity hidden).

What did Mr. Wolf do wrong?

Mr. Wolf p-hacked an experiment. He fooled me, the team, and the management.

What was Mr. Wolf working on?

Mr. Wolf was working on a recommendation engine model. To prove it was an improvement, he A/B tested it against the existing running model version, with users split equally between the two.

After a few days (n = 30 days) of running the experiment, Mr. Wolf saw a 12% CTR improvement (on aggregate numbers) for his new model compared with the old version.

Mr. Wolf rejoiced and celebrated!!!

Perform Hypothesis Testing: Two-Sample T-Test

I asked Mr. Wolf to prove that the improvement was not a random occurrence: "Please perform a two-sample t-test on the per-day CTR distributions of both model versions."

He found a p-value > 0.05, i.e. he failed to reject the null hypothesis. The data did not provide enough evidence that the two models differ.
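To make the check concrete, here is a minimal sketch of a two-sample test on per-day CTRs. It is not Mr. Wolf's actual analysis: the CTR numbers are synthetic, the function names are made up for illustration, and a permutation test on the difference in means is swapped in for the t-test so that the example needs only the Python standard library. With SciPy installed, `scipy.stats.ttest_ind(old_ctr, new_ctr, equal_var=False)` would run Welch's t-test on the same two lists.

```python
import random


def mean_diff(a, b):
    """Difference in mean daily CTR between two samples (b minus a)."""
    return sum(b) / len(b) - sum(a) / len(a)


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided p-value: share of label shuffles whose absolute mean
    difference is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean_diff(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled = abs(mean_diff(pooled[:len(a)], pooled[len(a):]))
        if shuffled >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Synthetic per-day CTRs for 30 days per model (assumed values, not real data).
data_rng = random.Random(42)
old_ctr = [data_rng.gauss(0.050, 0.015) for _ in range(30)]
new_ctr = [data_rng.gauss(0.056, 0.015) for _ in range(30)]

p_value = permutation_p_value(old_ctr, new_ctr)
print(f"observed lift in mean daily CTR: {mean_diff(old_ctr, new_ctr):.4f}")
print(f"two-sided p-value: {p_value:.3f}")
```

Because the seed is fixed, rerunning this on the same data gives the same p-value every time.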

Mr. Wolf was shocked by the results and kept them secret.

Mr. Wolf, being naive, still believed his model was better just from observing the CTR improvement. He thought there was some issue with the hypothesis testing technique.

He prayed to God and performed the hypothesis test multiple times on the same distributions. The p-value kept changing but stayed > 0.05. He kept on praying, and then...

Finally, he found a p-value < 0.05, which means the null hypothesis is rejected, implying the models are different.

He rejoiced and celebrated, took screenshots of the test results, and told the team that his new model was the best.

Everyone shouted "Significant, woohoo!!" and celebrated the 12% CTR win on the experiment.

WAIT, but Mr. Wolf p-hacked the experiment (which I learned from him after a larger phased release and analyzing the metrics).
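Why does repeating a test until it passes count as p-hacking? Here is a rough sketch, assuming each attempt behaves like an independent test with no real difference between the models. Retries on the same data are correlated, so treat this as an illustration and not as a model of what Mr. Wolf actually ran. Under the null hypothesis a p-value is uniform on [0, 1], so each attempt has a 5% chance of dipping below 0.05 by luck, and the chance that at least one of k attempts does so is 1 - 0.95^k. The function names are hypothetical.

```python
import random


def chance_of_false_win(k, alpha=0.05):
    """Probability that at least one of k independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** k


def simulate_false_win(k, alpha=0.05, n_experiments=20000, seed=0):
    """Monte Carlo check: draw k uniform p-values per experiment and count
    how often the smallest one falls below alpha."""
    rng = random.Random(seed)
    fooled = 0
    for _ in range(n_experiments):
        if any(rng.random() < alpha for _ in range(k)):
            fooled += 1
    return fooled / n_experiments


for attempts in (1, 5, 10, 20):
    exact = chance_of_false_win(attempts)
    simulated = simulate_false_win(attempts)
    print(f"{attempts:>2} attempts: formula {exact:.3f}, simulated {simulated:.3f}")
```

Under this assumption, ten attempts already give about a 40% chance of a "significant" result when nothing has changed.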

Never be like Mr. Wolf. If someone had asked him for the power of the test (the probability of correctly rejecting the null hypothesis when it is false), he would have gotten himself in trouble.
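For a feel of what the power question looks like in numbers, here is a minimal sketch using a normal approximation to a two-sided, two-sample test on daily means. Only the 12% lift and the 30 days come from the story. The baseline CTR, the day-to-day standard deviation, and the function name are assumptions chosen for illustration.

```python
from statistics import NormalDist


def two_sample_power(effect, sd, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test on daily means,
    assuming equal standard deviation and sample size in both arms."""
    std_normal = NormalDist()
    se = sd * (2 / n_per_arm) ** 0.5           # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = abs(effect) / se                    # true effect in standard errors
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


baseline_ctr = 0.050   # assumed baseline daily CTR
lift = 0.12            # 12% relative improvement, from the story
daily_sd = 0.015       # assumed day-to-day standard deviation of CTR
days = 30              # days per model version, from the story

power = two_sample_power(baseline_ctr * lift, daily_sd, days)
print(f"approximate power: {power:.2f}")
```

With these made-up inputs the power comes out well below the conventional 80% target, which is the kind of number that should prompt a longer experiment, not a retry of the test.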

It is completely fine if your experiment or model fails. Failure is a stepping stone towards the best model version, and a failed experiment still gives you learning in return.

“Do not be embarrassed by your failures, learn from them and start again.” —Richard Branson

Shaurya Uppal

2 years ago

Mr. Wolf back with another mischief: https://www.dhirubhai.net/posts/shaurya-uppal_datascience-newsletter-dataleakage-activity-6902443991955296256-ZdA0

Shaurya Uppal

3 years ago

Book definition of p-hacking: We say an experiment is p-hacked when we incorrectly exploit the statistical analysis and falsely conclude that we can reject the null hypothesis.
