Null Hypothesis #4: lockdown eyeballs
Giancarlo Vercellino
Market Research | Competitive Intelligence | Business Strategy | Innovation Scouting | Data Science
Hypothesis
"Lockdowns had a positive impact on Netflix's subscriber base"
Well, it may seem like quite a trivial question: worldwide lockdowns forced people to stay at home, and the need to escape ennui and domestic boredom pushed a lot of them into the arms of digital video platforms like Netflix. Easy, plausible, but still to be demonstrated. Let's see some numbers.
Dataset
Number of Netflix paid subscribers worldwide from the 4th quarter of 2012 to the 4th quarter of 2022 (source: Business of Apps, https://www.businessofapps.com/data/netflix-statistics). Here's the small dataset we are using for this little exercise: besides the cumulative number of subscribers (in millions) for each quarter, we have calculated the quarter-over-quarter growth and added a couple of flags (pandemic: before and after the formal declaration by the WHO in March 2020; year_flag: to distinguish the first year in the pandemic from all the others).
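For readers who want to rebuild a similar table, here is a minimal sketch of how the growth column and the two flags could be derived. The subscriber values are made up for illustration (they are not the Business of Apps figures), the name netflix_sketch is hypothetical, and treating 2020 Q1 as "before" the declaration is an assumption.

```r
# Toy cumulative subscriber counts in millions (made-up values, NOT the real data)
netflix_sketch <- data.frame(
  year        = c(2019, 2020, 2020, 2020, 2020, 2021, 2021),
  quarter     = c(4, 1, 2, 3, 4, 1, 2),
  subscribers = c(100, 104, 114, 120, 128, 135, 137)
)

# Quarter-over-quarter growth: difference from the previous quarter (NA for the first row)
netflix_sketch$growth <- c(NA, diff(netflix_sketch$subscribers))

# Sortable period code, e.g. 20202 for 2020 Q2
period <- netflix_sketch$year * 10 + netflix_sketch$quarter

# Pandemic flag: assumption that quarters from 2020 Q2 onwards count as "after"
netflix_sketch$pandemic <- ifelse(period >= 20202,
                                  "AFTER DECLARATION", "BEFORE DECLARATION")

# Year flag: 2020 Q2 to 2021 Q1 treated as the first year in the pandemic
netflix_sketch$year_flag <- ifelse(period >= 20202 & period <= 20211,
                                   "FIRST YEAR IN PANDEMIC", "OTHER YEARS")

netflix_sketch
```

The flag labels match the strings used in the t.test calls in the analysis.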
Analysis
As they say, a picture is worth a thousand words, and below we have 5 plots (and some R code snippets for reproducibility, which you can skip):
t.test(netflix$growth[netflix$pandemic == "BEFORE DECLARATION"], netflix$growth[netflix$pandemic == "AFTER DECLARATION"])
Welch Two Sample t-test
data: netflix$growth[netflix$pandemic == "BEFORE DECLARATION"] and netflix$growth[netflix$pandemic == "AFTER DECLARATION"]
t = -1.431, df = 12.102, p-value = 0.1777
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.214503 1.077897
sample estimates:
mean of x mean of y
 4.335333 6.403636
... but maybe there's a catch here. If we compare the average quarterly growth during the first year in the pandemic with all the other years, we see a quite remarkable change, with the boxplot for 2020 sitting significantly higher:
t.test(netflix$growth[netflix$year_flag == "FIRST YEAR IN PANDEMIC"], netflix$growth[netflix$year_flag == "OTHER YEARS"])
Welch Two Sample t-test
data: netflix$growth[netflix$year_flag == "FIRST YEAR IN PANDEMIC"] and netflix$growth[netflix$year_flag == "OTHER YEARS"]
t = 3.0823, df = 3.2947, p-value = 0.04773
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.1077993 11.9866602
sample estimates:
mean of x mean of y
 10.34750 4.30027
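If you wonder where the fractional degrees of freedom in the outputs above come from, here is a minimal sketch of what a Welch two-sample t-test computes, checked against R's own t.test. The vectors x and y are made-up toy values, not the Netflix data.

```r
# Toy growth values (made-up numbers, not the article's data)
x <- c(10.1, 9.4, 12.3, 8.8)
y <- c(4.2, 5.1, 3.9, 4.8, 6.0, 3.5, 4.4, 5.3)

# Variance of each sample mean, without assuming equal variances
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over its standard error
t_manual <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation of the degrees of freedom
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# Compare with the built-in test (Welch is the default in t.test)
welch <- t.test(x, y)
c(t_manual, df_manual, p_manual)
c(welch$statistic, welch$parameter, welch$p.value)
```

With only four quarters in one group, the approximate degrees of freedom are pulled towards the size of the smaller sample, which is one reason a borderline p-value deserves caution.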
I know, a handful of data points and a p-value of 0.047 are not the most solid ground for supporting any strong hypothesis, so we need a little statistical hack (aka simulation) to better understand whether the growth from 2020 Q2 to 2021 Q1 can really be considered a significant leap or not, and that's why we have the next plot.
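# Simulate 100,000 "years": draw 4 quarterly growth values with replacement and sum them
sampling_distribution <- replicate(100000, sum(sample(netflix$growth, 4, replace = TRUE)))
# Empirical distribution functions built from the simulated sums (pfun = CDF, qfun = quantiles)
empirical <- edfun::edfun(sampling_distribution)
empirical$qfun(0.95)
30.54
empirical$qfun(0.99)
35.7
empirical$qfun(0.999)
40.46
# Share of simulated years at or below the observed first-year total
empirical$pfun(41.39)
0.99963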
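The same idea can be sketched with base R only, using quantile and ecdf in place of the edfun package. The growth values, the seed and the observed total below are made up for illustration, so the numbers will not match the ones above; quantile may also interpolate slightly differently from edfun.

```r
set.seed(42)  # arbitrary seed, only to make the sketch repeatable

# Toy quarterly growth values in millions (made-up, not the article's data)
growth <- c(2.1, 3.4, 4.0, 5.2, 3.8, 6.1, 4.7, 5.5, 7.3, 4.4, 10.2, 8.9)

# Resample 4 quarters with replacement and sum them, many times over
yearly_sums <- replicate(10000, sum(sample(growth, 4, replace = TRUE)))

# Upper quantiles of the simulated yearly growth
thresholds <- quantile(yearly_sums, probs = c(0.95, 0.99, 0.999))

# Share of simulated years at or below a hypothetical observed yearly total
observed <- 30
share_below <- ecdf(yearly_sums)(observed)

thresholds
share_below
```

Resampling with replacement treats every quarter in the history as equally likely, which ignores trend and seasonality; that is a simplification worth keeping in mind when reading the thresholds.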
Conclusion
... how many of Netflix's new subscribers can be attributed to the exceptional conditions experienced during the lockdowns in 2020? The answer depends on how many "asterisks" you need to ground your statistical confidence:
* : the surplus amounts to a generous 10.85 million (with a Type I error risk of 5%)
**: the surplus amounts to a more measured 5.69 million (with a Type I error risk of 1%)
***: the surplus amounts to a modest 0.93 million (with a Type I error risk of 0.1%)
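The three surplus figures are simply the observed first-year total minus each simulated threshold. A minimal sketch of the arithmetic, assuming 41.39 (the value passed to empirical$pfun, which equals 4 times the first-year quarterly mean of 10.3475) is the observed four-quarter total in millions:

```r
# Observed growth over the first pandemic year, in millions (assumed from the article's output)
observed_total <- 41.39

# Simulated thresholds reported by empirical$qfun at 95%, 99% and 99.9%
thresholds <- c("*" = 30.54, "**" = 35.7, "***" = 40.46)

# Surplus above each threshold, in millions of subscribers
surplus <- observed_total - thresholds
round(surplus, 2)
```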
Sure, you can decide that 0.1% is still unacceptable and dig deeper into the 99.999th percentile of the sampling distribution to get even more conservative estimates, but I guess you get the point. The question here is not how many 9s you want, but our starting hypothesis: a positive impact of the pandemic on Netflix's subscriber base is reasonable if we look at the numbers (99.963% of the simulated years fall at or below the growth observed during the first year in the pandemic). According to Business of Apps, depending on the specific region, Netflix's ARPU ranges from a minimum of about $7 (Latin America) to a maximum of about $15 (US & Canada), so you can get a general idea of the money we are talking about.
Last but not least: you may remember that Netflix recently made the news for reporting a loss of subscribers for the first time in its history. Well, besides the many possible causes (competition, increasing prices, etc.), IMHO that could be considered a correction of what happened in 2020 (and maybe only the beginning).
Post Scriptum
Why 4?! Well, because the other issues of Null Hypothesis are available on Medium. Enjoy.