Null Hypothesis #4: lockdown eyeballs

Hypothesis

"Lockdowns had a positive impact on Netflix's subscriber base"

Well, it may seem quite a trivial question: worldwide lockdowns forced people to stay at home, and the need to escape ennui and domestic boredom pushed a lot of them into the arms of digital video platforms like Netflix. Easy, plausible, but still to be demonstrated. Let's see some numbers.

Dataset

Number of Netflix paid subscribers worldwide from Q4 2012 to Q4 2022 (source: Business of Apps, https://www.businessofapps.com/data/netflix-statistics). Here's the small dataset we are using for this little exercise: besides the cumulative number of subscribers (in millions) for each quarter, we have calculated the quarter-over-quarter growth and added a couple of flags (pandemic: before or after the formal declaration by the WHO in March 2020; year_flag: to distinguish the first year of the pandemic from all the others).

[Image: the dataset table]
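For readers who want to rebuild the derived columns, here is a minimal sketch of how the growth and the two flags could be computed. The helper name `add_growth_and_flags` and the column names `quarter_start` and `subscribers` are assumptions made for illustration, and the toy numbers are made up (they are not Netflix figures). The flag labels and the 2020 Q2 to 2021 Q1 window are the ones used in the analysis.

```r
# Hypothetical helper: derive growth and flags from a cumulative series.
# Assumed columns: quarter_start (Date), subscribers (millions, cumulative).
add_growth_and_flags <- function(df) {
  df <- df[order(df$quarter_start), ]
  # Quarter-over-quarter growth; the first row has no predecessor here
  df$growth <- c(NA, diff(df$subscribers))
  # The WHO declared the pandemic on March 11, 2020
  df$pandemic <- ifelse(df$quarter_start < as.Date("2020-03-11"),
                        "BEFORE DECLARATION", "AFTER DECLARATION")
  # First pandemic year taken as 2020 Q2 to 2021 Q1
  first_year <- df$quarter_start >= as.Date("2020-04-01") &
    df$quarter_start <= as.Date("2021-01-01")
  df$year_flag <- ifelse(first_year, "FIRST YEAR IN PANDEMIC", "OTHER YEARS")
  df
}

# Made-up values, only to show the shape of the result
toy <- data.frame(
  quarter_start = as.Date(c("2020-01-01", "2020-04-01", "2021-04-01")),
  subscribers   = c(100, 110, 130)
)
out <- add_growth_and_flags(toy)
out
```

Note that this sketch leaves the first growth value as NA, so a value for the very first quarter needs the subscriber count of the quarter before it.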

Analysis

As they say, a picture is worth a thousand words, and below we have five plots (plus some R code snippets for reproducibility, which you can skip):

[Image: the five plots described below]

  • PLOT 1: Netflix cumulative customer base, quarter by quarter, in millions. That's a no-brainer, but I had a blank spot to fill, sorry. ;-P
  • PLOT 2: Netflix customer base growth, quarter over quarter, in millions. We are interested in understanding that nice spike of almost 16 million in 2020 Q4: can it be considered normal, or is it a sort of changepoint in the trend dynamics? We'll see after a few tests.
  • PLOT 3: if we compare quarter-over-quarter growth before and after the formal declaration of the pandemic by the WHO (the infamous March 11, 2020), we can see some notable changes in the median, third quartile and outlier range. Is that relevant? Well, the answer is no: the difference in means before and after the WHO declaration is not statistically significant (p = 0.18, as you can see below in the Welch two-sample t-test) ...

t.test(netflix$growth[netflix$pandemic == "BEFORE DECLARATION"], netflix$growth[netflix$pandemic == "AFTER DECLARATION"])

Welch Two Sample t-test

data:  netflix$growth[netflix$pandemic == "BEFORE DECLARATION"] and netflix$growth[netflix$pandemic == "AFTER DECLARATION"]
t = -1.431, df = 12.102, p-value = 0.1777
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -5.214503  1.077897
sample estimates:
mean of x mean of y
 4.335333  6.403636
        

... but maybe there's a catch here. If we compare the average quarterly growth during the first year of the pandemic with all other years, we see a quite remarkable change, with the boxplot for the first pandemic year sitting well above the others, and the difference is statistically significant:

t.test(netflix$growth[netflix$year_flag == "FIRST YEAR IN PANDEMIC"], netflix$growth[netflix$year_flag == "OTHER YEARS"])

Welch Two Sample t-test

data:  netflix$growth[netflix$year_flag == "FIRST YEAR IN PANDEMIC"] and netflix$growth[netflix$year_flag == "OTHER YEARS"]
t = 3.0823, df = 3.2947, p-value = 0.04773
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
  0.1077993 11.9866602
sample estimates:
mean of x mean of y
 10.34750   4.30027
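The fractional degrees of freedom in the two outputs (12.102 and 3.2947) come from the Welch-Satterthwaite approximation, which shrinks the degrees of freedom when the two groups differ in size and variance. Here is a minimal sketch of the same computation done by hand. The function name `welch_by_hand` is made up for illustration, and the two vectors are synthetic values, not the Netflix data.

```r
# Welch's two-sample t-test by hand (sketch); x and y are numeric vectors
welch_by_hand <- function(x, y) {
  vx <- var(x) / length(x)   # squared standard error of mean(x)
  vy <- var(y) / length(y)   # squared standard error of mean(y)
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  # Two-sided p-value from the t distribution
  p_value <- 2 * pt(-abs(t_stat), df)
  c(t = t_stat, df = df, p.value = p_value)
}

# Synthetic vectors, NOT the Netflix data: a small group against a larger one
x <- c(9.1, 11.4, 8.7, 12.2)
y <- c(3.9, 5.2, 4.4, 2.8, 6.1, 4.0, 5.5, 3.3)

by_hand  <- welch_by_hand(x, y)
built_in <- t.test(x, y)   # Welch is the default (var.equal = FALSE)
by_hand
```

With only four quarters in the first group, the degrees of freedom stay close to 3, which is one reason the second p-value should be read with caution.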
        

I know, few data points and a p-value of 0.048 are not the most solid ground for supporting any strong hypothesis, so we need a little statistical hack (aka simulation) to better understand whether the growth from 2020 Q2 to 2021 Q1 can really be considered a significant leap, and that's why we have the next plot.

  • PLOT 4: if we simulate the annual growth a hundred thousand times (each time summing four quarterly growth values drawn at random, with replacement, from the whole series), we get a sampling distribution useful for understanding how remarkable the spike in the first year of the pandemic was. Over those four quarters Netflix gained 41.39 million new subscribers, a figure just above the 99.9th percentile of the simulated annual growths. Do you need any other proof?

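# Simulate 100,000 hypothetical years: each one is the sum of four quarterly
# growth values drawn at random, with replacement, from the whole series
sampling_distribution <- replicate(100000, sum(sample(netflix$growth, 4, replace = TRUE)))

# Empirical quantile (qfun) and cumulative (pfun) functions; requires the edfun package
empirical <- edfun::edfun(sampling_distribution)

# Quantiles of the simulated annual growth, in millions
empirical$qfun(0.95)
# 30.54

empirical$qfun(0.99)
# 35.7

empirical$qfun(0.999)
# 40.46

# Share of simulated years that fall below the observed 41.39 million
empirical$pfun(41.39)
# 0.99963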
        

  • PLOT 5: if we subtract the sampling distribution from our test value (41.39 million in 2020 Q2 - 2021 Q1), we get a distribution of potential impacts during the first year of the pandemic. As you can see, there is only a 0.037% chance of a negative value, while the potential impacts range from zero to a mode of 22.86 million and beyond, so ...
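For reference, the subtraction behind Plot 5 can be sketched as follows. The function name `impact_distribution`, the seed and the toy growth vector are assumptions for illustration (the values are made up and are not the Netflix series). Estimating the mode from a kernel density is also just one plausible choice, since the article does not say how the mode was obtained.

```r
# Sketch: distribution of potential impacts for a given growth series.
# `growth` stands in for netflix$growth; `observed` is the annual total under test.
impact_distribution <- function(growth, observed, n_sim = 100000) {
  # Each simulated year is the sum of 4 quarterly values drawn with replacement
  simulated <- replicate(n_sim, sum(sample(growth, 4, replace = TRUE)))
  observed - simulated
}

set.seed(42)                                              # arbitrary seed
toy_growth <- c(2.1, 3.4, 5.0, 4.2, 6.3, 3.8, 7.1, 4.9)   # made-up values
impact <- impact_distribution(toy_growth, observed = 25, n_sim = 10000)

# Share of simulations where the impact would be negative
negative_share <- mean(impact < 0)

# Mode estimated as the peak of a kernel density (an assumption, see above)
dens <- density(impact)
impact_mode <- dens$x[which.max(dens$y)]
```

On the real data, `observed` would be 41.39 and `growth` would be `netflix$growth`.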

Conclusion

... how many of Netflix's new subscribers can be attributed to the exceptional conditions experienced during the lockdowns in 2020? The answer depends on how many "asterisks" you need to ground your statistical confidence:

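* : the surplus amounts to a generous 10.85 million (observed growth minus the 95th percentile, with a Type I error risk of 5%)

**: the surplus amounts to a more cautious 5.69 million (observed growth minus the 99th percentile, with a Type I error risk of 1%)

***: the surplus amounts to a modest 0.93 million (observed growth minus the 99.9th percentile, with a risk of 0.1%)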
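The three figures are simply the observed 41.39 million minus the simulated quantiles reported earlier. A quick check of the arithmetic, using only the numbers printed in this article (the variable names are mine):

```r
observed  <- 41.39   # millions of new subscribers, 2020 Q2 to 2021 Q1
# Quantiles of the simulated annual growth, as returned by empirical$qfun()
quantiles <- c("95%" = 30.54, "99%" = 35.70, "99.9%" = 40.46)

# Surplus over each quantile, in millions
surplus <- round(observed - quantiles, 2)
surplus
#   95%   99% 99.9%
# 10.85  5.69  0.93
```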

Sure, you can decide that 0.1% is unacceptable and dig deeper, into the 99.999th percentile of the sampling distribution, and you'll get even more conservative estimates, but I guess you got the point. The question here is not how many 9s you want, but our starting hypothesis: a positive impact of the pandemic on Netflix's subscriber base is reasonable if we look at the numbers (the simulation gives a probability of 99.963% of a positive delta during the first year of the pandemic). According to Business of Apps, depending on the specific region, Netflix's ARPU ranges from a minimum of about $7 (Latin America) to a maximum of about $15 (US & Canada), so you can get a general idea of the money we are talking about.
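As a rough back-of-the-envelope sketch, the surplus figures can be multiplied by the two ARPU bounds. This assumes the ARPU applies uniformly to every surplus subscriber, which is a simplification because the regional mix is unknown, and the result is per ARPU period, which the figures quoted here do not specify.

```r
surplus_millions <- c(10.85, 5.69, 0.93)   # surplus at the *, ** and *** levels
arpu_usd <- c(low = 7, high = 15)          # approximate ARPU bounds quoted above

# Rows: confidence levels; columns: ARPU bounds.
# Values are millions of USD per ARPU period.
revenue <- outer(surplus_millions, arpu_usd)
rownames(revenue) <- c("*", "**", "***")
revenue
```

Even at the most conservative level, the low bound works out to about 6.5 million dollars per period (0.93 million subscribers times $7).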

Last but not least: you may remember that Netflix recently made the news by reporting a loss of subscribers, its first in over a decade. Well, besides the many possible causes (competition, rising prices, etc.), IMHO that could be considered a correction of what happened in 2020 (and maybe only the beginning).

Post Scriptum

Why 4?! Well, because the other issues of Null Hypothesis are available on Medium. Enjoy.

#datascience #netflix #pandemic #subscribers #viewers #businessanalysis

