How to Measure Winning Streaks (and Improve Your Forecasts) in Python + R
Image from Paddy Simpson: https://youtu.be/0Zj_9ypBnzg

Measuring how long something keeps happening, and how much time passes between events, is extremely useful for a wide range of data and decision science applications. For example, you can characterise and compare the performance of airlines and airports by finding the longest streak of on-time flights. It is also useful when you want to characterise the probability distribution of some event of interest happening within a designated period of time. For example, I recently created a demo app that uses Monte Carlo simulation to predict the chances of a group of friends all arriving within a certain period of time, assuming each friend's arrival time is independent and normally distributed around their expected arrival time.
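The Monte Carlo idea behind that kind of app can be sketched in a few lines of Python. This is not the app's code: it is a minimal sketch that assumes "within a certain period" means everyone has arrived by a fixed cutoff, and the function names and example means and standard deviations are hypothetical. Because the arrivals are assumed independent, the simulation can be checked against the exact answer, which is a product of normal CDFs.

```python
import math

import numpy as np


def simulate_all_arrive_by(means, sds, deadline, n_sims=100_000, seed=42):
    """Monte Carlo estimate of P(every friend has arrived by `deadline`).

    Assumes each friend's arrival time is independent and normally
    distributed with the given mean and standard deviation (same time units).
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    rng = np.random.default_rng(seed)
    # One row per simulated evening, one column per friend
    arrivals = rng.normal(loc=means, scale=sds, size=(n_sims, means.size))
    # Everyone is there by the deadline only if the last arrival is on time
    return float(np.mean(arrivals.max(axis=1) <= deadline))


def exact_all_arrive_by(means, sds, deadline):
    """Closed-form check: with independence the joint probability is a
    product of normal CDFs, Phi((deadline - mean) / sd)."""
    p = 1.0
    for mu, sd in zip(means, sds):
        p *= 0.5 * (1.0 + math.erf((deadline - mu) / (sd * math.sqrt(2.0))))
    return p


if __name__ == "__main__":
    # Hypothetical friends: expected arrival (minutes after 7pm) and spread
    means, sds = [0, 5, 10], [2, 5, 10]
    print(simulate_all_arrive_by(means, sds, deadline=15))
    print(exact_all_arrive_by(means, sds, deadline=15))
```

With independent normal arrivals the closed form is enough on its own; simulation becomes worthwhile once you add features the formula can't easily handle, such as correlated delays or non-normal travel times.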

In this example we will characterise the length of consecutive daily gains in Apple's share price over a ten-year period, using data downloaded in R with Matt Dancho's tidyquant wrapper for the Yahoo Finance API. We then visualise the probability distribution of streak length to show how the chances of longer and longer streaks decrease exponentially.
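One way to see why the decay is exponential is a simple baseline model, which I'm assuming here rather than one fitted in the article: treat each trading day as independent, closing up with the same probability p. A streak that has started then lasts exactly k days with probability p^(k-1)(1 - p), so every extra day multiplies the probability by p. The helper name below is hypothetical.

```python
def geometric_streak_pmf(p_up, max_len):
    """Probability that a winning streak lasts exactly k days, k = 1..max_len,
    assuming independent days that each close up with probability p_up."""
    if not 0 < p_up < 1:
        raise ValueError("p_up must be strictly between 0 and 1")
    # k-1 more up days after the first, then the streak-ending down day
    return {k: p_up ** (k - 1) * (1 - p_up) for k in range(1, max_len + 1)}


print(geometric_streak_pmf(0.5, 4))  # {1: 0.5, 2: 0.25, 3: 0.125, 4: 0.0625}
```

Comparing the empirical streak distribution with this curve is a quick way to see whether real price moves behave differently from a biased coin.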

It turns out that counting streaks is a little tricky in either pandas in Python or the tidyverse in R. It involves scoring successive closing prices as up or down and then leveraging cumulative sums and differences to engineer a column that tracks the winning streak day by day.
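Before the vectorised versions, it helps to pin down what that streak column should contain. A plain-Python loop (a minimal sketch with a hypothetical function name, not the article's code) makes the target explicit: each day's value is the number of consecutive higher closes ending on that day, and it resets to 0 on any day that does not close higher. The cumulative-sum approach is a vectorised way of building exactly this column.

```python
def daily_streaks(closes):
    """For each day, count the consecutive higher closes ending on that day.

    The first day has no previous close, so its streak is 0. A day that
    closes flat or lower resets the streak to 0.
    """
    if not closes:
        return []
    streaks = [0]
    for prev, curr in zip(closes, closes[1:]):
        streaks.append(streaks[-1] + 1 if curr > prev else 0)
    return streaks


print(daily_streaks([10, 11, 12, 11, 13, 14, 15]))  # [0, 1, 2, 0, 1, 2, 3]
```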

[Image: Getting the streak of consecutive days of increasing share price in R]

Let's now see how to do the same job in Python. This solution might look a little more elegant than my R code, but it achieves essentially the same manipulation through a series of chained methods.

[Image: Getting winning streaks using Python]
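The chained-method approach can be sketched in pandas as follows. It is my own minimal sketch rather than the author's code, assuming a DataFrame with a numeric `close` column in date order; the column names `up`, `run_id` and `streak` are illustrative. It uses the cumulative-sum trick described above: a cumulative sum over the "not up" flags gives each run its own label, and a second cumulative sum within each label counts the streak.

```python
import pandas as pd


def add_streak_column(df, col="close"):
    """Add boolean `up` and integer `streak` columns to a DataFrame of
    daily prices sorted by date (assumes a numeric `col` column)."""
    return (
        df
        # Score each day: True if it closed higher than the previous day
        .assign(up=lambda d: d[col].diff().gt(0))
        # Every non-up day starts a new run, so a cumulative sum labels the runs
        .assign(run_id=lambda d: (~d["up"]).cumsum())
        # Within each run, a cumulative sum of the up flags counts the streak
        .assign(streak=lambda d: d["up"].astype(int).groupby(d["run_id"]).cumsum())
        .drop(columns="run_id")
    )


prices = pd.DataFrame({"close": [10, 11, 12, 11, 13, 14, 15]})
print(add_streak_column(prices)["streak"].tolist())  # [0, 1, 2, 0, 1, 2, 3]
```

On real data, `streak.max()` then gives the longest winning streak; the same pattern works for on-time flights if you swap the price comparison for an on-time flag.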

And we can use the following code to plot the probability distribution for consecutive days of increasing share price.

[Image: Plotting the probability distribution for share price streaks in Python]
[Image: Probability distribution for Apple share price increases]
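The distribution only needs the lengths of completed streaks. Here is a minimal sketch, assuming you already have a daily streak counter (as described earlier in the article) stored as a pandas Series; the function names are hypothetical and the plotting helper assumes matplotlib is installed. A streak ends on a day with a positive counter that is followed by a reset, and the last streak in the data is counted as finished even though it may still be running.

```python
import pandas as pd


def streak_length_distribution(streak):
    """Empirical probability of each completed winning-streak length."""
    next_day = streak.shift(-1, fill_value=0)
    # A streak ends where the counter is positive and the next day resets to 0;
    # the final day is treated as an end, so an ongoing streak counts as finished
    lengths = streak[(streak > 0) & (next_day == 0)]
    return lengths.value_counts(normalize=True).sort_index()


def plot_streak_distribution(dist, title="Consecutive up days"):
    import matplotlib.pyplot as plt  # only needed when plotting

    ax = dist.plot.bar(rot=0)
    ax.set_xlabel("Streak length (days)")
    ax.set_ylabel("Probability")
    ax.set_title(title)
    plt.tight_layout()
    plt.show()


streak = pd.Series([0, 1, 2, 0, 1, 0, 1, 2, 3, 0, 1])
print(streak_length_distribution(streak))
```

Calling `plot_streak_distribution(streak_length_distribution(df["streak"]))` on your own data should then produce a bar chart of streak length against probability.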

So every night when you watch the finance gurus talk about the market going up or down that day, you might have a better idea of how likely what you are seeing really is. For the Apple share price over the last 10 years, if it went up yesterday, it will most likely go down today. Yet every day some talking head suggests possible reasons why the market went up or down. Most of the time this is pure fiction.
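A claim like this is easy to check for yourself by comparing the overall share of up days with the share of up days that come straight after an up day. A minimal sketch, assuming `close` is a pandas Series of daily closing prices in date order (the function name is hypothetical); no Apple figures are claimed here, so run it on your own download. If the conditional share is below 0.5, a down day is the more likely outcome after an up day.

```python
import pandas as pd


def up_day_probabilities(close):
    """Share of up days overall, and share of up days that follow an up day.

    Assumes `close` is a Series of daily closing prices in date order.
    """
    # Drop the first day: it has no previous close to compare against
    up = close.diff().gt(0).to_numpy()[1:]
    yesterday, today = up[:-1], up[1:]
    return {
        "p_up": float(up.mean()),
        # Restrict to days whose previous day closed higher
        "p_up_given_up": float(today[yesterday].mean()) if yesterday.any() else float("nan"),
    }


print(up_day_probabilities(pd.Series([10, 11, 12, 11, 13, 14, 15])))
```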

The Data Science Code in Python + R newsletter is your source for weekly guides on how to do one data science job in Python and R that can be read in 5 minutes or less. If you have any requests for topics you would like to see covered in future editions please mention them in the comments below.

Ludek Stehlik, Ph.D.

People & Data Scientist @Sanofi

How is it possible that the chances of everyone arriving at a given time don't change with how certain I am of the estimated average time it will take each friend to get there? A bug in the app, or did I miss something?

Mitchell Davidson

Senior Plant Metallurgist at BHP Carrapateena

Friend 9 is so unreliable! We all can relate to a friend 9!

Matt Rosinski

Senior Data Scientist | Business Insights | Causal AI

You can check out the Shiny app here: https://lnkd.in/ghhcJ4Y8
