How Machine Learning is Transforming Supply Chain (Part 3): ARIMA (Fremont Bridge Seattle Cycle Count)
Ravi Prakash
Senior Manager, Planning and Business Systems, Johnson and Johnson, APAC, MedTech
What is in this article?
Data
The Fremont Bridge Bicycle Counter (pic above) began operation in October 2012 and records the number of bikes that cross the bridge using the pedestrian/bicycle pathways. I have never been to this bridge, but many experts in the ML world (Introduction to Machine Learning with Python: Andreas Müller and Sarah Guido; Python Data Science Handbook: Jake VanderPlas) have looked at this data to explain the complexity involved in ML. Obviously, what we will do here will be different! You can access this data from the link below –
I would like to thank the city authorities for making this data available to ML enthusiasts like me.
Approach and Pre-requisites
Unlike previous articles in this series, I will keep the Python code visible so that you can follow along. You must have Jupyter Notebook (or any similar notebook) installed on your PC; if not, browse the internet and you can get it for free! The last requirement is time commitment. ARIMA is not as simple as smoothing models, so this will be a long article, which I will try to keep concise. That is it! Are you ready to learn something new?
Getting into the MAZE !
When the downloaded file is read through pandas, it gives a few basic details.
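A minimal sketch of the loading step. The file name and column names below are assumptions based on the public Fremont Bridge dataset; adjust them to match your actual download. A tiny in-memory sample stands in for the real file so the snippet runs on its own.

```python
import io
import pandas as pd

# Three hourly rows in the same shape as the public dataset (assumed column names).
sample_csv = io.StringIO(
    "Date,Fremont Bridge East Sidewalk,Fremont Bridge West Sidewalk\n"
    "10/03/2012 12:00:00 AM,4,9\n"
    "10/03/2012 01:00:00 AM,4,6\n"
    "10/03/2012 02:00:00 AM,1,1\n"
)

# For the real download you would point read_csv at the file instead:
# df = pd.read_csv('FremontBridge.csv', index_col='Date', parse_dates=True)
df = pd.read_csv(sample_csv, index_col='Date', parse_dates=True)

# We are interested in the total across both sidewalks.
df['Total'] = df.sum(axis=1)
print(df.head())
```

With `parse_dates=True` the index becomes a DatetimeIndex, which is what makes the resampling and groupby tricks later in the article possible.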
We can clearly see that the data is captured in hourly buckets. There are two sidewalks; we will be interested in the total count. A simple plot of the data (you can do it yourself) confirms that we have records since 2012. A demand-planning mind would also observe seasonality (which we explore in detail later) and a declining trend beginning in 2020. The decline is on account of Covid, but there are more interesting details hidden. Let us move forward to discover them.
The chart above is based on hourly data. What if we plot weekly sums for the entire horizon? To do so, we have to resample the data and then plot it.
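The resampling step can be sketched like this, assuming an hourly-indexed frame with a 'Total' column like the one loaded earlier. Synthetic data (two weeks of ones) stands in for the real download so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

# Two weeks of hourly data as a stand-in for the real Fremont counts.
idx = pd.date_range('2012-10-03', periods=24 * 14, freq='h')
df = pd.DataFrame({'Total': np.ones(len(idx))}, index=idx)

# Roll the hourly counts up into weekly totals.
weekly = df['Total'].resample('W').sum()
print(weekly)

# For the chart: weekly.plot(title='Weekly bicycle count')  # needs matplotlib
```

`resample('W')` bins by week ending Sunday; `'ME'` (month end) would give the monthly view discussed next.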
As we roll the data up to lower frequencies, we can clearly extract many insights. Looking at monthly data, it appears that despite Covid cases stabilizing, volume is nowhere near where it used to be before Covid! We cannot compare 2022 with 2021.
Will it help if we look at weekday patterns? I mean, how many people on average pass over this bridge on a Monday, a Saturday, and so on? Let us look at two charts (pre-Covid and during Covid).
Very interesting charts. Pre-Covid, weekends saw low traffic compared to working days, suggesting that many office-goers opted for cycling to commute to work (this is my assumption, which we will prove or reject through data). During Covid there is no such stark difference, but three days (Tuesday, Wednesday, and Thursday) have a higher number of cyclists. In the US, many corporates have made it mandatory to work three days from the office, and it appears that salaried workers are opting to be in the office on these three weekdays!
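The weekday comparison can be sketched as below, again assuming the hourly frame with a 'Total' column. The March 2020 split date is my assumption for the Covid boundary, and random synthetic counts keep the snippet runnable on its own.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts spanning the pre-Covid / during-Covid boundary.
rng = np.random.default_rng(0)
idx = pd.date_range('2019-01-01', '2021-12-31 23:00', freq='h')
df = pd.DataFrame({'Total': rng.poisson(40, size=len(idx))}, index=idx)

daily = df['Total'].resample('D').sum()   # daily totals first
pre   = daily[:'2020-02-29']              # pre-Covid slice (assumed cut-off)
covid = daily['2020-03-01':]              # during-Covid slice

day_names = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
pre_avg   = pre.groupby(pre.index.dayofweek).mean().rename(index=dict(enumerate(day_names)))
covid_avg = covid.groupby(covid.index.dayofweek).mean().rename(index=dict(enumerate(day_names)))
print(pre_avg)
print(covid_avg)
```

On the real data, `pre_avg` is where the weekday-over-weekend gap shows up, and `covid_avg` is where the Tue/Wed/Thu bump appears.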
How about the hourly split (again, pre- and during Covid)?
The hourly pattern has not changed significantly! Many people still pass over the bridge during the early rush hours and in the evening around 5 pm. It is fair to assume that because of WFH and the three mandated office days, we see a slightly different trend during Covid. As a city dweller, if I need to avoid traffic congestion, the hourly plot can be my guide.
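The hourly profile is one groupby on the hour of day, sketched here on synthetic stand-in data rather than the real download.

```python
import numpy as np
import pandas as pd

# One year of synthetic hourly counts.
rng = np.random.default_rng(1)
idx = pd.date_range('2019-01-01', periods=24 * 365, freq='h')
df = pd.DataFrame({'Total': rng.poisson(40, size=len(idx))}, index=idx)

# Average count for each hour of the day (0..23).
hourly_profile = df['Total'].groupby(df.index.hour).mean()
print(hourly_profile)
# On the real data this is where the morning and ~5 pm commute peaks appear.
```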
Is there any monthly pattern? We know from the charts above that there is, but it is not obvious.
People cycle more often in summer. You can try the pre- and during-Covid graphs yourself.
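The monthly seasonality check follows the same pattern, grouping by calendar month; synthetic daily data stands in for the real counts.

```python
import numpy as np
import pandas as pd

# Several years of synthetic daily totals.
rng = np.random.default_rng(2)
idx = pd.date_range('2013-01-01', '2019-12-31', freq='D')
df = pd.DataFrame({'Total': rng.poisson(900, size=len(idx))}, index=idx)

# Average daily count for each calendar month (1..12).
monthly_profile = df['Total'].groupby(df.index.month).mean()
print(monthly_profile)
# On the real data, the summer months stand out.
```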
Let us stop here. There is no end to data analytics, and we already have enough insights to embark on the second leg of our long journey. You may be scratching your head and complaining that the objective was to learn ARIMA, so where is it? Understanding the data is key to modeling. Trust me, if we jump straight to ARIMA, it becomes even more challenging to unlock.
Introducing ARIMA...
ARIMA stands for autoregressive integrated moving average. Each of these terms has a hidden meaning, and understanding them is key to understanding the ARIMA model itself.
AR (AutoRegressive) + I (Integrated) + MA (Moving Average) = ARIMA
Before we get into the details of AR and MA models, it is important to say at the very outset that ARIMA is a linear model. If you studied math in high school, a linear model should be familiar; if not, let us quickly recap. The ticket price of a flight depends on many factors, like route (R), season (S), weekday (D), flight operator (F), class (C), duration of journey (H), etc. We can safely assume that the price at a given time t is a linear function of these variables plus some error term (normally distributed).
Price(t) = β + α1*R + α2*S + α3*D + α4*F + ... + e
A linear model may not match the true function behind the distribution of price data, but these are very powerful models. By transforming features (scikit-learn's PolynomialFeatures), we can even create non-linear decision boundaries. Anyway, feature transformation is a topic that deserves its own discussion; let us get back to ARIMA.
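The PolynomialFeatures idea can be sketched in a few lines: the same linear model, fit once on raw features and once on polynomial-expanded features. The data here is synthetic (y = x² plus noise), purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Quadratic data: a straight line cannot fit it, but a model that is
# linear in its parameters can, once the features are expanded.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(0, 0.1, size=200)

linear = LinearRegression().fit(x, y)                  # raw feature: poor fit
x_poly = PolynomialFeatures(degree=2).fit_transform(x)  # adds 1, x, x^2 columns
poly = LinearRegression().fit(x_poly, y)               # still linear in params
print(linear.score(x, y), poly.score(x_poly, y))       # R^2 before vs after
```

The second model is still a linear model in the sense that matters here, which is exactly the sense in which ARIMA is linear too.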
AR model: it anticipates the series' dependence on its own past values. What does that mean? AR models regress on actual past values. Allow me to put down a mathematical expression, and I promise it is a very simple one. For a first-order AR model (AR(p=1)), the formula looks like this:
y(t) = β0 + β1*y(t−1) + ε(t)
y(predicted) = E(y(t))   # E stands for expected value
y(predicted) = β0 + β1*y(t−1)
If the AR model considers just the previous observation (lag 1), it is of order 1 (denoted universally by p). Dependence on the last two previous values would mean the order is 2 (p=2).
y(t) = β0 + β1*y(t−1) + β2*y(t−2) + ε(t)
The formula above describes a second-order AR (p=2) model. Similarly, a third-order model considers the last three values, and so on. It is time to see this in action.
Step 1: Creating sample data (AR model with p=1)
Basically, the formula below is implemented with NumPy.
y(t) = β0 + β1*y(t−1) + ε(t)
We have generated 1,000 data points where β1 is 0.7. We have also introduced some random noise. Looking at the graph, can we say it is actually not a random walk but rather an AR model with p=1?
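The data-generating step can be sketched in NumPy as follows: 1,000 points from y(t) = β0 + β1*y(t−1) + ε, with β1 = 0.7 as in the article (β0 = 0 and unit-variance Gaussian noise are my choices for the sketch).

```python
import numpy as np

# Simulate an AR(1) process: each value is 0.7 times the previous one
# plus Gaussian noise.
rng = np.random.default_rng(42)
n, beta0, beta1 = 1000, 0.0, 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = beta0 + beta1 * y[t - 1] + rng.normal(0, 1)

print(y[:5])
# Because beta1 < 1 the series keeps pulling back toward its mean, i.e. it
# is stationary; a random walk would be the boundary case beta1 = 1.
```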
Step 2: Given this distribution of data, and the fact that it was generated by an AR model with p=1, let us use the statsmodels ARIMA model to find the value of β1.
Wow! What is this? Let us break it down so that you understand the relevant details at this point; we revisit some of these terms later in the article. When we apply an ARIMA model, we have to provide the values of p (the order of the AR model, which you should understand by now), d (the number of times the series should be differenced to make it stationary), and q (the order of the MA model, to be explained later). This assumes there is no seasonality; if the time series is seasonal, we also need to fill in the seasonal orders P, D, Q, and s (also to be explained later). Since we generated the data, we know it is an AR model with p=1.
The summary says Model: ARIMA(1,0,0). Nothing surprising, as we input it. Next, look at the value of β1 (ar.L1 = 0.69). Eureka! The model has found almost the exact value of β1. Log-likelihood, AIC, and BIC are important figures which we will learn to read later. As of now, the message is: the model is able to find the parameter values of the distribution function that we created ourselves.
In the real world, we would not know the value of p in advance, so we need a way to find it. One of the tools available is the PACF (Partial Auto-Correlation Function). What exactly the PACF is, and further details, are beyond the scope of this article; please refer to online sources.
When plotted, it looks like the chart below. The area highlighted in blue tells us that values falling within this range are not statistically significant partial correlations. The x-axis shows lags, whereas the y-axis represents the PACF value. We can easily notice that only the lag-0 and lag-1 values are significant, which means this time series was generated by a p=1 AR model! Once we know that, we can go back to the ARIMA model and update the order values.
To summarize what we learned here: what AR models are, and, given time-series data, how to find the order p (by using the PACF). The PACF may not always work, but it definitely points us in the right direction. We will apply the PACF to find the order p for the Fremont cycling time-series data. If you want, you can generate a second-order AR model yourself and check whether ARIMA works, and whether the PACF helps find the order p. In the next update we talk about the MA model. It is not the moving average that you may be familiar with...