How Machine Learning is transforming Supply Chain (Part 3) - ARIMA (Fremont Bridge Seattle Cycle count)
Photo from Puget Sound Business Journal

What is in this article?

  1. What are ARMA, ARIMA and SARIMA? ARIMA is not an ML model, but of all the classical demand planning algorithms it is the one that most closely resembles one.
  2. Application of ARIMA to the Fremont Bridge cycle count data to predict cycle traffic
  3. Can ARIMA really predict stock prices? We apply it to the share price of a company listed on the BSE.
  4. Way forward

Data

The Fremont Bridge Bicycle Counter (pictured above) began operation in October 2012 and records the number of bikes that cross the bridge using the pedestrian/bicycle pathways. I have never been to this bridge, but well-known books in the ML world (Introduction to Machine Learning with Python by Andreas Müller and Sarah Guido, Python Data Science Handbook by Jake VanderPlas) have used this data to explain the complexity involved in ML. Obviously, what we do here will be different! You can access this data from the link below –

I would like to thank the city authorities for making this data available to ML enthusiasts like me.

Approach and Pre-requisites

Unlike previous articles in this series, I will keep the Python code visible so that you can follow along. You need Jupyter Notebook (or any similar notebook) installed on your PC. If you do not have it, search online; it is free! The last requirement is time commitment. ARIMA is not as simple as smoothing models, so this will be a long article, which I will try to keep concise. That is it! Are you ready to learn something new?

Getting into the MAZE !

Reading the downloaded file with Pandas gives a few basic details.
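
A minimal sketch of this loading step is given here. The column names and the three rows are illustrative assumptions, not the real export, and an in-memory string stands in for the downloaded file so that the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for the downloaded CSV. Column names and values are
# illustrative assumptions and may differ from the real export.
sample_csv = io.StringIO(
    "Date,Total,East Sidewalk,West Sidewalk\n"
    "2012-10-03 00:00:00,13,4,9\n"
    "2012-10-03 01:00:00,10,4,6\n"
    "2012-10-03 02:00:00,2,1,1\n"
)

# With the real file, pass its path instead of sample_csv.
df = pd.read_csv(sample_csv, index_col="Date", parse_dates=True)

df.info()          # column types and non-null counts
print(df.head())   # first few hourly records
```

Parsing the date column into the index is what later makes resampling and grouping by weekday or hour straightforward.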

We can clearly see that the data is captured in hourly buckets. There are two sidewalks, and we are interested in the total count. A simple plot of the data (you can do it yourself) confirms that we have records since 2012. A demand planner would also observe seasonality (which we explore in detail later) and a declining trend beginning in 2020. The decline is on account of Covid, but there are more interesting details hidden here. Let us move forward to discover them.

That plot is based on hourly data. What if we plot the weekly sum over the entire horizon? To do so, we have to resample the data and then plot it.
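
The resampling step can be sketched as follows. The hourly series here is synthetic (random counts, an assumption made only so the snippet is runnable); with the real data you would resample the total column of the loaded DataFrame instead.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts stand in for the real total column (assumption).
rng = np.random.default_rng(0)
idx = pd.date_range("2019-01-01", "2019-03-31 23:00", freq="h")
hourly = pd.Series(rng.poisson(100, size=len(idx)), index=idx, name="Total")

# Roll the hourly buckets up into weekly sums.
weekly = hourly.resample("W").sum()
print(weekly.head())

# In a notebook, weekly.plot() would draw the weekly chart.
```

Changing "W" to "D" gives daily totals in the same way.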

As we roll the data up to lower frequencies, more insights become visible. Looking at the monthly data, it appears that despite Covid cases stabilizing, volume is nowhere near where it used to be before Covid! We cannot compare 2022 with 2021.

Will it help to look at the weekday pattern? That is, how many people on average cross the bridge on a Monday, a Saturday and so on? Let us look at two charts (pre-Covid and during Covid).
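
One way to build the two weekday profiles is sketched here. The data is synthetic and the cutoff date of 1 March 2020 is an assumption chosen for illustration; the article does not state where the pre-Covid period ends.

```python
import numpy as np
import pandas as pd

# Synthetic hourly counts stand in for the real total column (assumption).
rng = np.random.default_rng(1)
idx = pd.date_range("2019-01-01", "2021-12-31 23:00", freq="h")
hourly = pd.Series(rng.poisson(100, size=len(idx)), index=idx, name="Total")

covid_start = pd.Timestamp("2020-03-01")  # assumed cutoff, not from the article


def weekday_profile(series):
    """Average daily total for each day of the week."""
    daily = series.resample("D").sum()
    profile = daily.groupby(daily.index.dayofweek).mean()
    profile.index = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    return profile


pre = weekday_profile(hourly[hourly.index < covid_start])
during = weekday_profile(hourly[hourly.index >= covid_start])
print(pd.DataFrame({"pre_covid": pre, "during_covid": during}).round(0))
```

Because the synthetic counts are flat, both columns look alike here; the real data is what reveals the weekday differences.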

Very interesting charts. Pre-Covid, weekends saw lower traffic than working days, which suggests that many office goers cycled to work (this is my assumption, which we will prove or reject through data). During Covid the difference is less stark, but three days (Tuesday, Wednesday and Thursday) have more people cycling. In the US, many companies have made it mandatory to work three days a week from the office, and it appears that salaried workers are choosing to be in the office on these three weekdays!

How about the hourly split (again, pre-Covid and during Covid)?
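
The hourly profile follows the same grouping idea, sketched here on synthetic data. The peaks at 8 am and 5 pm are planted deliberately (an assumption to mimic commuter traffic) so that the output has a visible shape.

```python
import numpy as np
import pandas as pd

# Synthetic counts with planted peaks at 08:00 and 17:00 (assumption).
rng = np.random.default_rng(2)
idx = pd.date_range("2019-01-07", periods=24 * 28, freq="h")
rate = np.where(np.isin(idx.hour, [8, 17]), 170, 20)
counts = pd.Series(rng.poisson(rate), index=idx, name="Total")

# Average count for each hour of the day.
hourly_profile = counts.groupby(counts.index.hour).mean()
print(hourly_profile.round(1))
print("Busiest hour:", hourly_profile.idxmax())
```

Applying a date mask before the groupby gives separate pre-Covid and during-Covid profiles.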

The hourly pattern has not changed significantly! Many people still cross the bridge during the morning rush hours and in the evening around 5 pm. It is fair to assume that, because of WFH and the three mandated office days, the trend is slightly different during Covid. As a city dweller, if I need to avoid traffic congestion, the hourly plot can be my guide.

Is there a monthly pattern? We know from the charts above that there is one, but it is not obvious.


People cycle more often in summer. You can try the pre-Covid and during-Covid graphs yourself.

Let us stop here. There is no end to data analytics, and we already have enough insights to embark on the second leg of our long journey. You may be scratching your head and complaining that the objective was to learn ARIMA, so where is it? Understanding the data is key to modeling. Trust me, if we jump directly to ARIMA, it becomes even more challenging to unlock.

Introducing ARIMA ..

ARIMA stands for autoregressive integrated moving average. Each of these terms has a specific meaning, which is key to understanding the ARIMA model itself.

AR (Auto Regressive) + I (Integrated) + MA (Moving Average) = ARIMA

Before we get into the details of AR and MA models, it is important to say at the very outset that ARIMA is a linear model. If you studied math in high school, a linear model should be familiar to you. If not, let us quickly recap. The ticket price of a flight depends on many factors, such as route (R), season (S), weekday (D), flight operator (F), class (C), duration of journey (H) and so on. We can assume that the price at a given time t is a linear function of these variables plus an error term (normally distributed).

Price(t) = beta + alpha_1*R + alpha_2*S + alpha_3*D + alpha_4*F + ... + e
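
To make the formula concrete, here is a small sketch that simulates prices from a known linear rule and then recovers the coefficients by least squares. Everything in it is hypothetical: only two features are used (journey duration H and a weekend flag for D), and the coefficient values and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Two hypothetical features: journey duration in hours and a weekend flag.
duration = rng.uniform(1, 10, n)
weekend = rng.integers(0, 2, n)

# Made-up "true" coefficients plus a normally distributed error term.
beta, alpha_h, alpha_d = 50.0, 20.0, 15.0
price = beta + alpha_h * duration + alpha_d * weekend + rng.normal(0, 5, n)

# Least-squares fit; the column of ones estimates the intercept beta.
X = np.column_stack([np.ones(n), duration, weekend])
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
print(coef.round(2))  # estimates of beta, alpha_h, alpha_d
```

The fitted values land close to the coefficients used to generate the data, which is the same idea ARIMA relies on when it estimates its own parameters.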

It may be that a linear model does not match the actual function that generates the price data, but these are very powerful models. By transforming features (scikit-learn's PolynomialFeatures), we can even create non-linear decision boundaries. Anyway, let us get back to ARIMA, because linear models are a topic that requires a separate discussion.

AR model: it assumes that the series depends on its own past values. What does that mean? AR models regress on actual past values. Allow me to put down a mathematical expression; I promise it is a very simple one. For a first-order AR model, AR(p=1), the formula looks like this -

$y(t) = \phi_0 + \phi_1 y(t-1) + \epsilon(t)$
$y(\text{Predicted}) = E(y(t))$, where E stands for expected value
$y(\text{Predicted}) = \phi_0 + \phi_1 y(t-1)$

If an AR model considers just the previous observation (lag 1), it is of order 1 (the order is universally denoted by p). Dependence on the last two values means that the order is 2 (p=2).

$y(t) = \phi_0 + \phi_1 y(t-1) + \phi_2 y(t-2) + \epsilon(t)$

The formula above describes a second-order AR (p=2) model. Similarly, a third-order model considers the last three values, and so on. It is time to see this in action.

Step 1 - Creating sample data (AR model with p=1)

Basically, the formula below is implemented with NumPy.
$y(t) = \phi_0 + \phi_1 y(t-1) + \epsilon(t)$

We have generated 1000 data points where $\phi_1$ is 0.7. We have also introduced some random noise. Looking at the graph, can we tell that it is actually not a random walk but an AR model with p=1?
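
A minimal sketch of this generation step is shown here. The article only fixes the lag-1 coefficient at 0.7 and the length at 1000; the zero intercept, the unit-variance normal noise and the random seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)   # seed is an arbitrary choice
n = 1000
phi_0, phi_1 = 0.0, 0.7           # phi_0 = 0 is an assumption

noise = rng.normal(0.0, 1.0, n)   # assumed standard normal noise
y = np.zeros(n)
for t in range(1, n):
    # AR(1): each value depends on the previous value plus noise
    y[t] = phi_0 + phi_1 * y[t - 1] + noise[t]

# The lag-1 correlation of the series should sit near phi_1.
lag1_corr = np.corrcoef(y[:-1], y[1:])[0, 1]
print(round(lag1_corr, 2))
```

Setting phi_1 to 1 in the same loop would produce a random walk, which is a handy way to compare the two visually.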

Step 2 - Given this data, and the fact that it was generated by an AR model with p=1, let us use the statsmodels ARIMA model to find the value of $\phi_1$.

ARIMA applied to data as generated by us

Wow! What is this? Let us break it down so that you can understand the details relevant at this point; we revisit some of these terms later in this article. When we apply an ARIMA model, we have to provide the values of p (the order of the AR model, which you should understand by now), d (the number of times the series should be differenced to make it stationary) and q (the order of the MA model, to be explained later). This assumes that there is no seasonality; if the time series is seasonal, we also need to fill in the seasonal orders P, D, Q and s (to be explained later). Since we generated the data ourselves, we know that it is an AR model with p=1.

The summary says Model: ARIMA(1, 0, 0). Nothing surprising, as that is what we input. Next, look at the value of $\phi_1$ (ar.L1 = 0.69). Eureka, the model has found almost the exact value of $\phi_1$. Log likelihood, AIC and BIC are important figures that we will learn to read later. As of now, the message is that the model is able to find the parameters of the process that we created ourselves.

In the real world we would not know the value of p in advance, so we need a way to find it. One of the tools available is the PACF (Partial Autocorrelation Function). What the PACF is, and further details, are beyond the scope of this article; please refer to online sources.

In a PACF plot, the area highlighted in blue marks the range within which the observed partial correlation is not statistically significant. The x axis shows the lags, whereas the y axis shows the value of the PACF. We can easily notice that only the lag 0 and lag 1 values are significant, which means that this time series was generated by an AR model with p=1! Once we know this, we can go back to the ARIMA model and update the orders.

To summarize, what we learned here is what AR models are and, given time series data, how to find the order p (by using the PACF). The PACF may not always work, but it definitely points us in the right direction. We will apply the PACF to find the order p for the Fremont cycling time series data. If you want, you can generate a second-order AR model yourself and check whether ARIMA works and whether the PACF helps to find the order p! In the next update we talk about the MA model. It is not the moving average that you may be familiar with ....
