Practical Time Series Analysis Review
SUNY - State University of New York
Many of us are "accidental" data analysts. We trained in the sciences, business, or engineering and then found ourselves confronted with data for which we have no formal analytical training. Professionals motivated by the constant search for deeper answers will enjoy the subjects and analyses proposed by Tural Sadigov and William Thistleton in the Practical Time Series Analysis course, offered by the State University of New York. They turn it into a pleasant journey from top to bottom, reviewing everything from the basics of statistics to the most complex concepts without losing their sense of humor.
My point in writing this article is to share knowledge and quality tools for forecasting with time series inside companies, out of my love for the topic and the importance of generating reliable measures when managing business information. It isn't hard to find critics of data-driven metrics: the concepts behind a scientific approach demand much more caution and reading, and they often contradict beliefs rooted deep in managers' minds.
Understanding how to correctly fit a Triple Exponential Smoothing or a SARIMA model by hand AND by software will not change anybody's life on its own, but it does bring a complementary view to the table of whoever is doing the analysis. Understanding what goes on underneath the packages and their assumptions lets you choose a model more wisely, evaluate an answer more critically and, because of that, better explain and forecast a phenomenon.
The course navigates the architecture of R programming through simple examples and a statistical, scientific approach, introducing matrix calculus and programming around the Yule-Walker equations, which I believe are the key to implementing the time series forecasting methods presented.
Each chapter runs two explanations in parallel: on one side, the scientific logic that tells us why the theories and methods exist; on the other, a light technical approach, not necessarily ready to use, that will require further reading. But for those who come to learn and implement solutions on a daily basis, the most valuable part of this course is the clear explanation of how to interpret Noise, Innovations (new disturbances affecting future values), Trends and Seasonalities inherent in a time series - like a musical composition, multiple instruments coming together to play one song.
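As an illustration of that "one song, many instruments" idea, a classical decomposition splits a series into exactly those voices. A minimal sketch in R, using the built-in AirPassengers monthly series (my choice of example, not the course's):
- plot(decompose(AirPassengers)) # splits the series into trend, seasonal and random (noise) components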
Topics of the course:
- Data Visualization
- Basic Statistics Review
- Noise Versus Signal
- Random Walk vs Purely Random Process
- Time plots, Stationarity, ACV, ACF, Random Walk and MA processes
- Stationarity
- Series, Backward Shift Operator, Invertibility and Duality
- AR(p) and the ACF
- Difference equations and Yule-Walker equations
- Partial Autocorrelation
- Yule-Walker in matrix form and Yule-Walker estimation
- AIC and model building
- ARMA Processes
- ARIMA and Q-statistic
- SARIMA processes
- Forecasting
A Practical Example
There is no point in studying such material without a practical application, and one of the most challenging parts of this data science view is implementing solutions. A solution can be a forecasting model with automated updates. Nowadays we can take a statistically fitted model from a specific study and turn it into a machine learning project, with the model running live, connected to an ERP, a database, or a cloud-based data management system such as AWS.
The value of a good model comes from approaching the problem correctly. For a time series analysis, this means variance control and the neutralization of trends and seasonalities - the three components one must understand systematically to fit these models correctly.
Variance control means checking whether the variance of the time series rises and falls at different moments, and applying a technical treatment to regularize it. As we will see while building the model, a log transform (or a "log-return" calculation) stabilizes the variation, bringing the series toward homoscedasticity.
Neutralizing the trend of a time series basically means checking whether there is any systematic increase or decrease as time evolves. If there is a trend, the technical remedy is to difference the dataset, which brings the values back to a central level from the beginning to the end of the series.
Seasonality can be seen as a pattern that repeats itself with regular periodicity - quarterly, semiannually, annually - responding to specific seasons of the year, month, or week. The treatment adds terms that specifically address this kind of variation, bringing into the calculation the important innovations that persist at every step forward through the time series.
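To make the three treatments concrete, here is a minimal sketch in R, assuming a generic monthly time series object x with an annual seasonal pattern (the same sequence is applied to the souvenir data below):
- x_log <- log(x) # log transform stabilizes a variance that grows with the level of the series
- x_detrended <- diff(x_log) # first difference removes the systematic trend
- x_stationary <- diff(x_detrended, lag = 12) # lag-12 difference removes the annual seasonality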
It isn't my objective here to present the theory behind the models the course covers - the course can be taken by anyone. What I do want is to show a piece of the possibilities that open up when you acquire this kind of knowledge.
Here I will build a model to predict a time series with a SARIMA - Seasonal Autoregressive Integrated Moving Average - model, which brings together all the concepts described above. The dataset was downloaded from the Time Series Data Library and contains the sales of a souvenir shop in Australia.
CODE AND RUN
Read the data in RStudio or a Jupyter notebook:
- SUV <- read.csv('monthly-sales-for-a-souvenir-sho.csv') # here it is in my notebook
Transform the dataset into a time series:
- suv <- ts(SUV$Sales)
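As an optional variation, the frequency and start date can be declared directly in ts(), so the time plots are labelled in calendar time. A sketch, assuming the series is monthly and starts in January 1987, as in the Time Series Data Library entry:
- suv <- ts(SUV$Sales, start = c(1987, 1), frequency = 12) # monthly series with an explicit start date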
Call the analytical packages:
- library(astsa)
- library(forecast)
Create plots to start the component analysis:
- par(mfrow=c(2,2))
- plot(suv, main='Monthly sales for a souvenir shop', ylab='', col='blue', lwd=1)
- plot(log(suv), main='Log-transform of sales', ylab='', col='red', lwd=1)
- acf(suv, main = "ACF")
- pacf(suv, main = "PACF")
OUTPUT:
In the Monthly Sales for a Souvenir Shop plot we can see a clear increase in the variance of the peaks over time - heteroscedasticity - so we will use the log transform to adjust this variance consistently. In the log-transform plot we still see a strong trend, which must be treated.
In the ACF - Autocorrelation Function - we can analyse the moving average behaviour that is intrinsic to the untreated data. Each spike in the plot is a "lag", and these lags will support our decision about which model to choose. The moving average part captures how innovations - disturbances - extend into future periods, the information carried from a specific period into the following ones. From the waves the spikes draw in the graph we can assume there is seasonality in the time series, apparently with a period of 12: a large spike at lag 1 and another around lag 12, fading slowly without an abrupt cut.
With the PACF - Partial Autocorrelation Function - we analyse the autoregressive characteristics of the time series. The partial autocorrelation measures the information carried from one time period to another: information we want to model in order to extract a good forecast, and whose presence we must measure, since it will guide us toward the best model.
Create plots to start the component analysis of a treated Souvenir TS:
- par(mfrow=c(2,2))
- plot(diff(log(suv)), main='Differenced Log-transform of sales', ylab='', col='brown', lwd=1)
- plot(diff(diff(log(suv)),12), main='Log-transform without trend and seasonality', ylab='', col='green', lwd=1)
- acf(diff(diff(log(suv))), main = "ACF of Diff Log-transform")
- pacf(diff(diff(log(suv))), main = "PACF of Diff Log-transform")
Let's talk about the "Log-transform without trend and seasonality" plot: this series has been treated with the log transform and two differencings, which brought stability and a better-behaved variability - homoscedasticity. That matters because our model requires stationarity in the series - the absence of trend and seasonality - to guarantee that it works as we need.
In the ACF we can assess the moving average part of our model, built in two pieces: the non-seasonal Moving Average (MA) and the Seasonal Moving Average (SMA). Reading the plot, the MA order could be 1 or 2, and the SMA order 1.
From the PACF we infer that the Autoregressive (AR) order could be 1, 2, 3, or 4, and the Seasonal Autoregressive (SAR) order 1.
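If the spikes are hard to read off the plots, the same information can be printed numerically. A minimal sketch, using the same differenced series as the plots above; the values around lags 1-4 and lag 12 are the ones that suggest the candidate orders:
- acf(diff(diff(log(suv))), lag.max = 36, plot = FALSE) # prints the autocorrelations by lag
- pacf(diff(diff(log(suv))), lag.max = 36, plot = FALSE) # prints the partial autocorrelations by lag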
These orders drive the model-building process, and they follow the parsimony principle: we do not necessarily pick the best-fitting model if the gain in simplicity outweighs the loss in explanatory power. In other words, we don't want to overfit the model to the time series.
You can see that we haven't yet applied any statistical forecasting procedure. The building process described above leads to a model called SARIMA, a powerful one with a long history of experimentation behind it in the scientific community. This approach is used in a wide variety of firms, depending largely on the creativity and technical level of their financial teams.
Define the differencing components and the seasonal period:
- d = 1
- DD = 1
- per = 12
Build a loop to analyse multiple models:
- for(p in 1:2){ # non-seasonal AR order, p-1
- for(q in 1:2){ # non-seasonal MA order, q-1
- for(i in 1:2){ # seasonal AR order, i-1
- for(j in 1:4){ # seasonal MA order, j-1
- if(p+d+q+i+DD+j<=10){ # cap on the total number of model components
- model<-arima(x=log(suv), order = c((p-1),d,(q-1)), seasonal = list(order=c((i-1),DD,(j-1)), period=per))
- pval<-Box.test(model$residuals, lag=log(length(model$residuals))) # test the residuals for autocorrelation
- sse<-sum(model$residuals^2) # sum of squared errors, a measure of fit quality
- cat(p-1,d,q-1,i-1,DD,j-1,per, 'AIC=', model$aic, ' SSE=',sse,' p-VALUE=', pval$p.value,'\n')
- }}}}}
OUTPUT:
0 1 0 0 1 0 12 AIC= -11.60664 SSE= 3.432906 p-VALUE= 0.0001365566
0 1 0 0 1 1 12 AIC= -16.09179 SSE= 2.97756 p-VALUE= 3.149952e-05
0 1 0 0 1 2 12 AIC= -17.58234 SSE= 2.301963 p-VALUE= 0.0002456591
0 1 0 0 1 3 12 AIC= -16.41016 SSE= 2.35266 p-VALUE= 0.0003392283
0 1 0 1 1 0 12 AIC= -13.43083 SSE= 3.214065 p-VALUE= 4.083839e-05
0 1 0 1 1 1 12 AIC= -17.76362 SSE= 2.399746 p-VALUE= 0.0001916565
0 1 0 1 1 2 12 AIC= -15.99095 SSE= 2.349898 p-VALUE= 0.0002477783
0 1 0 1 1 3 12 AIC= -14.74777 SSE= 2.302026 p-VALUE= 0.0004504596
0 1 1 0 1 0 12 AIC= -27.78538 SSE= 2.643277 p-VALUE= 0.1742478
0 1 1 0 1 1 12 AIC= -34.54538 SSE= 2.233424 p-VALUE= 0.2730783
0 1 1 0 1 2 12 AIC= -33.6145 SSE= 2.109473 p-VALUE= 0.2830597
0 1 1 0 1 3 12 AIC= -32.19273 SSE= 1.87789 p-VALUE= 0.270042
0 1 1 1 1 0 12 AIC= -32.33192 SSE= 2.360507 p-VALUE= 0.2584529
0 1 1 1 1 1 12 AIC= -34.0881 SSE= 1.842013 p-VALUE= 0.2843225
0 1 1 1 1 2 12 AIC= -32.1017 SSE= 1.856343 p-VALUE= 0.28516
1 1 0 0 1 0 12 AIC= -27.07825 SSE= 2.6747 p-VALUE= 0.2297852
1 1 0 0 1 1 12 AIC= -34.98918 SSE= 2.209442 p-VALUE= 0.4633806
1 1 0 0 1 2 12 AIC= -33.38623 SSE= 2.159413 p-VALUE= 0.4515457
1 1 0 0 1 3 12 AIC= -31.54519 SSE= 2.121635 p-VALUE= 0.4390829
1 1 0 1 1 0 12 AIC= -32.64858 SSE= 2.340077 p-VALUE= 0.4022223
1 1 0 1 1 1 12 AIC= -33.48894 SSE= 2.125764 p-VALUE= 0.4442665
1 1 0 1 1 2 12 AIC= -31.52137 SSE= 2.093124 p-VALUE= 0.4463098
1 1 1 0 1 0 12 AIC= -26.17089 SSE= 2.624281 p-VALUE= 0.2507442
1 1 1 0 1 1 12 AIC= -33.30647 SSE= 2.201798 p-VALUE= 0.4110138
1 1 1 0 1 2 12 AIC= -31.68924 SSE= 2.151774 p-VALUE= 0.3820812
1 1 1 1 1 0 12 AIC= -31.10127 SSE= 2.323818 p-VALUE= 0.3492746
1 1 1 1 1 1 12 AIC= -32.69913 SSE= 1.824189 p-VALUE= 0.3091943
Let's take a closer look at what each column means:
- Components of the model (p, d, q)(P, D, Q)s, where (p, d, q) are the non-seasonal components, (P, D, Q) the seasonal components and s the seasonal period;
- AIC - Akaike Information Criterion: an indicator of the quality of the model that penalizes overly complicated models. AIC is our main criterion for model selection (see the sketch after this list);
- SSE - Sum of Squared Errors: when choosing a model, we want the one with the least error possible, without overfitting;
- p-VALUE: we want a high p-value, consistent with the hypothesis that the residuals are not autocorrelated.
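Rather than scanning the printed output by eye, the same loop can collect every candidate into a data frame and sort it by AIC. This is a sketch under the same loop structure as above; the results object and its column names are my own additions:
- results <- data.frame() # container for the candidate models
- for(p in 1:2){ for(q in 1:2){ for(i in 1:2){ for(j in 1:4){
- if(p+d+q+i+DD+j<=10){
- model <- arima(x=log(suv), order = c((p-1),d,(q-1)), seasonal = list(order=c((i-1),DD,(j-1)), period=per))
- pval <- Box.test(model$residuals, lag=log(length(model$residuals)))
- results <- rbind(results, data.frame(p=p-1, d=d, q=q-1, P=i-1, D=DD, Q=j-1, AIC=model$aic, SSE=sum(model$residuals^2), p.value=pval$p.value))
- }}}}}
- results[order(results$AIC), ][1:5, ] # the five candidates with the lowest AIC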
Now that we know the criteria for choosing our model, let's see the result:
Model: SARIMA (1,1,0,0,1,1)12
1 1 0 0 1 1 12 AIC= -34.98918 SSE= 2.209442 p-VALUE= 0.4633806
It brought the best result once the parsimony principle is taken into account: the lowest AIC in the table, reached by one of the simplest candidates, together with a healthy residual p-value.
FIT SARIMA MODEL:
- model<- arima(x=log(suv), order = c(1,1,0), seasonal = list(order=c(0,1,1), period=12))
PLOT THE FORECAST:
- plot(forecast(model))
OUTPUT
OUTPUT THE VALUES:
- forecast(model)
    Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
85        9.600019  9.373968  9.826071  9.254303  9.945736
86        9.786505  9.533944 10.039066  9.400246 10.172764
87       10.329605 10.025423 10.633786  9.864399 10.794810
88       10.081973  9.746705 10.417240  9.569225 10.594720
89       10.008096  9.638604 10.377587  9.443007 10.573184
90       10.181170  9.783094 10.579245  9.572365 10.789974
91       10.439372 10.013362 10.865383  9.787845 11.090900
92       10.534857 10.083237 10.986477  9.844164 11.225551
93       10.613026 10.136886 11.089165  9.884833 11.341218
94       10.664526 10.165207 11.163846  9.900883 11.428170
95       11.096784 10.575248 11.618321 10.299163 11.894406
96       11.877167 11.334355 12.419979 11.047007 12.707326
97        9.932756  9.330373 10.535139  9.011491 10.854022
98       10.112194  9.475681 10.748707  9.138731 11.085657
99       10.658829  9.980844 11.336814  9.621940 11.695718
100      10.409423  9.696788 11.122058  9.319542 11.499304
101      10.336437  9.588666 11.084207  9.192821 11.480052
102      10.509064  9.728749 11.289378  9.315676 11.702452
103      10.767491  9.955449 11.579532  9.525580 12.009401
104      10.862863 10.020524 11.705202  9.574617 12.151109
105      10.941088 10.069390 11.812786  9.607940 12.274235
106      10.992560 10.092516 11.892605  9.616061 12.369060
107      11.424833 10.497280 12.352385 10.006263 12.843402
108      12.205208 11.250954 13.159461 10.745803 13.664613
The values forecasted by the model are presented with their probabilities of occurrence. First, the red line in the forecast plot represents the POINT FORECAST. Second come the 80% interval (Lo 80; Hi 80), shown as the darker grey area, and finally the 95% interval (Lo 95; Hi 95). It is worth noting that these probabilistic forecasts play the same role as the targets used in a budgeting process, which are usually stated as a single fixed value - a questionable practice from a prediction standpoint, where it is better to forecast scenarios.
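Because the model was fitted on log(suv), the table above is on the log scale. A small sketch, assuming the forecast package loaded earlier, makes the horizon and interval levels explicit and maps the point forecast back to sales units:
- fc <- forecast(model, h = 24, level = c(80, 95)) # 24 months ahead with 80% and 95% intervals
- exp(fc$mean) # point forecasts back-transformed (approximately) to the original sales scale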
Fit model for residual analysis:
- model <- sarima(log(suv), 1,1,0, 0,1,1, 12)
Call the residual analysis:
- model$ttable
The image above shows the standardized residuals, which look like white noise; the ACF of the residuals shows no significant spikes (none crosses the blue dashed line); the Q-Q plot shows only a slight departure from normality in the tails; and in the Ljung-Box panel no point has a significantly small p-value. This means that our model is coherently built and its residuals are random.
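For anyone who prefers to run the diagnostics by hand rather than reading them off sarima()'s panel, here is a minimal sketch that reproduces the same checks on a refit of the chosen model:
- fit <- arima(x=log(suv), order = c(1,1,0), seasonal = list(order=c(0,1,1), period=12)) # same model, refit with stats::arima
- acf(residuals(fit), main = "ACF of residuals") # should show no significant spikes
- qqnorm(residuals(fit)); qqline(residuals(fit)) # check the normality of the residuals
- Box.test(residuals(fit), lag = 12, type = "Ljung-Box") # a high p-value indicates no residual autocorrelation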
Where Can Expectations Be Nurtured?
With this article I tried to share some knowledge and a way to acquire it. It isn't trivial to build expertise in forecasting methods, nor to contribute to the debate with reliable information. Companies are full of opinions and unproven theories about what drives their accomplishments and, I think, it falls to the data scientists to analyse such arguments and search for evidence, putting the hypotheses to the test.
While developing metrics and analyses, most companies need to automate processes, and operationalizing budgeting and forecasting is not an easy task: it requires a lot of expertise to achieve results, because while all of these models are being implemented, firms are changing the way they work, and the expertise accumulated through data-driven critique tends to spread horizontally.
I presented here a SARIMA model, a mixed model that evolved from two different families: moving average and autoregressive models. Recent studies combine these models with other types, including machine learning approaches, which is the reason I took the course in the first place - to review how such models are constructed. You can check the M4 Competition for studies of this kind and the best-performing approaches, and also the Uber Engineering and Amazon Engineering blogs for more practical examples. Inside these companies, parametric models like the one used here handle local forecasting, while neural networks and other machine learning methods glue and compile the results into a macro model - a mixed-model approach that achieves better accuracy.
There is a universe to explore, and the implementation of forecasting models must be done by very meticulous professionals, to guarantee that the tests produce reliable information for the firm and multiply the understanding of the business without black boxes. For that purpose this course has been a great use of my time. And as a professor once stated, "test your models exhaustively, because once it's running it's your neck they will pull if it fails".
Experimentation is the key to innovation; be creative and hungry for knowledge, and let curiosity be your guide.
Cordially,
Willians Bernardino
Economist, Master's candidate in Economics and Computational Econometrics at UFRGS. Researcher at e-Compfin, the Computational Finance Group at UFRGS. FP&A Specialist - MBA, FGV-Mngmnt. Works on the thyssenkrupp Elevator Financial Control team for Latin America.