Forecasting time series: choosing the algorithm to model

We all know that predicting time series data is a difficult and complex task, owing to the uncertainty inherent in time and the incomplete information around the event. Whenever we get a forecasting problem, we start with traditional techniques that effectively forecast the next lag of the series, such as the univariate Autoregressive (AR) model, the univariate Moving Average (MA) model, Simple Exponential Smoothing (SES), and particularly the Autoregressive Integrated Moving Average (ARIMA) model with its many variations, as these usually perform well in predicting the next lags of a time series. In recent years, however, new algorithms have been developed to analyze and forecast time series data. The deep learning technique of Long Short-Term Memory networks (LSTMs) is now being applied to time series forecasting, and during these holidays I thought to write a detailed article on algorithms for forecasting time series. ARIMA has been the preferred method for time series forecasting for a long time, but it has some major limitations; before we discuss those limitations, let us first understand the basics of ARIMA.

ARIMA:-

ARIMA is a generalization of the Autoregressive Moving Average (ARMA) model: it combines the Autoregressive (AR) and Moving Average (MA) processes and builds a composite model of the time series.

As the acronym suggests, a non-seasonal ARIMA(p, d, q) model has three key elements:

·      AR: Autoregression. A regression model that uses the dependencies between an observation and a number of lagged observations (p).

·      I: Integrated. Differencing the raw observations (d times) to make the time series stationary.

·      MA: Moving Average. An approach that accounts for the dependency between observations and the residual error terms of a moving average model applied to the lagged observations (q).
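Putting these pieces together, the non-seasonal ARIMA(p, d, q) model can be written compactly in lag-operator form (a standard textbook formulation, stated here for reference):

\[
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d y_t = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
\]

where \(L\) is the lag operator (\(L y_t = y_{t-1}\)), \(\phi_i\) are the AR coefficients, \(\theta_j\) the MA coefficients, and \(\varepsilon_t\) a white-noise error term.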

To get the values of these key elements, we use the following plots:-

ACF & PACF plots:

ACF plot (Auto-Correlation plot) - The sample autocorrelation function (ACF) for a series gives the correlations between the series xt and lagged values of the series for lags of 1, 2, 3, and so on; essentially, the correlations between xt and xt-1, xt and xt-2, and so on. In standard practice the ACF is used mainly to identify the q (MA) term of the ARIMA model, and more generally the possible structure of the time series. That can be tricky, as there often isn't a single clear-cut interpretation of a sample autocorrelation function. The ACF of the residuals of a fitted model is also useful: the ideal for a sample ACF of residuals is that there aren't any significant correlations at any lag.

PACF plot (Partial Auto-Correlation plot) - In general, the "partial" correlation between two variables is the amount of correlation between them which is not explained by their mutual correlations with a specified set of other variables. For example, if we are regressing a variable Y on other variables X1, X2, and X3, the partial correlation between Y and X3 is the amount of correlation between Y and X3 that is not explained by their common correlations with X1 and X2. This partial correlation can be computed as the square root of the reduction in variance that is achieved by adding X3 to the regression of Y on X1 and X2. The PACF is used mainly to identify the p (AR) term. Interpreting the ACF & PACF charts is not always easy; the standard guideline for reading both the AR & MA terms from these charts is:

Model        ACF                      PACF
AR(p)        Tails off gradually      Cuts off after lag p
MA(q)        Cuts off after lag q     Tails off gradually
ARMA(p,q)    Tails off gradually      Tails off gradually
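The same plots can be produced in Python with statsmodels; a sketch, assuming the observations sit in a pandas Series named sales (a name used here for illustration):

from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
import matplotlib.pyplot as plt

# sales: a pandas Series of the observations (assumed name)
fig, axes = plt.subplots(2, 1)
plot_acf(sales, ax=axes[0])   # a sharp cut-off here suggests the MA order q
plot_pacf(sales, ax=axes[1])  # a sharp cut-off here suggests the AR order p
plt.show()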

The general form of an ARIMA model is denoted ARIMA(p, d, q). With seasonal time series data, it is likely that short-run non-seasonal components also contribute to the model, so we need to estimate a seasonal ARIMA model, which incorporates both non-seasonal and seasonal factors in a multiplicative model. The general form of a seasonal ARIMA model is denoted ARIMA(p, d, q) × (P, D, Q)S, where p is the non-seasonal AR order, d the non-seasonal differencing, q the non-seasonal MA order, P the seasonal AR order, D the seasonal differencing, Q the seasonal MA order, and S the time span of the repeating seasonal pattern. We obtain the p and q components through the ACF and PACF plots, while d is typically found by differencing the series until it is stationary, guided by unit-root tests such as the ADF and KPSS tests discussed later (some tools, such as SAS, also offer the inverse autocorrelation function (IACF) as an additional identification aid).
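In the same lag-operator notation, the multiplicative seasonal model ARIMA(p, d, q) × (P, D, Q)S reads (again a standard textbook form):

\[
\phi_p(L)\,\Phi_P(L^S)\,(1 - L)^d\,(1 - L^S)^D\, y_t = \theta_q(L)\,\Theta_Q(L^S)\,\varepsilon_t
\]

where \(\phi_p, \theta_q\) are the non-seasonal AR and MA polynomials and \(\Phi_P, \Theta_Q\) their seasonal counterparts.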

I hope this gives you a basic understanding of ARIMA and of how to obtain the components for building the model. However, as I mentioned earlier, ARIMA has some limitations: the major one is that a simple ARIMA model cannot easily capture nonlinear relationships between variables, and another is its assumption of a constant standard deviation of the errors, which in practice may not always hold. To address these limitations and other challenges in time series data, nowadays we also use the LSTM, which is a special case of the Recurrent Neural Network (RNN). Let us understand the basics of the LSTM.

 LSTM:-

Before I go into the details of the LSTM, I am assuming that you have a basic understanding of neural networks and RNNs. The LSTM is a type of RNN that can hold and learn from long sequences of observations. This matters for time series data because there is sequential dependence among the input observations, and RNNs are very powerful at handling such dependence. Each LSTM is a set of cells, or system modules, where the data streams are captured and stored. The cells resemble a transport line (the upper line in each cell) that connects one module to the next, conveying data from the past and gathering it for the present. Through gates in each cell, data can be discarded, filtered, or added for the next cells. These gates, which are based on a sigmoid neural network layer, enable each cell to optionally let data pass through or be discarded. Each sigmoid layer yields numbers between zero and one, describing how much of each component of data ought to be let through: a value of zero means "let nothing pass through," whereas a value of one means "let everything pass through." Three types of gates are involved in each LSTM to control the state of each cell:

Forget Gate: It outputs a number between 0 and 1, where 1 means "completely keep this," whereas 0 means "completely discard this."

Memory Gate: It chooses which new data need to be stored in the cell. First, a sigmoid layer, called the "input gate layer," chooses which values will be modified. Next, a tanh layer makes a vector of new candidate values that could be added to the state.

Output Gate: It decides what will be output from each cell. The output value will be based on the cell state along with the filtered and newly added data.
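For readers who want the mechanics, the standard LSTM equations are given below (the "memory gate" above corresponds to the input gate \(i_t\) in this common notation):

\[
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) && \text{forget gate} \\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) && \text{input (memory) gate} \\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) && \text{candidate values} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{cell output}
\end{aligned}
\]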

Now that we understand the basics of ARIMA and LSTM, let us see how we implement them in the real world. I prefer R for implementing traditional statistical algorithms and Python for deep learning techniques, so I will implement ARIMA in R and LSTM in Python.

 

Implementing ARIMA:-

ARIMA models are applied in cases where the data show evidence of non-stationarity; an initial differencing step (corresponding to the "integrated" part of the model) can be applied to reduce the non-stationarity.

Prerequisites

De-Trending

De-trending is required to remove any increasing or decreasing trend in the data over a period of time.

Finding – KPSS Test – This test is carried out to detect any underlying trend in the data. The null hypothesis is that the series is stationary around a deterministic trend; if the test rejects the null, the series contains a stochastic trend (unit root) that calls for differencing rather than simple de-trending (see the note further below).

# KPSS test from the tseries package; null = "Trend" tests stationarity around a deterministic trend
kpss.test(x = dataset$sale, null = "Trend", lshort = TRUE)
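For reference, the equivalent check in Python's statsmodels; a sketch on the assumed sales series:

from statsmodels.tsa.stattools import kpss

# sales: a pandas Series of the observations (assumed name)
# regression='ct' tests the null of stationarity around a deterministic trend
stat, p_value, lags, crit = kpss(sales, regression='ct')
print(p_value)  # a small p-value rejects trend-stationarity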

Fixing:

- Fit a regression line – fit a regression line to the data, subtract it from the data, and keep only the residuals.

- Filter from the mFilter package – of the many filters available, one is hpfilter (the Hodrick-Prescott filter). It helps separate the trend and cyclical components of the time series from the raw data. Useful methods on the fitted object are listed below, followed by a Python sketch of the same idea.


summary.mFilter() – summary statistics of the actual, trend and cyclical data

plot.mFilter() – plots the trend over the actual data

fitted.mFilter() – gives you the trend component

residuals.mFilter() – gives you the cyclical component (actual minus trend)
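The same decomposition is available in Python's statsmodels; a sketch, with lamb = 1600 being the conventional smoothing value for quarterly data:

from statsmodels.tsa.filters.hp_filter import hpfilter

# returns the cyclical and trend components of the (assumed) sales series
cycle, trend = hpfilter(sales, lamb=1600)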

Stationary series

A stationary process has the property that the mean, variance and autocorrelation structure do not change over time.

Finding:

- Using a run-sequence plot > Response Variable vs Time

- Dickey-Fuller Test > This test is used to check the stationarity of a series; for a stationary series it should reject the null hypothesis of non-stationarity (the presence of a unit root).

adf.test() from the tseries package can be used to compute the augmented Dickey-Fuller statistic.
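The Python counterpart in statsmodels, again as a sketch on the assumed sales series:

from statsmodels.tsa.stattools import adfuller

# null hypothesis: the series has a unit root (is non-stationary)
adf_stat, p_value, usedlag, nobs, crit, icbest = adfuller(sales)
print(p_value)  # a small p-value rejects non-stationarity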

Fixing:

- Differencing > This means subtracting the previous observation from the current one, i.e. y(t) – y(t-1). This helps remove non-stationarity in the series.

In case the first difference fails to remove the non-stationarity, the second difference can be tried, and so on, until the series is stationary.
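In pandas, differencing is a one-liner; a sketch on the assumed sales series:

# first difference: y(t) - y(t-1); drop the leading NaN
first_diff = sales.diff().dropna()
# apply diff() again if the first difference is still non-stationary
second_diff = first_diff.diff().dropna()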

NOTE: The ADF test and the KPSS test can give you some information to determine whether the trend is deterministic or stochastic.

As the null hypothesis of the KPSS test is the opposite of the null in the ADF test, the following procedure can be determined beforehand:

1. Apply the KPSS test to test the null that the series is stationary or stationary around a trend. If the null is rejected (at a predetermined level of significance), conclude that the trend is stochastic; otherwise go to step 2.

2. Apply the ADF test to test the null that a unit root exists. If the null hypothesis is rejected, conclude that there is no unit root (stationarity); otherwise the result of the procedure is not informative, since neither test rejected its null hypothesis. In that case it may be more cautious to assume the existence of a unit root and detrend the series by taking first differences.

3. If the trend is deterministic (e.g. a linear trend), you could run a regression of the data on the deterministic trend (e.g. a constant plus a time index) to estimate the trend and remove it from the data. If the trend is stochastic, you should detrend the series by taking first differences.

Other tests that can be explored are:

Jarque–Bera test (normality of residuals) – jarque.bera.test()

Phillips–Ouliaris cointegration test – po.test()

Phillips–Perron test – pp.test()

BDS test – bds.test()

Seasonality

Check for seasonality.

Finding:

a. Run-sequence plot

b. Seasonal subseries plot – Response Variable vs Time ordered by season (i.e. Jan to Dec, Sunday to Saturday, Hour 01 to Hour 24)

c. Multiple box plots – Response Variable vs Time as a factor variable

d. Autocorrelation plot – a commonly used tool for checking randomness in a data set. Randomness is ascertained by computing autocorrelations for data values at varying time lags: if the series is random, the autocorrelations should be near zero for any and all time-lag separations; if non-random, one or more of the autocorrelations will be significantly non-zero.

Fixing:

- Average the series at the given seasonal frequency (e.g. the mean for each month) and subtract it from the corresponding observations. Use this deseasonalized series to run the ARIMA model.
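A pandas sketch of this adjustment, assuming sales has a DatetimeIndex and a monthly seasonal pattern:

# subtract each calendar month's long-run mean from that month's observations
monthly_means = sales.groupby(sales.index.month).transform('mean')
deseasonalized = sales - monthly_means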

Model Build

Order of the ARIMA model (p,d,q)

A nonseasonal ARIMA model is classified as an "ARIMA (p,d,q)" model, where:

- p is the number of autoregressive terms (if the series is stationary and autocorrelated, perhaps it can be predicted as a multiple of its own previous values, plus a constant)

- d is the number of non-seasonal differences needed for stationarity

- q is the number of lagged forecast errors in the prediction equation (if the errors of a random walk model are autocorrelated, perhaps the problem can be fixed by adding one lag of the dependent variable to the prediction equation, i.e. by regressing the first difference of Y on itself lagged by one period)

Finding –

ACF & PACF plots:

# stack the ACF and PACF plots for the sales series
par(mfrow=c(2,1))
acf(dataset$sale)
pacf(dataset$sale)

As explained above, use these ACF and PACF plots to get the order of the ARIMA model. However, I would recommend using the auto.arima function: auto.arima returns the best ARIMA model according to either the AIC, AICc or BIC value. The function conducts a search over possible models within the order constraints provided. It comes from the forecast package by Rob Hyndman.

 auto.arima(x, d = NA, D = NA, max.p = 5, max.q = 5, max.P = 2, max.Q = 2, max.order = 5, start.p = 2, start.q = 2, start.P = 1, start.Q = 1, stationary = FALSE, ic = c("aic", "aicc", "bic"), stepwise = TRUE, trace = FALSE, approximation = (length(x)>100 | frequency(x)>12), xreg = NULL, test = c("kpss", "adf", "pp"), seasonal.test = c("ocsb", "ch"), allowdrift = TRUE, lambda = NULL)
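Python users can get similar behaviour from the pmdarima package; a sketch, where the monthly seasonal period m = 12 is an assumption about the data:

import pmdarima as pm

# stepwise search over (p, d, q) x (P, D, Q) orders, scored by AIC
model = pm.auto_arima(sales, seasonal=True, m=12, stepwise=True, information_criterion='aic')
print(model.summary())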

Implementing LSTM:-

To implement the LSTM, we will use the Keras library in Python. As mentioned earlier, an LSTM is a type of RNN that can hold and learn from long sequences of observations, so as a function the LSTM maps a sequence of past observations as input to an output observation. Because of this, we need to prepare the data so that the sequence of events is split into multiple input/output sets from which the LSTM can learn the pattern. A minimal sketch of this data preparation step is shown below, followed by example code for the LSTM model itself:-
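A sketch of the windowing step, assuming a univariate series in a NumPy array (the names split_sequence and n_steps are illustrative, not from any library):

import numpy as np

def split_sequence(sequence, n_steps):
    # slide a window of length n_steps across the series:
    # each window becomes an input sample, the following value its target
    X, y = [], []
    for i in range(len(sequence) - n_steps):
        X.append(sequence[i:i + n_steps])
        y.append(sequence[i + n_steps])
    return np.array(X), np.array(y)

series = np.arange(20, dtype=float)  # stand-in for the real univariate series
n_steps = 3
X, y = split_sequence(series, n_steps)
X = X.reshape((X.shape[0], n_steps, 1))  # [samples, timesteps, features] for the LSTM

With the data in this shape, the model can be defined and fit: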

from keras.models import Sequential
from keras.layers import LSTM, Dense

# neurons, batch_size and n_epochs are placeholder hyperparameters
model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(batch_size, n_steps, 1), stateful=True))
model.add(Dense(1))  # single-step forecast
model.compile(loss='mse', optimizer='adam')
model.fit(X, y, epochs=n_epochs, batch_size=batch_size, verbose=0, shuffle=False)
yhat = model.predict(X_test, batch_size=batch_size)

 

As mentioned, this code covers the steps to fit the model and make a prediction. In this post I am not going into the details of applying the LSTM to different time series scenarios, as LSTMs can be used in many different ways for different kinds of datasets, and maybe I will write a separate post just for the LSTM; however, the basics of the implementation will stay the same.

I have worked on multiple time series forecasting problems, and many times I ended up implementing traditional time series algorithms; but whenever we could implement an LSTM, it outperformed the traditional algorithms and gave a significant lift in accuracy. Keep in mind that an LSTM offers the benefit of superior performance over an ARIMA model at the cost of increased complexity, so the decision between ARIMA and LSTM for time series forecasting depends on a number of factors. I recommend starting with ARIMA as your benchmark model before moving to any other algorithms, including LSTM; you can always come back and work more on ARIMA if the other algorithms are not adding superior performance.

Finally, you might want to test all the approaches, but that means spending significant time and cost; in many cases we have to decide beforehand between models, and for that you need to make estimates of the cost. Making these estimates and selecting the best algorithm comes with experience, so gain as much experience as you can on time series forecasting problems.
