Welcome to BxD Primer Series where we are covering topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, and Hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on ARIMA Time Series Models. Let’s get started:
Introduction to Time Series Models:
A time series is a set of observations or data points collected over time, often at regular intervals. Time series models are used for forecasting and prediction of future values of a variable of interest. They are particularly useful for analyzing and predicting trends and seasonal patterns. Time series models can also be used for identifying outliers or anomalies, which can provide valuable insights into the underlying processes that generated the data.
Use Cases:
Time series models have a variety of business applications, such as:
- Sales Forecasting: A retailer can forecast sales for the upcoming holiday season based on historical sales data from the previous year.
- Inventory Management: A manufacturer can forecast demand for raw materials based on historical usage patterns. This helps them manage inventory levels.
- Demand Planning: A food and beverage company can forecast demand for a particular product during a particular season.
- Capacity Planning: A call center can forecast call volume for the upcoming month.
- Supply Chain Optimization: A logistics company can forecast demand for transportation services and optimize their fleet utilization.
- Financial Forecasting: A company can forecast revenue, profit, cash flow and stock prices based on historical data and market trends.
- Marketing Analytics: An e-commerce company can forecast website traffic and social media engagement during a promotional period.
- Energy Demand Forecasting: A utility company can forecast electricity demand during peak usage periods.
- Predictive Maintenance: A manufacturing company can forecast equipment maintenance schedule.
- Risk Management: A hedge fund can forecast stock price movements and manage their portfolio accordingly.
- Human Resource Planning: A call center can forecast employee turnover, absenteeism and plan their staffing requirements accordingly.
- Weather Forecasting: A climate research group can forecast temperature and precipitation patterns for a specific region.
- Agriculture Forecasting: A farming company can forecast crop yields based on historical weather data and soil moisture levels.
- Healthcare Analytics: A hospital can forecast patient volume, hospital bed occupancy and plan their staffing requirements accordingly.
And many more over-time forecasts…
Types of Models:
In our series, we are going to cover all six main types of time series models:
- ARIMA: ARIMA models are used to model and forecast time series data by accounting for its autoregressive and moving average components, along with its trend and seasonality. Some popular use cases of ARIMA include forecasting stock prices, predicting demand for a product, and modeling climate change patterns.
- Exponential Smoothing (ES): ES models assume that recent observations are more relevant than older observations and give them higher weights. Some popular use cases of ES models include forecasting sales, predicting website traffic, and modeling electricity demand.
- SARIMA: SARIMA models account for the seasonal component of time series data by including seasonal autoregressive and moving average terms. Some popular use cases of SARIMA include forecasting seasonal sales, predicting demand for holiday gifts, and modeling seasonal weather patterns.
- Vector Autoregression (VAR): VAR models can capture complex relationships between multiple time series and are commonly used for macroeconomic forecasting, financial analysis, and social science research.
- Prophet: Prophet is designed to handle time series data with irregular gaps, missing values, and outliers. Some popular use cases of Prophet include predicting website traffic, forecasting sales for e-commerce sites, and modeling social media engagement.
- Hidden Markov Models: HMMs assume that the observed data is generated by a process with hidden states that change over time according to a Markov process. HMMs are commonly used for speech recognition, natural language processing, and biological sequence analysis.
Starting with ARIMA today-
The What:
ARIMA (Autoregressive Integrated Moving Average) is a statistical model used to analyze and forecast time series data. It is a combination of three components:
- Autoregression (AR) - This component represents the relationship between a variable and its past values. It is based on the idea that past values of a variable can help predict its future values.
- Moving Average (MA) - This component represents the relationship between a variable and its past forecast errors. It is based on the idea that errors made in the past can help predict future errors.
- Integrated (I) - This component represents the degree of differencing needed to make a time series stationary. Stationary time series are required because they have constant statistical properties over time, which makes them easier to model and forecast.
The Basics:
Time series data typically exhibits certain properties that need to be understood before fitting a model. These properties include:
- Trend: This refers to the long-term direction of the data, such as whether it is increasing, decreasing, or remaining constant over time. Commonly used trend tests are:
  - Visual inspection: Plotting the data over time to visually examine whether there is a trend.
  - Mann-Kendall test: A non-parametric test used to detect the presence of a monotonic trend in time series data (a minimal implementation sketch follows this list).
    - Compare the calculated value of the test statistic (Z) to a critical value from the standard normal distribution at a given significance level.
    - If |Z| is greater than the critical value, the null hypothesis of no trend is rejected, and you can conclude that there is evidence of a trend in the data. A positive Z indicates an increasing trend, while a negative Z indicates a decreasing trend.
    - Conversely, if |Z| is less than the critical value, the null hypothesis cannot be rejected, and you cannot conclude that there is evidence of a trend in the data.
  - Sen's slope estimator: A non-parametric method used to estimate the slope of a linear trend in time series data, computed as the median of all pairwise slopes: slope = median( (y_j - y_i) / (j - i) ) over pairs i < j (a SciPy-based sketch follows this list).
    - Here i and j are indices ranging from 1 to n, for n observations, and y_i is the value of the series at index i.
    - The slope is interpreted through both its sign (trend direction) and its magnitude (trend steepness).
- Seasonality: This refers to patterns that repeat over a fixed interval of time, such as daily, weekly, or monthly. Seasonality can be observed as regular cycles in the data. Common detection methods are:
  - Seasonal sub-series plot: A graphical method to visualize seasonal patterns in time series data by dividing the data into seasonal segments.
  - Seasonal decomposition: A statistical method used to separate a time series into seasonal, trend, and random components. The decomposition can be additive or multiplicative.
  - Fourier transform: A mathematical method that decomposes observation-over-time data into amplitude-frequency components (a sketch follows this list).
    - The locations of the peaks indicate the frequencies of the corresponding periodic signals. For example, with monthly data, a peak at a frequency of 1/12 cycles per month suggests a strong signal with a period of 12 months in the time series.
    - The magnitude of a peak indicates the strength of the corresponding periodic signal; larger magnitudes indicate stronger periodic signals.
    - Multiple peaks suggest that there are multiple periodic signals in the time series with different frequencies. In this case, consider fitting a seasonal ARIMA model with multiple seasonal components.
- Stationarity: This refers to the statistical properties of data remaining constant over time, such as the mean, variance, and autocorrelation. Stationary time series are easier to model and forecast using ARIMA models.
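Below is the minimal Mann-Kendall sketch referenced above. The helper name mann_kendall_z and the data are ours, and tie corrections to the variance are omitted for brevity:

```python
import numpy as np

def mann_kendall_z(y):
    """Mann-Kendall trend statistic Z (tie corrections omitted)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # S: sum of signs of all pairwise forward differences (i < j)
    s = sum(np.sign(y[j] - y[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S without ties
    if s > 0:
        return (s - 1) / np.sqrt(var_s)
    if s < 0:
        return (s + 1) / np.sqrt(var_s)
    return 0.0

z = mann_kendall_z([3, 4, 5, 5, 7, 8, 10, 11])
print(z, "significant at 5%:", abs(z) > 1.96)  # compare |Z| to 1.96
```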
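Sen's slope has a ready-made SciPy implementation as the closely related Theil-Sen estimator; a small sketch on made-up data:

```python
import numpy as np
from scipy.stats import theilslopes

y = np.array([3.0, 4.1, 4.0, 5.2, 6.1, 8.0, 9.3, 11.0])  # toy series
t = np.arange(len(y))

# Theil-Sen / Sen's slope: median of pairwise slopes (y_j - y_i) / (t_j - t_i)
slope, intercept, lo, hi = theilslopes(y, t)
print(f"Sen's slope: {slope:.3f} (95% CI: {lo:.3f} to {hi:.3f})")
```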
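And the Fourier-transform check, sketched with NumPy's FFT on a synthetic monthly series that has a built-in 12-month cycle:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # ten years of monthly observations (synthetic)
y = 10 * np.sin(2 * np.pi * np.arange(n) / 12) + rng.normal(0, 1, n)

# Real FFT of the mean-removed series; index 0 is the zero-frequency term
spectrum = np.abs(np.fft.rfft(y - y.mean()))
freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per month
peak_freq = freqs[1:][np.argmax(spectrum[1:])]
print(f"dominant period ~ {1 / peak_freq:.1f} months")  # ~12.0
```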
The How:
A step-by-step process of building an ARIMA model (an end-to-end code sketch follows the list):
- Visualize the Data: Plot the time series data to visualize its properties, including its trend, seasonality, and any other features that might affect its statistical properties. This will help you determine whether the data needs any transformations, such as differencing or a log transformation, or whether a seasonal model is more appropriate than plain ARIMA.
- Stationarity Check: Stationarity is an important assumption of ARIMA models, and non-stationarity can affect the accuracy of the model. Non-stationary data needs to be differenced to degree d, as described in later sections.
- Model Fitting: Determine the order of the ARIMA model that best fits the data. The order is determined by the values of the parameters p, d, and q, which represent the AR, I, and MA components of the model, respectively. Fit the ARIMA model to the data and estimate its parameters using maximum likelihood estimation.
- Model Selection: Different techniques for model selection are explained below.
- Forecasting: Use the fitted ARIMA model to make forecasts for future time periods along with their confidence intervals.
- Model Evaluation: Evaluate the accuracy of the forecasts by comparing them to the actual values for the forecasted time periods. You can use measures like mean absolute error (MAE), mean squared error (MSE), or root mean squared error (RMSE) to evaluate the model's performance.
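Here is that end-to-end sketch using statsmodels. The series is synthetic and the order (1, 1, 1) is only a placeholder, to be chosen properly via the selection techniques described later:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Steps 1-2: load/visualize the data and check stationarity
rng = np.random.default_rng(1)
y = pd.Series(np.cumsum(rng.normal(0.3, 1.0, 200)))  # synthetic trending series
print("ADF p-value:", adfuller(y)[1])  # high p-value -> difference the series

# Step 3: fit an ARIMA(p, d, q) by maximum likelihood
model = ARIMA(y, order=(1, 1, 1)).fit()

# Step 4: compare candidate orders using information criteria
print("AIC:", model.aic, "BIC:", model.bic)

# Step 5: forecast 12 steps ahead with 95% confidence intervals
forecast = model.get_forecast(steps=12)
print(forecast.predicted_mean)
print(forecast.conf_int())

# Step 6: evaluate against held-out actuals with MAE/MSE/RMSE
```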
General Equation of ARIMA models:
The equation for an ARIMA(p, d, q) model can be written using the lag operator as:

(1 - φ_1·L - φ_2·L^2 - … - φ_p·L^p) · (1 - L)^d · Y_t = (1 + θ_1·L + θ_2·L^2 + … + θ_q·L^q) · ε_t

where:
- Y_t is the time series observation at time t
- L is the lag operator, defined as L(Y_t) = Y_{t-1}, L^2(Y_t) = Y_{t-2}, and so on.
- p is the order of the autoregressive (AR) component, where φ_1, φ_2, …, φ_p are the coefficients of the AR terms
- d is degree of differencing, which indicates the number of times the time series needs to be differenced to make it stationary
- q is the order of the moving average (MA) component, where θ_1, θ_2, …, θ_q are the coefficients of the MA terms
- ε_t is white noise with zero mean and constant variance
This equation shows how past values of the time series and past values of the error term contribute to the current value of the time series. The values of p, d, and q determine the number of past values that are included in the model and the degree of differencing required to make the time series stationary.
To generate a forecast for a future time point t+h, where h is the forecast horizon, you use the observed values up to time t and plug them into the ARIMA model equation, iterating one step at a time, to obtain a predicted value for Y_{t+h}.
Note: Forecast accuracy will depend on the quality of the parameter estimates, the degree to which the time series exhibits predictable patterns, and the length of the forecast horizon. It is recommended to evaluate the accuracy of forecasts and to continually refine the model as more data becomes available.
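As a toy illustration of plugging observed values into the model equation, here is a one-step forecast for an assumed ARIMA(1, 1, 0) with a made-up coefficient:

```python
# Assumed ARIMA(1, 1, 0): (y_t - y_{t-1}) = phi * (y_{t-1} - y_{t-2}) + e_t
phi = 0.6                    # illustrative AR coefficient on the differenced series
y_prev, y_curr = 100.0, 104.0

# Setting the future error to its mean (zero) gives the point forecast
y_next = y_curr + phi * (y_curr - y_prev)
print(y_next)  # 104 + 0.6 * (104 - 100) = 106.4
```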
Explanation of p, d, q - and Tradeoffs:
An ARIMA model is characterized by three terms: p, d, and q (a short usage sketch follows this list).
- p represents the number of lagged values of the variable that are included in the model. A higher value of p indicates that the model is more dependent on past values of the dependent variable, and may capture more complex patterns in the data. However, a higher value of p also increases the risk of overfitting, which may cause the model to perform poorly on new data. Typical values range from 1 to 5; it can also be 0.
- d represents the number of times the data needs to be differenced to achieve stationarity. A higher value of d indicates that the data is more strongly non-stationary and requires more differencing. However, a higher value of d also increases the risk of over-differencing, which may cause the model to lose important information and perform poorly on new data. Typical values range from 0 to 3.
- q captures the influence of past errors on the current value of the variable. A higher value of q indicates that the model is more dependent on past errors, and may be better at capturing short-term fluctuations in the data. However, a higher value of q also increases the risk of overfitting and may cause the model to perform poorly on new data. Typical values range from 1 to 10; it can also be 0.
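Here is that usage sketch: in statsmodels, the three terms are passed as a single order tuple. The toy series and the order values are placeholders for illustration, not a recommendation:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(2).normal(0.2, 1.0, 150))  # toy series

# order=(p, d, q): two AR lags, one round of differencing, one MA term
result = ARIMA(y, order=(2, 1, 1)).fit()
print(result.summary())
```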
Differencing:
Differencing is a method of transforming a non-stationary time series into a stationary one. First differencing is the process of subtracting the previous value of the time series from the current value, which results in a new series of differences between consecutive observations.
This can help remove any trend or seasonality present in the original time series and make it more stationary.
If first differencing is not sufficient, higher-order differences can be taken, such as second or third differences. The second difference of a time series is calculated as the difference between the first differences, and the third difference is calculated as the difference between the second differences.
The d’th difference of a time series can be calculated recursively as: Δ^d(y_t) = Δ(Δ^(d-1)(y_t)), where the first difference is Δ(y_t) = y_t - y_{t-1}.
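With pandas, differencing is a one-liner per pass. Note how each pass consumes one observation (the leading NaN), and how this toy series with a steadily growing increment becomes constant after two passes:

```python
import pandas as pd

s = pd.Series([10, 12, 15, 19, 24, 30])

d1 = s.diff()         # first difference: y_t - y_{t-1}
d2 = s.diff().diff()  # second difference: difference of the first differences
print(d1.tolist())    # [nan, 2.0, 3.0, 4.0, 5.0, 6.0]
print(d2.tolist())    # [nan, nan, 1.0, 1.0, 1.0, 1.0]
```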
Checking Stationarity of Data:
There are two main methods to test for stationarity in data (a code sketch running both follows the list).
Augmented Dickey-Fuller (ADF) test:
- The null hypothesis is that time series is non-stationary, meaning it has a unit root. Alternative hypothesis is that time series is stationary, meaning it does not have a unit root.
- Choose the number of lagged differences (p) to include in the regression model. This accounts for autocorrelation in the errors of the model. Common choices for the lag order range from 1 to 5, or the lag can be selected automatically using an information criterion.
- Estimate the regression model using ordinary least squares (OLS) or another method, with the dependent variable being the first difference of the time series (i.e., the difference between each value and the previous value) and the independent variables being the lagged level of the series plus the lagged differences up to order p. The model is of the form: Δy_t = β_0 + γ·y_{t-1} + β_1·Δy_{t-1} + β_2·Δy_{t-2} + ... + β_p·Δy_{t-p} + ε_t, where Δy_t is the first difference of the series at time t, β_0 is a constant term, γ is the coefficient on the lagged level, β_1 to β_p are the coefficients of the lagged differences, and ε_t is the error term.
- Calculate the test statistic from the estimated regression. The ADF statistic is the t-statistic for the coefficient on the lagged level: t = γ / SE(γ), where SE(γ) is the standard error of the estimated coefficient. The null hypothesis of a unit root is equivalent to γ = 0 (or, in the levels form of the model, to the AR coefficient being equal to 1).
- Determine the critical value for the test statistic based on the sample size and the chosen significance level. ADF critical values follow the Dickey-Fuller distribution rather than the standard t distribution, and can be obtained from statistical tables or software packages.
- If the test statistic is less (i.e., more negative) than the critical value at the chosen significance level, the null hypothesis is rejected and the time series is considered stationary. If the test statistic is greater than the critical value, the null hypothesis cannot be rejected and the series is considered non-stationary, meaning additional differencing or detrending will be required.
False positives can occur if the test is run on a series with a deterministic trend, as the test may misread the trend as evidence about stationarity. To avoid this, pre-process the data by removing known trends or seasonal patterns, or use the variant of the test that includes a trend term in the regression.
Note: The Phillips-Perron (PP) test builds on the ADF test, modifying the ADF statistic to account for serial correlation and heteroscedasticity in the data.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test:
- The null hypothesis is that the time series is stationary (around a level, or around a deterministic trend). The alternative hypothesis is that the time series is non-stationary.
- The first step in the KPSS test is to estimate and remove the deterministic component of the series. This is done by regressing the time series on an intercept (to test level stationarity) or on an intercept and a time trend (to test trend stationarity). A lag parameter k, chosen based on the data, is used later when estimating the long-run variance.
- The detrended series is the set of residuals from this regression: e_t = y_t - ŷ_t, where y_t is the original series and ŷ_t is the fitted deterministic component.
- The test statistic is calculated from the partial sums of the detrended series, normalized by an estimate of the long-run variance: KPSS = ( Σ_{t=1..T} S_t^2 ) / (T^2 · σ^2), where S_t = e_1 + e_2 + … + e_t, T is the sample size, and σ^2 is the long-run variance estimate computed using k lags of the residual autocovariances.
- If the test statistic is greater than the critical value at the chosen level of significance, the null hypothesis of stationarity is rejected in favor of the alternative of non-stationarity. If the test statistic is less than the critical value, the null hypothesis cannot be rejected and the time series is considered stationary. (Critical values are tabulated for several significance levels, so a statistic can be significant at one level but not at another.)
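Both tests are available in statsmodels; here is the sketch referenced above, running them side by side on a toy random walk. Keep in mind that their null hypotheses point in opposite directions:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

y = np.cumsum(np.random.default_rng(3).normal(0, 1, 300))  # toy random walk

# ADF: null = unit root (non-stationary); a small p-value suggests stationarity
adf_stat, adf_p = adfuller(y)[:2]
print(f"ADF  statistic={adf_stat:.3f}, p-value={adf_p:.3f}")

# KPSS: null = stationary; a small p-value suggests non-stationarity
# regression='c' tests level stationarity, 'ct' tests trend stationarity
kpss_stat, kpss_p, _, _ = kpss(y, regression="c", nlags="auto")
print(f"KPSS statistic={kpss_stat:.3f}, p-value={kpss_p:.3f}")
```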
Model Selection:
There are three main techniques for model selection, described below (code sketches follow the list):
Akaike Information Criterion (AIC):
AIC is derived from Kullback-Leibler (KL) divergence, which is a measure of information lost when using one probability distribution to approximate another. AIC is based on the idea that a good model should both fit the data well and be as simple as possible.
The following formula is used to calculate AIC:

AIC = 2 * k - 2 * log(L)

- k is the number of parameters in the model
- L is the likelihood of the data given the model
Models with more parameters will have a higher AIC than models with fewer parameters, even if they fit the data equally well.
The likelihood function (L) is a measure of how well the model fits the data. It is calculated using the probability density function (PDF) of the model and the observed data. The higher the value of L, the better the model fits the data.
AIC is used to compare different models fitted to the same dataset. The model with the lowest AIC is considered the best.
Note: AIC is only valid for models fitted using maximum likelihood estimation.
Bayesian Information Criterion (BIC):
BIC is calculated as:

BIC = k * log(n) - 2 * log(L)

- L is the likelihood of the data given the model
- k is the number of parameters in the model
- n is the number of observations in the data
The model with the lowest BIC value is considered the best, as it balances model fit with model complexity.
BIC tends to prefer simpler models than AIC, which can be useful in situations where overfitting is a concern.
Visual inspection of ACF and PACF plots:
Autocorrelation function (ACF) and partial autocorrelation function (PACF) are graphical tools used to detect autocorrelation in time series data. Autocorrelation refers to the correlation between observations at different time points within a time series. ACF and PACF plots are used to determine the order of autoregressive (AR) and moving average (MA) models.
The ACF plot displays the correlation coefficients (y-axis) between the time series and its lagged values, as a function of the lag (x-axis). The lag is the number of time periods by which the series is shifted. The correlation coefficient ranges from -1 to 1, where -1 indicates a perfect negative correlation, 0 indicates no correlation, and 1 indicates a perfect positive correlation.
- If ACF plot shows a strong correlation at first lag (lag 1) and then decreases slowly to zero, it suggests that an autoregressive (AR) model may be appropriate.
- If ACF plot shows a strong correlation at first lag followed by a quick drop to zero, it suggests that a moving average (MA) model may be appropriate.
- If ACF plot shows a slowly decaying sinusoidal pattern, it suggests that a seasonal model may be appropriate.
The PACF plot also shows the correlation between the time series and its lagged values, but after removing the effect of intermediate lags.
- If PACF plot shows a significant correlation at first lag and then drops off quickly to zero, it suggests that an autoregressive (AR) model may be appropriate.
- If PACF plot shows a significant correlation at first lag followed by a slow decay to zero, it suggests that a moving average (MA) model may be appropriate.
- If the PACF plot shows a significant correlation at the seasonal lags and no significant correlation at the other lags, it suggests that a seasonal model may be appropriate.
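For AIC and BIC in practice, one simple approach is a small grid search over (p, d, q) orders; the search ranges and toy series below are illustrative only:

```python
import itertools
import warnings
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

y = np.cumsum(np.random.default_rng(4).normal(0.1, 1.0, 200))  # toy series

best = None
for p, d, q in itertools.product(range(3), range(2), range(3)):
    try:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")  # silence convergence chatter
            res = ARIMA(y, order=(p, d, q)).fit()
    except Exception:
        continue  # some orders fail to converge; skip them
    if best is None or res.aic < best[0]:
        best = (res.aic, res.bic, (p, d, q))

print(f"best by AIC: order={best[2]}, AIC={best[0]:.1f}, BIC={best[1]:.1f}")
```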
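And the ACF/PACF inspection is one call each in statsmodels; the toy series is differenced first so the plots are interpretable:

```python
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

y = np.cumsum(np.random.default_rng(5).normal(0, 1, 200))  # toy series

fig, axes = plt.subplots(2, 1, figsize=(8, 6))
plot_acf(np.diff(y), lags=24, ax=axes[0])   # sharp cutoff at lag q hints MA(q)
plot_pacf(np.diff(y), lags=24, ax=axes[1])  # sharp cutoff at lag p hints AR(p)
plt.tight_layout()
plt.show()
```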
The Why:
Reasons to use ARIMA models:
- Widely used and well-established in time series analysis, and they can be used to model a wide variety of time series data.
- Can capture both short-term and long-term trends if tuned properly.
- Can handle missing data, and to some extent irregularly spaced data, via state-space estimation, which can be difficult to model using other methods.
- Can be used to generate probabilistic forecasts, which provide a range of possible future outcomes and their associated probabilities.
- Simple and easy to interpret, which can be useful for communicating results to non-experts.
The Why Not:
Reasons to not use ARIMA models:
- Assumes that the time series is stationary or can be made stationary through differencing. If the time series is highly non-stationary, ARIMA models may not be appropriate.
- Outliers and other extreme values in the data can have a significant impact on the model's accuracy.
- Require a lot of data to estimate model parameters accurately.
- Not suitable for very short-term or very long-term forecasts, as they are designed to capture trends over a limited range of time.
- Not suitable for time series data with complex seasonal patterns.
Time for you to support:
- Reply to this email with your question
- Forward/Share to a friend who can benefit from this
- Chat on Substack with BxD (here)
- Engage with BxD on LinkedIn (here)
In coming posts, we will cover five more time series models: Exponential Smoothing (ES), SARIMA, Vector Autoregression (VAR), Prophet, and Hidden Markov Models.
Let us know your feedback!