Prediction Model using Autoregressive Integrated Moving Average (ARIMA)
José Jaime Comé
Information Management Associate @ UNHCR | Data Specialist/Statistician (Python||R||SQL||PowerBI||Excel) | YouTube: 15K+ subscribers
An autoregressive integrated moving average (ARIMA) model is a statistical model that predicts future values from historical time series data. It relies on the statistical concept of serial correlation, in which past values influence future values. One example is forecasting the earnings of a specific company from its earnings in past periods.
Both the autoregressive moving average (ARMA) model and the ARIMA model predict values from past observations. ARMA assumes that the data are stationary, meaning that the mean and variance are constant over time, while ARIMA can also handle non-stationary series by differencing them until they become stationary.
Stationarity is the constancy of a series' statistical properties over time: its mean and variance do not change significantly along the sequence of data.
Seasonality occurs when the data show regular patterns that repeat every calendar year. In that case, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is used for prediction.
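As a rough sketch of what a seasonal fit can look like in R, the base `arima()` function accepts a `seasonal` argument. The dataset (`AirPassengers`, which ships with R) and the orders chosen below are assumptions for illustration only, not taken from this article.

```r
# Monthly airline passenger totals (1949-1960) from R's built-in datasets
y <- log(AirPassengers)  # log transform to stabilise the growing variance

# Illustrative orders only: non-seasonal (0,1,1) and seasonal (0,1,1) with period 12
sarima_fit <- arima(y,
                    order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))

# Forecast the next 12 months and convert back to the original scale
sarima_fc <- predict(sarima_fit, n.ahead = 12)
exp(sarima_fc$pred)
```

The `period = 12` setting tells the model that the pattern repeats every 12 observations, which matches monthly data with a yearly cycle.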
ARIMA combines autoregressive (lagged) terms with moving average terms to model and smooth time series data.
Non-seasonal ARIMA models are generally denoted ARIMA(p,d,q) and are said to be of order (p,d,q), where the parameters p, d, and q are non-negative integers:
· p: the number of lag observations in the model (the autoregressive order).
· d: the number of times the raw observations are differenced.
· q: the size of the moving average window (the moving average order).
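To make the notation concrete, here is a minimal sketch of how (p,d,q) maps onto the `order` argument of R's `arima()`. The `lh` dataset (built into R) and the order (1,0,1) are assumptions chosen only to illustrate the call.

```r
# order = c(p, d, q): here p = 1 lag, d = 0 differences, q = 1 moving average term
fit <- arima(lh, order = c(1, 0, 1))

# One coefficient per AR and MA term, plus an intercept because d = 0
coef(fit)

# Compact specification: c(p, q, P, Q, period, d, D)
fit$arma
```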
Autoregression (AR) indicates that the variable of interest is regressed on its own lagged (prior) values. In AR(1), the current value is based on the immediately preceding value; in AR(2), it is based on the previous two values, and so on. Integrated (I) refers to the number of times the series is differenced to become stationary. Moving average (MA) captures unexpected external shocks: the current value is modeled as a linear combination of the residual errors from lagged observations.
AR Model
The AR model is represented by the equation below:
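In standard textbook notation (the symbols are assumed here, since the text does not define them), an AR model of order p can be written as:

```latex
y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t
```

Here y_t is the value at time t, c is a constant, \phi_1, ..., \phi_p are the autoregressive coefficients, and \varepsilon_t is a white-noise error term.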
Example:
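For instance, the AR(1) and AR(2) cases described earlier could be written as follows, assuming y_t is the series, c a constant, \phi_i the autoregressive coefficients, and \varepsilon_t white noise:

```latex
\begin{aligned}
\text{AR(1):} \quad y_t &= c + \phi_1 y_{t-1} + \varepsilon_t \\
\text{AR(2):} \quad y_t &= c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \varepsilon_t
\end{aligned}
```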
MA Model
The MA model is represented by the equation below:
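In standard textbook notation (again with assumed symbols), an MA model of order q can be written as:

```latex
y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}
```

Here \mu is the mean of the series, \varepsilon_t, \varepsilon_{t-1}, ... are white-noise error terms, and \theta_1, ..., \theta_q are the moving average coefficients.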
Example:
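As an illustration, the two lowest-order cases could be written as follows, assuming \mu is the series mean, \varepsilon_t is white noise, and \theta_i are the moving average coefficients:

```latex
\begin{aligned}
\text{MA(1):} \quad y_t &= \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} \\
\text{MA(2):} \quad y_t &= \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2}
\end{aligned}
```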
Combining differencing with the autoregressive and moving average models gives ARIMA, the autoregressive integrated moving average model without seasonality. Below is the full model.
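One common way to write the ARIMA(p,d,q) model, using assumed standard notation, is in terms of the differenced series y'_t:

```latex
y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p}
       + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t,
\qquad y'_t = (1 - B)^d \, y_t
```

Here B is the backshift operator (B y_t = y_{t-1}), so y'_t is the original series differenced d times, the \phi_i are the autoregressive coefficients, the \theta_j are the moving average coefficients, and \varepsilon_t is white noise. With d = 0 this reduces to an ARMA(p,q) model.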
Running ARIMA model
The first step is to determine the value of d by checking whether the data are stationary. If the data are stationary, d=0. If the data show a trend, difference the series and check stationarity again, repeating until the series is stationary. Non-stationarity can be detected by visual inspection of the data plot, the autocorrelation function, and the variogram.
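The "difference and check again" loop can be sketched in R as below. This is only one possible approach: it swaps the visual checks for the augmented Dickey-Fuller test from the `tseries` package, and `choose_d`, its 0.05 threshold, and the `max_d` cap are hypothetical choices made for illustration.

```r
library(tseries)  # provides adf.test()

set.seed(1)
y <- cumsum(rnorm(200))  # a random walk: non-stationary by construction

# Hypothetical helper: keep differencing while the ADF test fails to reject
# a unit root, up to max_d differences
choose_d <- function(x, alpha = 0.05, max_d = 2) {
  d <- 0
  while (d < max_d && suppressWarnings(adf.test(x)$p.value) > alpha) {
    x <- diff(x)
    d <- d + 1
  }
  d
}

d <- choose_d(y)
d
```

An automated test like this is a complement to, not a replacement for, looking at the plot of the series.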
After determining d, the researcher can use the PACF (partial autocorrelation function) of the differenced data to find the AR order p: it is the lag after which the PACF cuts off. To determine the MA order q, the researcher looks at the ACF (autocorrelation function) of the differenced data: q is the lag after which the ACF cuts off.
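A minimal sketch of reading the cut-off numerically rather than from a plot, assuming a simulated AR(2) series and the usual approximate 95% bound of ±1.96/sqrt(n). The coefficients and sample size are arbitrary choices for illustration.

```r
set.seed(42)
x <- arima.sim(list(ar = c(0.5, 0.3)), n = 500)  # stationary AR(2), so d = 0

# Approximate 95% significance bound used in acf()/pacf() plots
bound <- 1.96 / sqrt(length(x))

# Sample PACF and ACF at lags 1..10 (acf() also returns lag 0, dropped here)
pacf_vals <- as.numeric(pacf(x, lag.max = 10, plot = FALSE)$acf)
acf_vals  <- as.numeric(acf(x, lag.max = 10, plot = FALSE)$acf)[-1]

# Lags whose correlations fall outside the bound
which(abs(pacf_vals) > bound)  # for an AR(p) process, expect a cut-off after lag p
which(abs(acf_vals) > bound)   # for an AR process, the ACF decays gradually instead
```

For a pure MA(q) process the roles are reversed: the ACF cuts off after lag q while the PACF decays gradually.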
Practical example in R
library(tseries)
# Getting data
set.seed(250)
timeseries=arima.sim(list(order = c(1,1,2), ma=c(0.26,0.38), ar=0.6), n = 100)
# Timeseries plot
plot(timeseries)
# Checking stationarity (augmented Dickey-Fuller test)
adf.test(timeseries) # if p-value > 0.05, the series is treated as non-stationary
# Difference once
timeseries2 = diff(timeseries)
# Timeseries plot
plot(timeseries2)
# Check stationarity again
adf.test(timeseries2) # if p-value < 0.05, the series is treated as stationary
# Checking the ARMA orders on the differenced series (d is 1)
acf(timeseries2)
pacf(timeseries2)
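# Splitting data into train and test (last 10 observations held out)
# arima.sim() with d = 1 returns n + d = 101 values, so index by length
n_obs = length(timeseries)
train_series=timeseries[1:(n_obs-10)]
test_series=timeseries[(n_obs-9):n_obs]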
# Propose models
arimaModel_1=arima(train_series, order=c(0,1,2))
arimaModel_2=arima(train_series, order=c(1,1,0))
arimaModel_3=arima(train_series, order=c(1,1,2))
# Choose the best model (the one with the smallest AIC)
print(arimaModel_1)
print(arimaModel_2)
print(arimaModel_3) # smaller AIC
# Predicting values
forecast_3 = predict(arimaModel_3, 10)
# Printing predicted values
forecast_3
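The held-out test observations can then be used to judge the forecast. The following self-contained sketch repeats the simulation and fit so that it runs on its own. The RMSE metric and the approximate 95% interval (prediction ± 1.96 standard errors) are choices made here for illustration, not part of the original example.

```r
set.seed(250)
y <- arima.sim(list(order = c(1, 1, 2), ar = 0.6, ma = c(0.26, 0.38)), n = 100)

# Hold out the last h observations for evaluation
h <- 10
n_obs <- length(y)
train <- y[1:(n_obs - h)]
test  <- y[(n_obs - h + 1):n_obs]

# Fit on the training part and forecast h steps ahead
fit <- arima(train, order = c(1, 1, 2))
fc  <- predict(fit, n.ahead = h)

# Root mean squared error of the point forecasts against the held-out values
rmse <- sqrt(mean((test - as.numeric(fc$pred))^2))
rmse

# Approximate 95% prediction interval from the forecast standard errors
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se
cbind(test = test, pred = fc$pred, lower = lower, upper = upper)
```

A lower RMSE on the test set gives a second opinion alongside AIC, which is computed on the training data only.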