Prediction Model using Autoregressive Integrated Moving Average (ARIMA)


An autoregressive integrated moving average (ARIMA) model is a statistical model that predicts future values from historical time series data. It relies on serial correlation, the statistical idea that past values influence future ones. For example, ARIMA can be used to forecast a company's earnings from its earnings in past periods.

Both autoregressive moving average (ARMA) and ARIMA models predict values from past data, but ARMA assumes that the data are stationary (mean and variance constant over time), while ARIMA can also handle non-stationary series by differencing them until they become stationary.

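Stationarity means that the statistical properties of the data stay constant over time: the mean and variance do not change significantly along the sequence, and the correlation between two observations depends only on how far apart they are (the lag).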

Figure: left, the original data with a trend; right, the stationary series obtained by first differencing.
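The effect of first differencing can be reproduced with a short R sketch. It assumes hypothetical simulated data (a linear trend with slope 0.5 plus Gaussian noise), not the data behind the figure, and compares the mean of each half of the series before and after differencing; `half_means` is a helper introduced only for illustration.

```r
# Hypothetical data: a linear trend (slope 0.5) plus Gaussian noise
set.seed(1)
n <- 120
trend_series <- ts(0.5 * (1:n) + rnorm(n))

# First difference y_t - y_{t-1}: removes the linear trend
diff_series <- diff(trend_series)

# Illustrative helper: mean of the first and second half of a series
half_means <- function(x) {
  h <- floor(length(x) / 2)
  c(first = mean(x[1:h]), second = mean(x[(h + 1):length(x)]))
}
print(half_means(trend_series))  # the halves differ: the mean drifts with the trend
print(half_means(diff_series))   # the halves are close: roughly constant mean

# Side-by-side plot like the figure: original (left), first difference (right)
op <- par(mfrow = c(1, 2))
plot(trend_series, main = "Original (trend)")
plot(diff_series, main = "First difference")
par(op)
```

A formal stationarity test is still advisable in practice; comparing half-means is only a quick visual-style check.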

Seasonality means that the data show regular patterns that repeat over a fixed period, such as each calendar year. For seasonal data, the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is used for prediction.
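As a sketch of a seasonal model in R, base R's `arima()` accepts a `seasonal` argument alongside the non-seasonal `order`. The example assumes the built-in `AirPassengers` dataset (monthly totals, 1949 to 1960) and the ARIMA(0,1,1)(0,1,1) specification with period 12, often called the airline model; it is one reasonable choice for these data, not the only one.

```r
# Built-in monthly airline passenger totals (1949-1960), with clear yearly seasonality
data("AirPassengers")
log_air <- log(AirPassengers)  # log transform to stabilize the growing seasonal swings

# SARIMA(0,1,1)(0,1,1)[12]: non-seasonal order (p, d, q) plus seasonal order (P, D, Q)
# with a period of 12 months
sarima_fit <- arima(log_air,
                    order = c(0, 1, 1),
                    seasonal = list(order = c(0, 1, 1), period = 12))
print(sarima_fit)

# Forecast the next 12 months and undo the log transform
sarima_fc <- predict(sarima_fit, n.ahead = 12)
print(round(exp(sarima_fc$pred)))
```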

Figure: a time series with seasonality.

ARIMA combines autoregressive terms (lagged values of the series) with moving-average terms (lagged forecast errors) to model time series data.

Non-seasonal ARIMA models are generally denoted ARIMA(p, d, q), where the parameters p, d and q are non-negative integers:

· p: the number of lag observations in the model (the order of the AR part).

· d: the number of times the raw observations are differenced.

· q: the order of the moving average, i.e. the number of lagged forecast errors in the model.
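In R's `arima()` the three parameters map directly onto the `order = c(p, d, q)` argument. A minimal sketch, assuming a series simulated from an ARIMA(1,1,1) process with hypothetical coefficients 0.5 (AR) and 0.3 (MA):

```r
# Hypothetical process: ARIMA(1,1,1) with AR coefficient 0.5 and MA coefficient 0.3
set.seed(42)
y <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.5, ma = 0.3), n = 200)

# order = c(p, d, q): p AR lags, d differences, q MA terms
fit_111 <- arima(y, order = c(1, 1, 1))
print(coef(fit_111))  # estimates named ar1 (p = 1) and ma1 (q = 1)
```

With d > 0, `arima()` does not estimate an intercept, so only the AR and MA coefficients appear.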

Autoregression (AR) means that the variable of interest is regressed on its own lagged (prior) values. In AR(1) the current value depends on the immediately preceding value, in AR(2) on the previous two values, and so on. Integrated (I) refers to the number of times the series is differenced to make it stationary. Moving average (MA) captures the effect of unexpected external shocks: the current value is a linear combination of the current and past forecast errors (residuals).
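The integrated part is repeated differencing, and integration reverses it. A small sketch on a hypothetical series with a quadratic trend: one difference leaves a linear trend, two differences leave a constant, and `diffinv()` rebuilds the original series from the first differences and the first value.

```r
# Hypothetical series with a quadratic trend
x <- c(3, 5, 8, 12, 17, 23)

d1 <- diff(x)                   # first difference:  2 3 4 5 6 (still trending)
d2 <- diff(x, differences = 2)  # second difference: 1 1 1 1 (constant)

# Integration reverses differencing; xi is the starting value lost by diff()
x_back <- diffinv(d1, xi = x[1])
print(x_back)                   # 3 5 8 12 17 23
```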

ARIMA Models

AR Model

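The AR model of order p, AR(p), is represented by the equation below, where $c$ is a constant, $\phi_1, \dots, \phi_p$ are the autoregressive coefficients and $\varepsilon_t$ is white noise:

$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t$$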
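A minimal sketch of the AR(p) recursion in R, with $c = 0$ and hypothetical coefficients. `ar_recursion` is a helper written for illustration; it treats values before the start of the series as zero, the same convention `stats::filter(..., method = "recursive")` uses, so the two should agree.

```r
# Illustrative helper: AR(p) recursion y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t
# (constant c = 0; values before the start of the series are taken as 0)
ar_recursion <- function(e, phi) {
  p <- length(phi)
  y <- numeric(length(e))
  for (t in seq_along(e)) {
    lags <- seq_len(min(p, t - 1))            # lags available at time t
    y[t] <- sum(phi[lags] * y[t - lags]) + e[t]
  }
  y
}

set.seed(7)
e <- rnorm(50)                                # white-noise errors

y_ar1 <- ar_recursion(e, phi = 0.6)           # AR(1): depends on the previous value
y_ar2 <- ar_recursion(e, phi = c(0.5, 0.3))   # AR(2): depends on the previous two values

# stats::filter with method = "recursive" computes the same autoregression
f_ar1 <- as.numeric(stats::filter(e, filter = 0.6, method = "recursive"))
f_ar2 <- as.numeric(stats::filter(e, filter = c(0.5, 0.3), method = "recursive"))
print(all.equal(y_ar1, f_ar1))
print(all.equal(y_ar2, f_ar2))
```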

MA Model

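The MA model of order q, MA(q), is represented by the equation below, where $\mu$ is the mean of the series and $\theta_1, \dots, \theta_q$ are the moving-average coefficients applied to past errors:

$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}$$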
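A matching sketch for MA(q), using as hypothetical coefficients the MA values from the simulation in the R example later in this article ($\theta_1 = 0.26$, $\theta_2 = 0.38$) and $\mu = 0$. `ma_process` is an illustrative helper; `stats::filter(..., sides = 1)` computes the same weighted sum of current and past errors but returns `NA` for the first q points, where past errors are unavailable.

```r
# Illustrative helper: MA(q) process y_t = mu + e_t + theta_1 e_{t-1} + ... + theta_q e_{t-q}
# (errors before the start of the series are taken as 0)
ma_process <- function(e, theta, mu = 0) {
  q <- length(theta)
  y <- numeric(length(e))
  for (t in seq_along(e)) {
    lags <- seq_len(min(q, t - 1))
    y[t] <- mu + e[t] + sum(theta[lags] * e[t - lags])
  }
  y
}

set.seed(11)
e <- rnorm(50)
theta <- c(0.26, 0.38)  # MA coefficients borrowed from the article's simulation
y_ma <- ma_process(e, theta)

# Convolution filter with weights (1, theta_1, theta_2) on current and past errors;
# sides = 1 uses only past values and gives NA for the first q points
y_conv <- as.numeric(stats::filter(e, filter = c(1, theta), sides = 1))
print(all.equal(y_ma[-(1:2)], y_conv[-(1:2)]))
```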

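Combining differencing with an autoregressive and a moving average model gives ARIMA, the autoregressive integrated moving average model without seasonality. Writing $y'_t$ for the series after it has been differenced d times, the full model is:

$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$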
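To make the three parts concrete, here is a sketch that builds an ARIMA(1,1,1) series by hand: an ARMA(1,1) recursion for $y'_t$ (with hypothetical $\phi_1 = 0.6$, $\theta_1 = 0.3$, $c = 0$) followed by one integration step with `diffinv()`. Differencing the result once recovers $y'_t$, which is the step the I in ARIMA undoes.

```r
set.seed(3)
n <- 100
e <- rnorm(n)                           # white-noise errors epsilon_t
phi <- 0.6; theta <- 0.3; const <- 0    # hypothetical coefficients

# ARMA(1,1) part for the differenced series:
# y'_t = c + phi * y'_{t-1} + theta * e_{t-1} + e_t
w <- numeric(n)
w[1] <- const + e[1]
for (t in 2:n) w[t] <- const + phi * w[t - 1] + theta * e[t - 1] + e[t]

# Integrated part (d = 1): y_t = y_{t-1} + y'_t, starting from y_0 = 0
y <- diffinv(w, xi = 0)

# Differencing once recovers y'_t (up to floating-point rounding)
print(all.equal(diff(y), w))

# Fitting ARIMA(1,1,1) to y; with enough data the estimates should be near phi and theta
print(coef(arima(y, order = c(1, 1, 1))))
```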

Running ARIMA model

The first step is to determine d. If the data are already stationary, d = 0. If the data show a trend, difference them and check stationarity again, repeating until the series is stationary. Stationarity can be judged visually from the time series plot, from the autocorrelation function and the variogram, or formally with a unit-root test such as the augmented Dickey-Fuller (ADF) test used in the R example below.
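The difference-and-retest loop can be written as a small function. This is a sketch under stated assumptions: `choose_d` is a hypothetical helper, and to stay dependency-free it uses base R's `PP.test()` (Phillips-Perron test) instead of the ADF test. Both take a unit root (non-stationarity) as the null hypothesis, so a p-value below the chosen level is read as stationary. A single test at 5% can occasionally give the wrong answer, so it is worth confirming the result with a plot.

```r
# Hypothetical helper: smallest d for which the d-times differenced series
# rejects the unit-root null hypothesis of PP.test at level alpha
choose_d <- function(x, max_d = 2, alpha = 0.05) {
  for (d in 0:max_d) {
    p_value <- PP.test(x)$p.value   # H0: unit root (non-stationary)
    if (p_value < alpha) return(d)  # H0 rejected: treat as stationary
    x <- diff(x)                    # otherwise difference once more and retest
  }
  warning("series still looks non-stationary after max_d differences")
  max_d
}

set.seed(250)
white_noise <- rnorm(200)           # already stationary
random_walk <- cumsum(rnorm(200))   # stationary after one difference
print(choose_d(white_noise))
print(choose_d(random_walk))
```

For white noise one would expect d = 0, and for a random walk d = 1.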

After determining d, use the partial autocorrelation function (PACF) of the differenced series to choose the AR order p: p is the lag after which the PACF cuts off. To choose the MA order q, look at the autocorrelation function (ACF) of the differenced series: q is the lag after which the ACF cuts off.
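The "cuts off" rule can be made explicit by counting how many leading lags fall outside the approximate 95% band $\pm 1.96/\sqrt{n}$ that `acf()` and `pacf()` draw by default. `cutoff_lag` below is a hypothetical helper for illustration. The article's simulated series is ARIMA(1,1,2), a mixed model, for which the ACF and PACF usually tail off rather than cut off cleanly; that is one reason to also compare candidate models by AIC, as the R example does.

```r
# Hypothetical helper: number of leading lags whose coefficient lies outside the
# approximate 95% band +/- 1.96 / sqrt(n), i.e. the lag after which the plot "cuts off"
cutoff_lag <- function(coefs, n, z = 1.96) {
  coefs <- as.numeric(coefs)
  bound <- z / sqrt(n)
  inside <- which(abs(coefs) <= bound)
  if (length(inside) == 0) length(coefs) else inside[1] - 1
}

# The article's simulated series, differenced once (d = 1)
set.seed(250)
x <- arima.sim(list(order = c(1, 1, 2), ma = c(0.26, 0.38), ar = 0.6), n = 100)
dx <- diff(x)

pacf_vals <- pacf(dx, plot = FALSE)$acf       # PACF starts at lag 1
acf_vals  <- acf(dx, plot = FALSE)$acf[-1]    # drop lag 0, which is always 1

p_guess <- cutoff_lag(pacf_vals, length(dx))  # candidate AR order p
q_guess <- cutoff_lag(acf_vals, length(dx))   # candidate MA order q
print(c(p = p_guess, q = q_guess))
```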

Practical example in R

library(tseries) # provides adf.test()

# Simulating an ARIMA(1,1,2) series as example data
set.seed(250)
timeseries=arima.sim(list(order = c(1,1,2), ma=c(0.26,0.38), ar=0.6), n = 100)

# Timeseries plot
plot(timeseries)

# Checking stationarity
adf.test(timeseries) # if p > 0.05, the unit root is not rejected: the series is not stationary

# Difference once (d = 1)
timeseries2 = diff(timeseries)

# Timeseries plot
plot(timeseries2)

# Check stationarity again
adf.test(timeseries2) # if p < 0.05, the unit root is rejected: the series is stationary

# Choosing the ARMA orders on the differenced series (I is 1)
acf(timeseries2)
pacf(timeseries2)

# Splitting data into train (first 90 points) and test (all remaining points)
train_series = timeseries[1:90]
test_series = timeseries[91:length(timeseries)]

# Propose models
arimaModel_1=arima(train_series, order=c(0,1,2))
arimaModel_2=arima(train_series, order=c(1,1,0))
arimaModel_3=arima(train_series, order=c(1,1,2))

# Choose the model with the smallest AIC
print(arimaModel_1)
print(arimaModel_2)
print(arimaModel_3) # smallest AIC
AIC(arimaModel_1, arimaModel_2, arimaModel_3) # the three AIC values side by side

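# Predicting as many values as there are test points
forecast_3 = predict(arimaModel_3, n.ahead = length(test_series))

# Printing predicted values ($pred) and their standard errors ($se)
forecast_3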
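The steps above split off a test set but do not score the forecasts against it. A self-contained sketch of that last step, repeating the article's simulation and its ARIMA(1,1,2) fit, computes root mean squared error (RMSE) and mean absolute error (MAE) on the held-out points, plus an approximate 95% prediction interval from the standard errors returned by `predict()` (the 1.96 multiplier assumes normally distributed errors).

```r
# Same simulation and split as the article (seed 250, ARIMA(1,1,2), first 90 points for training)
set.seed(250)
timeseries <- arima.sim(list(order = c(1, 1, 2), ma = c(0.26, 0.38), ar = 0.6), n = 100)
train_series <- timeseries[1:90]
test_series  <- timeseries[91:length(timeseries)]

# Fit the chosen model and forecast one value per test point
fit <- arima(train_series, order = c(1, 1, 2))
fc  <- predict(fit, n.ahead = length(test_series))

# Forecast errors and summary accuracy measures
errors <- test_series - as.numeric(fc$pred)
rmse <- sqrt(mean(errors^2))
mae  <- mean(abs(errors))
print(c(RMSE = rmse, MAE = mae))

# Approximate 95% prediction interval (assumes normally distributed errors)
interval <- cbind(lower    = as.numeric(fc$pred - 1.96 * fc$se),
                  forecast = as.numeric(fc$pred),
                  upper    = as.numeric(fc$pred + 1.96 * fc$se),
                  actual   = test_series)
print(round(interval, 2))
```

RMSE penalizes large misses more heavily than MAE, so comparing the two gives a rough sense of whether a few forecasts are far off.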

