Air Passenger Time Series Analysis
Sharma Saravanan
In this edition of our data analysis newsletter, we delve into time series forecasting with the R programming language. Today, we'll walk through a short script that explores, decomposes, and forecasts air passenger data. So, fasten your seatbelts as we take off into the world of predictive analytics!
Installation and Library Loading:
install.packages("forecast")
library(forecast)
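The code begins by installing and loading the "forecast" package, a widely used R toolkit for time series analysis and forecasting. The install.packages() call only needs to run once per R installation, while library(forecast) is needed in every new session.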
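If the script is re-run often, reinstalling the package each time is unnecessary. A minimal sketch, assuming an internet connection and a configured CRAN mirror, that installs "forecast" only when it is missing:

```r
# Install the package only if it is not already available,
# then attach it for the current session.
if (!requireNamespace("forecast", quietly = TRUE)) {
  install.packages("forecast")
}
library(forecast)
```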
data("AirPassengers")
print(start(AirPassengers))
print(end(AirPassengers))
print(sum(is.na(AirPassengers)))
print(summary(AirPassengers))
plot(AirPassengers)
The AirPassengers dataset, which ships with base R and holds monthly totals of international airline passengers (in thousands) from January 1949 to December 1960, is loaded and basic exploratory checks are run. These include printing the start and end dates of the series, counting missing values, summarizing the values, and plotting the time series.
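A few more base R checks can round out this exploratory step. The sketch below is an optional extension, not part of the original script: it confirms the series structure, sums passengers per year, and draws a month-by-month boxplot to make the seasonal pattern visible. The variable name yearly_totals is chosen here for illustration.

```r
data("AirPassengers")

# Structure of the series: a "ts" object with 12 observations per year
print(class(AirPassengers))
print(frequency(AirPassengers))
print(length(AirPassengers))

# Collapse the monthly values into one total per calendar year
yearly_totals <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)
print(yearly_totals)

# cycle() gives the month position (1 to 12) of each observation,
# so this boxplot compares the distribution of each calendar month
boxplot(AirPassengers ~ cycle(AirPassengers),
        xlab = "Month", ylab = "Passengers (thousands)")
```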
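tsdata <- ts(AirPassengers, start = start(AirPassengers), frequency = 12)
ddata <- decompose(tsdata, type = "multiplicative")
plot(ddata)
AirPassengers is already a monthly time series object, so the ts() call simply re-wraps it with a frequency of 12 (twelve observations per year). Passing start = start(AirPassengers) keeps the original calendar years on the time axis, which would otherwise be reset to begin at 1. The decompose() function then splits the series into its trend, seasonal, and random (remainder) components using a multiplicative model, which suits this data because the seasonal swings grow with the level of the series. Finally, the decomposed components are plotted.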
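To make "multiplicative" concrete: the model assumes observed = trend × seasonal × random. The sketch below, a sanity check added for illustration, calls decompose() directly on AirPassengers and verifies that multiplying the three components back together recovers the original values. The names recon, ok, and max_abs_err are hypothetical helpers.

```r
data("AirPassengers")

ddata <- decompose(AirPassengers, type = "multiplicative")

# Rebuild the series from its three components
recon <- ddata$trend * ddata$seasonal * ddata$random

# The moving-average trend is undefined at both ends of the series,
# so compare only the positions where the reconstruction exists
ok <- !is.na(recon)
max_abs_err <- max(abs(recon[ok] - AirPassengers[ok]))
print(max_abs_err)

# The 12 seasonal factors; values above 1 mark busier-than-average months
print(round(ddata$figure, 3))
```

Under an additive model the components would be summed instead, and the seasonal figures would be centered on 0 rather than 1.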
plot(AirPassengers)
abline(reg=lm(AirPassengers~time(AirPassengers)))
A straight trend line, fitted by ordinary least squares with lm() against the time index, is overlaid on the original time series plot to visualize the overall upward trend in air passenger numbers.
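The same regression can also be inspected numerically. In the sketch below, the names t_years, y, fit, and slope_per_year are introduced for illustration. Because time() measures the index in years, the slope reads as the average change in monthly passengers (in thousands) per year.

```r
data("AirPassengers")

# Plain numeric vectors make the coefficient names easy to reference
t_years <- as.numeric(time(AirPassengers))
y <- as.numeric(AirPassengers)

fit <- lm(y ~ t_years)

# Average yearly change in the monthly passenger count
slope_per_year <- unname(coef(fit)["t_years"])
print(slope_per_year)

# Share of variance explained by the straight line alone
print(summary(fit)$r.squared)
```

A straight line ignores both seasonality and the growing amplitude of the series, so it is a visual aid rather than a forecasting model.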
mymodel <- auto.arima(AirPassengers)
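An Auto-Regressive Integrated Moving Average (ARIMA) model is automatically fitted to the data using the auto.arima() function. It chooses the differencing orders with statistical tests and then picks the remaining orders, including seasonal terms, by minimizing an information criterion (AICc, the small-sample corrected Akaike Information Criterion, by default). Because the default search is stepwise, the result is a good candidate model rather than a guaranteed optimum.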
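It helps to look at what auto.arima() actually selected before forecasting. A minimal sketch, assuming the "forecast" package is installed: printing the model shows the chosen orders and coefficients, arimaorder() returns the orders as a named vector, and the aicc element holds the criterion value used for selection.

```r
library(forecast)
data("AirPassengers")

mymodel <- auto.arima(AirPassengers)

# Chosen model, coefficients, and fit statistics
print(mymodel)

# Orders as a named vector: p, d, q, then seasonal P, D, Q and the period
print(arimaorder(mymodel))

# Value of the information criterion minimized by default
print(mymodel$aicc)
```

For a more thorough (and slower) search, auto.arima() also accepts stepwise = FALSE and approximation = FALSE.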
myforecast <- forecast(mymodel, level=c(95), h=10*12)
plot(myforecast)
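The ARIMA model is then used to forecast air passenger numbers for the next 120 months (h = 10*12, i.e. ten years). The forecast() function returns point forecasts along with 95% prediction intervals (level = c(95)), and plot() draws the history, the forecasts, and the interval band. The intervals widen as the horizon grows, so forecasts a decade out should be read with caution.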
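Beyond the plot, the forecast object exposes the numbers themselves. The sketch below, assuming the "forecast" package is installed, collects the point forecasts and the 95% bounds into a data frame. The name fc_table and its column names are chosen here for illustration.

```r
library(forecast)
data("AirPassengers")

mymodel <- auto.arima(AirPassengers)
myforecast <- forecast(mymodel, level = c(95), h = 10 * 12)

# $mean holds the point forecasts; $lower and $upper hold one column
# per requested level (only the 95% level here)
fc_table <- data.frame(
  time    = as.numeric(time(myforecast$mean)),
  point   = as.numeric(myforecast$mean),
  lower95 = as.numeric(myforecast$lower[, 1]),
  upper95 = as.numeric(myforecast$upper[, 1])
)

# First forecast year
print(head(fc_table, 12))
```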
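print(Box.test(residuals(mymodel), lag = 5, type = "Ljung-Box"))
print(Box.test(residuals(mymodel), lag = 10, type = "Ljung-Box"))
print(Box.test(residuals(mymodel), lag = 15, type = "Ljung-Box"))
The residuals of the ARIMA model are subjected to Ljung-Box tests at lags 5, 10, and 15. The null hypothesis is that the residuals show no autocorrelation up to the given lag, so large p-values (for example, above 0.05) are consistent with white-noise residuals, while small p-values suggest the model has left some structure unexplained.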
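When the test is applied to residuals from a fitted ARIMA model, it is common to reduce the degrees of freedom by the number of estimated coefficients through the fitdf argument of Box.test(). The sketch below is one way to do that, assuming the "forecast" package is installed. The lag of 24 (two seasonal cycles) and the names n_params and lb are choices made here for illustration.

```r
library(forecast)
data("AirPassengers")

mymodel <- auto.arima(AirPassengers)

# Number of estimated ARMA coefficients in the fitted model
n_params <- length(coef(mymodel))

# Ljung-Box test with degrees of freedom equal to lag - fitdf
lb <- Box.test(residuals(mymodel), lag = 24,
               type = "Ljung-Box", fitdf = n_params)
print(lb)
```

The "forecast" package also provides checkresiduals(mymodel), which runs a Ljung-Box test with this adjustment and plots the residuals, their autocorrelation function, and a histogram in one call.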