Forecasting wine demand through time series analysis
Click here to read this article in Portuguese.
Time series analysis aims to identify patterns and forecast trends to assist in decision-making. It can be applied in many fields to understand how a quantity behaves over time.
In this article, I present the main results obtained in building the best model to predict the demand for wines in a company specialized in this product.
* Note
This is a summarized article that shows the main results.
To check the full study, including the codes and methodology used, click here.
1. About the Project
Being able to predict demand for a product or service is a strategic capability that can assist in decision-making, planning, inventory management, resource optimization, and even customer satisfaction. It is useful to companies in many different sectors.
This type of problem is typically addressed through time series analysis, which involves data recorded at sequential, regular intervals over time. This means that data is collected hourly, daily, monthly, and so on, and each new data point depends on the previous ones. Without this temporal dependence, the problem could be treated as an ordinary regression problem, for example with linear regression (Analytics Vidhya, 2018).
2. General Objective
To develop a machine learning-based algorithm for predicting the demand for wines in a company specializing in this product.
3. Dataset
The data used in this project was provided by Rafael Duarte and consists of two files: one containing historical sales data and the other containing information about the wines. It includes daily sales data from three stores, with 219 products in stock over a period of three years (from January 2018 to December 2020).
4. Exploratory Data Analysis
This is an essential step in data science projects, aiming to gain a better understanding of the data by identifying patterns, outliers, potential relationships between variables, and more.
Among the most important findings are:
5. Feature Engineering
To improve the performance of a machine learning model, feature engineering was employed. This involved breaking down the temporal information into 8 new features. Additionally, the total revenue for each product per day was calculated, resulting in 2 new features: one for the value in Brazilian reais and another for the value in US dollars.
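As a rough illustration of this step, here is a minimal pandas sketch. The article does not list the 8 temporal features or the column names, so everything named below (`date`, `quantity`, `price_brl`, `price_usd`, and the eight calendar features) is an assumption chosen for the example.

```python
import pandas as pd

# Toy sales table; every column name and value here is hypothetical.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-06", "2019-07-14"]),
    "quantity": [10, 3, 7],
    "price_brl": [50.0, 120.0, 80.0],
    "price_usd": [10.0, 24.0, 16.0],
})

# Eight illustrative calendar features derived from the date column.
d = sales["date"].dt
sales["year"] = d.year
sales["month"] = d.month
sales["day"] = d.day
sales["day_of_week"] = d.dayofweek                      # Monday = 0
sales["day_of_year"] = d.dayofyear
sales["week_of_year"] = d.isocalendar().week.astype(int)
sales["quarter"] = d.quarter
sales["is_weekend"] = d.dayofweek >= 5

# Total revenue per row in each currency.
sales["revenue_brl"] = sales["quantity"] * sales["price_brl"]
sales["revenue_usd"] = sales["quantity"] * sales["price_usd"]
```

Calendar features like these give a model direct access to weekly and yearly patterns that are otherwise hidden inside a raw timestamp.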
A statistical analysis of these new features showed that the average spending is 12,138 dollars. The standard deviation is high and the median, at 5,051 dollars, is well below the mean, which indicates a right-skewed distribution with outliers at the high end. Since this is common in this field, these data points were retained so that the data remains representative of reality.
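The effect of a few very large values on the mean can be seen in a tiny example. The figures below are made up for illustration and are not taken from the dataset.

```python
import statistics

# Hypothetical daily revenue figures: mostly modest, with two very large days.
revenue = [3000, 4000, 4500, 5000, 5500, 6000, 40000, 60000]

mean_value = statistics.mean(revenue)      # pulled upward by the two large days
median_value = statistics.median(revenue)  # barely affected by them
std_value = statistics.stdev(revenue)      # large relative to the typical value
```

Here the mean is 16,000 while the median is 5,250, the same kind of gap described above.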
6. Business Analysis
After merging the two datasets into one and performing the necessary data cleaning and preprocessing, a more in-depth analysis of the data was conducted. The goal was to discover relevant insights that would allow the company to better understand its demand and use this information to improve its sales, offerings, inventory management, and ultimately increase its profits.
7. Demand Forecasting
7.1. Transformation of the Time Series into Stationary
The data was prepared, and the Augmented Dickey-Fuller (ADF) test was used to check whether the series was stationary. With a p-value of 0.1533, the null hypothesis of non-stationarity could not be rejected at the conventional 5% significance level, so the series was treated as non-stationary. Therefore, the following treatments were applied to transform it into a stationary series:
1. Log transformation to reduce the magnitude of the values in the series.
2. Subtraction of the 30-day moving average from the log-transformed series.
3. Finally, differencing.
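The three steps above can be sketched with pandas as follows. The series is synthetic, and the exact calls used in the original study may differ; only the 30-day window comes from the article.

```python
import numpy as np
import pandas as pd

# Synthetic daily series with an upward trend; values are illustrative only.
rng = np.random.default_rng(0)
dates = pd.date_range("2018-01-01", periods=200, freq="D")
sales = pd.Series(1000 + 5 * np.arange(200) + rng.normal(0, 20, 200), index=dates)

# 1. Log transformation compresses the magnitude of the values.
log_sales = np.log(sales)

# 2. Subtract the 30-day moving average of the log series.
moving_avg = log_sales.rolling(window=30).mean()
detrended = (log_sales - moving_avg).dropna()   # first 29 days have no full window

# 3. First-order differencing.
stationary = detrended.diff().dropna()          # first value has no predecessor
```

Note that the moving average and the differencing each discard leading observations, so the transformed series is 30 points shorter than the original.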
In summary, the treatment applied to make the series stationary involved removing the trend and seasonality from the time series data. After these transformations, the new ADF test yielded a p-value of 0.00000000000000000012.
Below, you can see the series before and after the treatments.
It's worth noting that a stationary time series is one whose statistical characteristics stay constant over time. In other words, it has a constant mean and variance, and the covariance between two observations depends only on the lag between them, not on when they occur. Working with stationary series matters because many statistical forecasting methods assume stationarity in their calculations.
After this, the dataset was split into training and validation data to allow for the proper evaluation of models later on. The validation period, which is also the forecast horizon, was 120 days. It's important to remember that the longer the forecast horizon, the more room the model has to make errors, so shorter forecasting periods should generally result in smaller errors.
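A time series split must respect chronological order. The sketch below assumes the validation window is the final 120 days of the three-year period; the series itself is a synthetic stand-in.

```python
import numpy as np
import pandas as pd

# Synthetic three-year daily series standing in for the real data.
dates = pd.date_range("2018-01-01", "2020-12-31", freq="D")
series = pd.Series(np.arange(len(dates), dtype=float), index=dates)

HORIZON = 120  # validation length in days, as stated in the article

# Chronological split: no shuffling, the last 120 days are held out.
train = series.iloc[:-HORIZON]
valid = series.iloc[-HORIZON:]
```

Shuffling before splitting, as is common for tabular data, would leak future information into training and make the evaluation look better than it is.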
7.2. Model Creation
Several models were developed to evaluate their performance:
It's important to highlight that PyCaret is an automated machine learning (AutoML) library that, in this case, built 27 different models. The best-performing model it selected was Huber w/ Cond. Deseasonalize & Detrending.
8. Model Evaluation
The final step is performance evaluation, which involves assessing how well a model's predictions match the actual outcomes. To facilitate comparison, the techniques used to make the series stationary were reversed.
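One possible way to reverse the three transformations (log, 30-day moving-average subtraction, differencing) is sketched below with NumPy. This is an assumption about how the inversion could be done, not the study's own code: since future values are not available when forecasting, the moving average is rebuilt recursively from values that have already been restored. The function names are hypothetical.

```python
import numpy as np

WINDOW = 30  # moving-average window, as in the article

def to_stationary(values, window=WINDOW):
    """Log, subtract the trailing moving average, then take first differences."""
    log_v = np.log(values)
    kernel = np.ones(window) / window
    # Entry i is the mean of log_v[i : i + window], i.e. the trailing mean at i + window - 1.
    moving_avg = np.convolve(log_v, kernel, mode="valid")
    detrended = log_v[window - 1:] - moving_avg
    return np.diff(detrended)

def from_stationary(diffed, history, window=WINDOW):
    """Undo to_stationary, given the original values that precede the first step."""
    log_hist = list(np.log(history))
    last_detrended = log_hist[-1] - np.mean(log_hist[-window:])
    restored = []
    for step in diffed:
        last_detrended += step                      # undo differencing
        prev_sum = sum(log_hist[-(window - 1):])    # the other 29 terms of the window
        # detrended = y - (y + prev_sum) / window, solved for y:
        log_y = (window * last_detrended + prev_sum) / (window - 1)
        log_hist.append(log_y)
        restored.append(np.exp(log_y))              # undo the log
    return np.array(restored)

# Round-trip check on synthetic data (illustrative values only).
rng = np.random.default_rng(0)
values = 1000 + 2 * np.arange(200) + rng.normal(0, 10, 200)
split = 150
stationary = to_stationary(values)
future_steps = stationary[split - WINDOW:]   # stationary[k] belongs to original index k + WINDOW
restored = from_stationary(future_steps, values[:split])
```

In practice `future_steps` would be a model's predictions on the stationary scale, and `restored` would then be compared with the actual sales.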
First, let's present the results of the following models: Naive Approach (in pink), 3-day Moving Average (in blue), and Holt (in purple). The gray area represents the training period for the models, and the light blue represents the actual data for the period the models predicted. In this way, you can see that the Holt algorithm came closest to reality.
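To make these three baselines concrete, here is a minimal pure-Python sketch. It assumes the naive and moving-average forecasts are flat lines repeated over the horizon, and it uses fixed smoothing parameters for Holt (`alpha` and `beta` are hypothetical values; libraries such as statsmodels normally estimate them from the data).

```python
def naive_forecast(train, horizon):
    """Repeat the last observed value for every future step."""
    return [train[-1]] * horizon

def moving_average_forecast(train, horizon, window=3):
    """Repeat the mean of the last `window` observations."""
    avg = sum(train[-window:]) / window
    return [avg] * horizon

def holt_forecast(train, horizon, alpha=0.3, beta=0.1):
    """Holt's linear trend method with fixed smoothing parameters."""
    level, trend = train[0], train[1] - train[0]
    for y in train[1:]:
        prev_level = level
        # Level: blend the new observation with the previous one-step forecast.
        level = alpha * y + (1 - alpha) * (level + trend)
        # Trend: blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]

# Toy series with a perfectly linear trend (illustrative values only).
history = [10.0, 12.0, 14.0, 16.0, 18.0]
naive = naive_forecast(history, 3)
moving_avg = moving_average_forecast(history, 3)
holt = holt_forecast(history, 3)
```

On a trending series like this toy one, the naive and moving-average forecasts stay flat while Holt continues the trend, which is consistent with Holt tracking the actual data more closely in the chart.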
Next, we have a graph with the models: ARIMA (in pink), Prophet (in purple), and PyCaret (in orange). Initially, these models are similar until Prophet starts to diverge around the point of 01-11-2020. For this reason, it appears that the Prophet model better fits the actual data.
In a second step, the models were evaluated with two error metrics to obtain a more objective assessment of which model performed best. The Mean Absolute Error (MAE) is the average of the absolute differences between the forecast and the actual series, expressed in the same units as the data. Therefore, the lower its value, the better the model. The Mean Absolute Percentage Error (MAPE) averages the absolute errors relative to the actual values, showing how far the predictions are from reality in percentage terms. It is the relative counterpart of MAE.
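Both metrics are simple to compute by hand. The sketch below uses made-up numbers and returns MAPE as a fraction (0.05 means 5%).

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference, in the units of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """Average absolute difference relative to the actual value, as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(actual, predicted)               # (10 + 10 + 0) / 3
mape = mean_absolute_percentage_error(actual, predicted)   # (0.10 + 0.05 + 0) / 3
```

MAPE is undefined when an actual value is zero, so it suits series like aggregate sales that stay well above zero.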
The results obtained were:
Assessment Metrics
Model | MAE | MAPE
Naive Approach | 1631.18 | 0.0357
Moving Average 3 | 1428.87 | 0.0309
Holt | 1370.60 | 0.0290
ARIMA | 1089.11 | 0.0233
Prophet | 1346.82 | 0.0287
PyCaret | 1756.81 | 0.0383
Therefore, the best model was ARIMA, with an error rate (MAPE) of only 2.33%. Prophet was not far behind, with a 2.87% error rate. It's worth noting that Prophet runs much faster than ARIMA, which can be an important consideration depending on the use case.
It's also important to highlight the result obtained by Holt, with a 2.90% error rate. Considering that Holt is a simpler algorithm and achieved results very close to Prophet, it could be an excellent alternative for practical applications.
9. Conclusion
The central objective of this study was to develop an algorithm capable of predicting the demand for wines in a store specializing in this product.
After proper data preprocessing and feature engineering, various analyses were conducted to understand the business. Subsequently, demand forecasting models were developed.
Since the Augmented Dickey-Fuller (ADF) test indicated a non-stationary time series, the series was transformed into a stationary one to achieve better results with some algorithms.
The forecast horizon was 120 days, and the following methods were used: Naive approach, Moving Average, Holt’s Linear Trend Model, ARIMA, Prophet, and PyCaret. As a result, based on MAPE (Mean Absolute Percentage Error), the ARIMA model performed the best with an error rate of only 2.33%, followed by Prophet with 2.87%, and Holt with 2.90%.
When applying the model, it's essential to consider the specific objective it will serve, as there are algorithms among these options that run more efficiently than others.
Get to know more about this study
This study is available on Google Colab and on GitHub. Just click on the images below to be redirected.