Time Series Analysis: The Case of Corporation Favorita

Victor Osei Duah

Data Analyst/Scientist | ML | AI |

Published: August 7, 2024

“Forecasting Sales in Retail: A Machine Learning Approach for Corporation Favorita”

Introduction

Accurate sales forecasting is crucial in retail for optimizing inventory, staffing, and business strategy. This project develops a machine learning model to predict sales at Favorita stores by analyzing trends, customer behavior, and external factors like holidays and promotions.

Our insights will enable our client to optimize inventory, refine marketing strategies, and enhance profitability. We’ve created both statistical and machine learning models to ensure accuracy and flexibility.

Following the CRISP-DM framework, we leveraged data from marketing and sales teams to develop and validate predictive models.

For technical details, visit my GitHub repository. For a project summary and insights, explore my Power BI dashboard.

CRISP-DM FRAMEWORK

Objective of the Project

The primary goal was to design and deploy machine learning models that accurately predict sales across various Corporation Favorita locations, enabling the company to optimize inventory management and enhance profitability.

Goal of the Project

The ultimate goal was to build models that more accurately predict the unit sales for thousands of items sold at different Favorita stores.

Note:

- Transferred holidays: Officially fall on one day but are celebrated on another (e.g., Independencia de Guayaquil moved from Oct 9 to Oct 12).
- Bridge days: Extra days added to holidays to extend breaks.
- Work Days: Normally non-working days (e.g., Saturdays) that are worked to compensate for Bridge days.
- Additional holidays: Extra days added to regular holidays (e.g., Christmas Eve).

Important context:

- Public sector wages are paid every 2 weeks, on the 15th and the last day of the month, potentially impacting supermarket sales.
- The 2016 Ecuador earthquake (magnitude 7.8) led to relief efforts and altered supermarket sales for several weeks.

Data Sources

The datasets for this project were extracted from three sources:

1. First Dataset: Extracted from Microsoft SQL Server, this dataset includes three tables:

Oil Prices
Holiday Events
Stores

2. Second Dataset: Downloaded from OneDrive, this dataset includes two tables:

Sample Submission
Test

3. Third Dataset: Downloaded from a GitHub repository, this dataset includes two tables:

Train
Transactions

Hypothesis Testing: Impact of Promotional Activities on Sales

To understand the effectiveness of promotional activities, we will conduct hypothesis testing on the data. Our hypotheses are:

Null Hypothesis (H0): Promotional activities do not have a significant impact on sales.
Alternate Hypothesis (H1): Promotional activities have a significant impact on sales.

The hypothesis test revealed a statistically significant relationship between promotional activities and Favorita store sales, prompting us to reject the null hypothesis. Further examination showed that promoted items consistently sold more than non-promoted items. These results emphasize the role of strategic promotions in boosting sales growth and profitability. The accompanying visual illustration reinforces this finding, depicting a pronounced upward trend in sales during promotional periods.
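The article does not say which statistical test was used. As one possible illustration, a permutation test on the difference in mean sales between promoted and non-promoted records could look like the sketch below. The function name `permutation_test` and the toy numbers are hypothetical and are not taken from the project data.

```python
import random
from statistics import mean


def permutation_test(promo_sales, regular_sales, n_resamples=5_000, seed=0):
    """Two-sided permutation test on the difference in mean sales.

    Returns (observed_difference, p_value). Under H0 the promotion label
    carries no information, so shuffling the labels should often produce
    a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = mean(promo_sales) - mean(regular_sales)
    pooled = list(promo_sales) + list(regular_sales)
    n_promo = len(promo_sales)

    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_promo]) - mean(pooled[n_promo:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Toy numbers for illustration only (not Favorita data).
promo = [120.0, 135.0, 150.0, 142.0, 160.0, 155.0, 148.0, 139.0]
regular = [90.0, 85.0, 100.0, 95.0, 88.0, 92.0, 97.0, 91.0]

diff, p = permutation_test(promo, regular)
print(f"mean difference = {diff:.2f}, p-value = {p:.4f}")
```

A small p-value (for example, below 0.05) would support rejecting H0. A permutation test makes no normality assumption, which can be convenient for skewed sales data.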

Data Analysis and Model Building

The project involves several steps, including data cleaning, feature engineering, model selection, and validation. The data analysis focuses on aligning sales data with promotional periods to evaluate their impact. We then build and validate machine learning models to forecast product sales.

Analytical Questions

We’ve crafted targeted questions to explore the corporation’s performance in the face of various influences. By examining specific operational and market factors, these questions aim to uncover meaningful insights that shed light on the business landscape.

1. Is the train dataset complete (has all the required dates)?

2. Which dates have the lowest and highest sales for each year (excluding days the store was closed)?

3. Compare the sales for each month across the years and determine which month of which year had the highest sales.

4. Did the earthquake impact sales?

5. Are certain stores or groups of stores selling more products? (Cluster, city, state, type)

6. Are sales affected by promotions, oil prices and holidays?

Oil prices and sales show a weak negative correlation (-0.075), indicating a slight decline in sales as oil prices rise. However, the relationship is not robust, and other factors may be involved.

Sales are highest on regular holidays and low on Bridge days.

7. What analysis can we get from the date and its extractable features?

December and July have the highest sales.

Saturday and Sunday have the highest sales.

The 31st day of the month has the lowest recorded sales.

8. Which product families and stores did the promotions affect?

Grocery I is the most affected product family. The effect is positive, since most sales came from the promoted items.

9. Does the payment of public sector wages on the 15th and the last day of the month influence store sales?

Sales spike after month-end pay, peaking on day 2 and stabilizing by day 7, with a smaller mid-month bump.
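To study a pattern like this, each date can be tagged with the number of days elapsed since the most recent payday. The following is a minimal sketch using only the paydays stated above (the 15th and the last day of the month); the function names are hypothetical and not taken from the project code.

```python
import calendar
from datetime import date


def days_since_payday(day: date) -> int:
    """Days elapsed since the most recent payday (15th or last day of month)."""
    last_day = calendar.monthrange(day.year, day.month)[1]
    if day.day == last_day:
        return 0
    if day.day >= 15:
        return day.day - 15
    # Before the 15th, the latest payday was the last day of the previous
    # month, so the day-of-month number equals the days elapsed.
    return day.day


def is_payday(day: date) -> bool:
    """True on the 15th and on the last day of the month."""
    return days_since_payday(day) == 0


for d in (date(2017, 1, 31), date(2017, 2, 2), date(2017, 2, 15), date(2017, 2, 20)):
    print(d.isoformat(), days_since_payday(d), is_payday(d))
```

Averaging sales by `days_since_payday` is one way to make a post-payday spike and its decay visible.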

Data Preprocessing and Feature Engineering

We handled missing values after merging and feature creation, added missing dates to ensure a complete timeline, renamed columns for clarity and consistency, and verified correct data types for each column.
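The check for missing dates can be sketched as a comparison between the dates observed in the data and a full daily calendar over the same span. This is a minimal standard-library illustration; the function name `missing_dates` and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_dates(observed, start=None, end=None):
    """Return the sorted calendar dates between start and end that are absent."""
    present = set(observed)
    start = start or min(present)
    end = end or max(present)
    n_days = (end - start).days + 1
    full_range = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in full_range if d not in present]


# Hypothetical sample with one gap in the timeline.
sample = [date(2016, 12, 23), date(2016, 12, 24), date(2016, 12, 26), date(2016, 12, 27)]
gaps = missing_dates(sample)
print([d.isoformat() for d in gaps])
```

Any dates returned can then be added back as rows, with sales filled according to whatever rule suits the analysis (for example, zero for days the stores were closed).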

Next, we performed feature engineering to extract relevant insights, standardized the data to ensure scalability, and encoded features for categorical variables.
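As a rough illustration of these three steps, the sketch below derives calendar features from a date, standardizes a numeric column to zero mean and unit variance, and one-hot encodes a categorical value. The function names, the `family_` prefix, and the category list other than Grocery I are assumptions made for the example, not the project's actual feature set.

```python
from datetime import date
from statistics import fmean, pstdev


def date_features(day: date) -> dict:
    """Calendar features derived from a single date."""
    return {
        "year": day.year,
        "month": day.month,
        "day_of_month": day.day,
        "day_of_week": day.weekday(),  # Monday = 0 ... Sunday = 6
        "is_weekend": day.weekday() >= 5,
    }


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def one_hot(value, categories):
    """One-hot encode a categorical value against a fixed category list."""
    return {f"family_{c}": int(value == c) for c in categories}


features = date_features(date(2017, 8, 12))
scaled = standardize([1.0, 2.0, 3.0])
encoded = one_hot("GROCERY I", ["GROCERY I", "BEVERAGES", "PRODUCE"])
print(features, scaled, encoded)
```

To avoid leaking information from the test period, the mean and standard deviation used for scaling would normally be computed on the training data only and then reused on the test data.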

Finally, we split the dataset into training and testing sets, preparing it for modeling.
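The article does not describe how the split was made. For time series data, a common choice is a chronological split that holds out the most recent dates, so the model is never evaluated on observations older than its training data. The sketch below assumes that approach; `chronological_split` and the sample rows are hypothetical.

```python
from datetime import date


def chronological_split(rows, cutoff):
    """Split (date, value) rows: train strictly before cutoff, test on or after."""
    ordered = sorted(rows, key=lambda r: r[0])
    train = [r for r in ordered if r[0] < cutoff]
    test = [r for r in ordered if r[0] >= cutoff]
    return train, test


# Hypothetical (date, sales) rows, deliberately out of order.
rows = [
    (date(2017, 8, 3), 210.0),
    (date(2017, 8, 1), 180.0),
    (date(2017, 8, 4), 250.0),
    (date(2017, 8, 2), 195.0),
]
train, test = chronological_split(rows, cutoff=date(2017, 8, 3))
print(len(train), len(test))
```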

Modeling and Evaluation

A range of machine learning and statistical models was used for sales prediction, including XGBoost, Gradient Boosting, Decision Tree, Linear Regression, SARIMA, and ARIMA. Model performance was evaluated using four metrics: root mean squared logarithmic error (RMSLE), root mean squared error (RMSE), mean squared error (MSE), and mean absolute error (MAE).
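For reference, the four metrics can be written out directly from their standard definitions. This is a minimal sketch, not the project's evaluation code; RMSLE is computed on log(1 + y) and therefore assumes non-negative actual and predicted values.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error (inputs must be >= 0)."""
    if any(v < 0 for v in list(y_true) + list(y_pred)):
        raise ValueError("RMSLE is undefined for negative values")
    squared_logs = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_logs) / len(squared_logs))


# Toy values for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 5.0]
print(mse(actual, predicted), rmse(actual, predicted),
      mae(actual, predicted), rmsle(actual, predicted))
```

Because RMSLE works on logarithms, it penalizes relative rather than absolute errors, which suits data where sales volumes differ widely between items and stores.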

Lower RMSLE values indicate better performance. With the lowest RMSLE, the Decision Tree Regressor outperformed the other models. Given its already strong performance, further hyperparameter tuning may not yield significant improvements.

CONCLUSION

The Decision Tree Regressor together with the ARIMA & SARIMA models excel in sales prediction, handling diverse regional, store, and item variations. The best model choice depends on business needs and interpretability. This analysis provides valuable insights for retailers to optimize strategies, enhance decision-making, and boost profitability. As the retail landscape evolves, leveraging advanced analytics and machine learning is crucial for competitiveness.
