How to Build a Streamlit App for Favorita Grocery Sales Forecasting Using a Regression Model
Stella Oiro
Apprentice Software Developer || Technical Writer || Expert SEO Writer || Clinical Officer || Entrepreneur
Are you interested in predicting future grocery sales for a retail corporation? In this article, we'll walk through how to build a Streamlit app around a regression model trained on the Favorita Grocery Sales dataset. You can also check out my GitHub for more projects related to data science and machine learning.
Data Description
The Favorita Grocery Sales dataset consists of transactional records from a retail corporation in Ecuador over a period of five years. The data contains information about store locations, item descriptions, on-shelf dates, promotions, and unit sales. The goal of the original Kaggle competition is to predict the unit sales for a set of test items and stores.
Model Training
To train our regression model, we used a combination of feature engineering and XGBoost regression. We started by cleaning the data, removing duplicates and missing values, and then engineered new features such as day of the week, month, and year. We also used one-hot encoding to convert categorical variables into binary features.
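The cleaning and feature engineering steps described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the exact pipeline used for the model: the column names (store_nbr, family, onpromotion, unit_sales) and the helper name engineer_features are assumptions chosen for the example.

```python
import pandas as pd

# Toy rows standing in for the real training data (all values are made up).
sales = pd.DataFrame({
    "date": ["2017-08-14", "2017-08-15", "2017-08-15"],
    "store_nbr": [1, 1, 2],
    "family": ["DAIRY", "BREAD", "DAIRY"],
    "onpromotion": [True, False, None],
    "unit_sales": [12.0, 7.0, 3.0],
})
# Append a copy of the first row so there is a duplicate to remove.
sales = pd.concat([sales, sales.iloc[[0]]], ignore_index=True)


def engineer_features(df):
    """Hypothetical helper: clean the rows, then add date and one-hot features."""
    # Remove exact duplicates and rows with missing values.
    df = df.drop_duplicates().dropna().copy()
    # Derive calendar features from the date column.
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["weekday"] = df["date"].dt.weekday  # Monday is 0
    # One-hot encode the categorical column into binary indicator columns.
    df = pd.get_dummies(df, columns=["family"])
    return df.drop(columns=["date"])


features = engineer_features(sales)
print(features)
```

Dropping every row with a missing value is the simplest option; on the real data you may prefer to fill missing promotion flags instead, since dropping can discard a lot of rows.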
After feature engineering, we split the data into training and validation sets, trained an XGBoost regression model on the training data, and tuned the hyperparameters using grid search. Finally, we evaluated the model on the validation set and calculated the root mean squared logarithmic error (RMSLE) to measure the performance of the model.
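For reference, RMSLE can be computed with a few lines of NumPy. The sketch below assumes the standard definition (the root of the mean squared difference between log(1 + prediction) and log(1 + actual)); clipping negative predictions to zero is an added assumption so the logarithm stays defined, and the function name rmsle is illustrative.

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    # Assumption: negative predictions are clipped to zero so log1p is defined.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), 0, None)
    squared_log_errors = (np.log1p(y_pred) - np.log1p(y_true)) ** 2
    return float(np.sqrt(np.mean(squared_log_errors)))


# A perfect prediction scores 0; larger values mean worse predictions.
print(rmsle([3.0, 5.0, 2.0], [3.0, 5.0, 2.0]))
print(rmsle([3.0, 5.0, 2.0], [2.5, 6.0, 2.0]))
```

Because the error is measured on a log scale, RMSLE penalizes relative differences rather than absolute ones, which suits unit sales that range from very small to very large values.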
Streamlit App Development
To develop our Streamlit app, we started by importing the necessary libraries, loading the trained model and encoder, and defining the input and output interfaces for the app. We then defined the prediction function, which takes user inputs, preprocesses them using the encoder, and feeds them into the trained model to make a prediction.
# Import necessary libraries
import streamlit as st
import pickle
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from xgboost import XGBRegressor
# Load the trained model and encoder (context managers close the files after reading)
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
with open("encoder.pkl", "rb") as f:
    encoder = pickle.load(f)
# Define the input and output interfaces for the Streamlit app
st.title("Favorita Grocery Sales Forecasting")
store_item_id = st.text_input("Store Item ID", "0_0")
date = st.date_input("Date")
onpromotion = st.selectbox("On Promotion", ["True", "False"])
# Define the prediction function
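# st.cache is deprecated; st.cache_data is its replacement in Streamlit 1.18+
@st.cache_data
def predict_sales(store_item_id, date, onpromotion):
    df = pd.DataFrame({"store_item_id": [store_item_id],
                       # st.date_input returns a datetime.date; convert it so the .dt accessor works
                       "date": [pd.to_datetime(date)],
                       "onpromotion": [onpromotion]})
    # Split IDs such as "0_0" into store and item columns
    # (cast to int on the assumption that the model was trained on numeric IDs)
    df[["store_id", "item_id"]] = df["store_item_id"].str.split("_", n=1, expand=True).astype(int)
    # Derive calendar features from the date
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["day"] = df["date"].dt.day
    df["weekday"] = df["date"].dt.weekday
    # LabelEncoder.transform expects a 1-D array of labels, so pass the column itself
    df["onpromotion"] = encoder.transform(df["onpromotion"])
    df = df.drop(["store_item_id", "date"], axis=1)
    prediction = model.predict(df)
    return prediction[0]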
# Call the prediction function and display the output
if st.button("Predict Sales"):
    prediction = predict_sales(store_item_id, date, onpromotion)
    st.write("Predicted Unit Sales: ", prediction)
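If you want to check the preprocessing without launching Streamlit or loading the pickled model, one option is to keep it in a plain function. The sketch below is only an illustration: build_features is a hypothetical name, and mapping "True" to 1 and "False" to 0 is an assumption that matches what a LabelEncoder fitted on those two strings would produce.

```python
import datetime

import pandas as pd


def build_features(store_item_id, date, onpromotion):
    """Hypothetical helper: turn the three app inputs into a one-row feature frame."""
    # "3_105" -> store 3, item 105
    store_id, item_id = store_item_id.split("_", 1)
    ts = pd.Timestamp(date)
    return pd.DataFrame({
        # Assumption: "False" -> 0 and "True" -> 1, as a LabelEncoder fitted on both would give
        "onpromotion": [1 if onpromotion == "True" else 0],
        "store_id": [int(store_id)],
        "item_id": [int(item_id)],
        "year": [ts.year],
        "month": [ts.month],
        "day": [ts.day],
        "weekday": [ts.weekday()],  # Monday is 0
    })


row = build_features("3_105", datetime.date(2017, 8, 15), "True")
print(row)
```

Keeping this logic separate from the Streamlit widgets makes it easy to confirm that the columns and their order match what the model saw during training.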
Results
Our Streamlit app allows you to input a store item ID, date, and promotion status, and receive a prediction for the unit sales for that item and store. The app preprocesses your inputs and feeds them into the trained regression