Unveiling Consumer Behavior: Analysis and Prediction with Data Science
Marco Aurelio Minozzo
Data Science | Data Engineer | Software Development | IA ML IoT | MSc in Digital Transformation |
Understanding consumer behavior is essential for the success of any business in today's market. With the vast amount of data available, data science provides powerful tools to analyze and predict consumer actions. In this article, we will explore a real case study from a large marketplace, demonstrating how data analysis and machine learning techniques can be applied to uncover behavior patterns, optimize marketing strategies, and increase customer loyalty. We will dive into the theoretical and practical concepts underlying this process, providing valuable insights for those looking to improve their e-commerce operations.
Consumer Behavior
In consumer marketing, the consumer lifecycle is a term used to describe the progression of steps a customer goes through when considering, buying, using, and maintaining loyalty to a product or service.
Why It's Important
Customer lifecycle metrics can be tracked over time (e.g., quarter over quarter, year over year) and compared to industry-wide benchmarks. Comparing them can help identify competitive gaps in product or service offerings and, above all, predict which consumers will buy from the store.
Propensity to Purchase
We want to measure a consumer's propensity to buy. Why is this important? For several reasons. First, it helps us avoid losing the customer. Second, it lets us create points of contact that drive the sale. Third, it significantly increases the chance of selling, because we interact with the consumers most likely to buy.
From Data Lake
A Data Lake is a data storage architecture designed to manage large volumes of data coming from multiple sources. The Data Lake framework is made up of several layers, including the ingestion layer, where structured and unstructured data is collected from both streaming and batch sources. The data is then handled in the processing layer, where it is validated, cleaned, and transformed. After processing, the data is stored in the storage layer, which is divided into landing, cleansing, and curation zones. In addition, the cataloging and search layer allows for organization and easy retrieval of data. Finally, the processed data is made available in the consumption layer, where it can be used for analysis and visualization, facilitating data-driven decision-making.
This architecture enables the efficient ingestion, storage, processing, and consumption of large volumes of data.
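As a rough illustration of how these layers relate to one another, the sketch below models them as plain Python functions over in-memory records. Every name in it (`DataLake`, `ingest`, `process`, `consume`) is hypothetical and stands in for real ingestion, processing, and storage services; it is a toy under those assumptions, not a reference implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DataLake:
    """Toy storage layer with the three zones described above."""
    landing: list = field(default_factory=list)    # raw records as received
    cleansing: list = field(default_factory=list)  # validated, cleaned records
    curated: list = field(default_factory=list)    # records ready for consumption
    catalog: dict = field(default_factory=dict)    # cataloging layer: zone -> record count


def ingest(lake, records):
    """Ingestion layer: store raw records (batch or streaming) untouched."""
    lake.landing.extend(records)


def process(lake):
    """Processing layer: validate, clean, and transform landing records."""
    for record in lake.landing:
        # Validation: drop records without a session identifier.
        if not record.get("SESSION_ID"):
            continue
        # Cleaning: coerce the BUY flag to an integer 0/1.
        cleaned = {**record, "BUY": int(record.get("BUY", 0))}
        lake.cleansing.append(cleaned)
    # Transformation: keep only the fields consumers need.
    lake.curated = [
        {"SESSION_ID": r["SESSION_ID"], "BUY": r["BUY"]} for r in lake.cleansing
    ]
    # Cataloging: record what each zone holds so it can be found later.
    lake.catalog = {
        "landing": len(lake.landing),
        "cleansing": len(lake.cleansing),
        "curated": len(lake.curated),
    }


def consume(lake):
    """Consumption layer: a simple aggregate for analysis (purchase rate)."""
    if not lake.curated:
        return 0.0
    return sum(r["BUY"] for r in lake.curated) / len(lake.curated)


lake = DataLake()
ingest(lake, [
    {"SESSION_ID": "s1", "BUY": "1"},
    {"SESSION_ID": "s2", "BUY": "0"},
    {"SESSION_ID": "", "BUY": "1"},   # invalid: missing session id
])
process(lake)
print(lake.catalog)
print(consume(lake))
```

In a real deployment each function would be a separate service or job, and the zones would be storage locations rather than Python lists.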
Possible Solutions with Data Science:
Implementation in Data Lake:
These solutions will enable a deeper understanding of consumer behavior and help improve retention and sales through actionable insights derived from data.
Our Case
Collection of Information
Within the shopping site or mobile app, we will collect some important metrics from users who are logged into a session.
Metrics collected:
1. SESSION_ID: Unique identifier of the session.
2. Click_Image: Indicator of whether the user clicked on an image (0 or 1).
3. Read_Review: Indicator of whether the user has read a review (0 or 1).
4. Category_View: Indicator of whether the user viewed the category (0 or 1).
5. Read_Details: Indicator of whether the user has read the product details (0 or 1).
6. Video_View: Indicator of whether the user has watched a product video (0 or 1).
7. Add_to_List: Indicator of whether the user has added the product to the list (0 or 1).
8. Compare_Prc: Indicator of whether the user has compared prices (0 or 1).
9. View_Similar: Indicator of whether the user has viewed similar products (0 or 1).
10. Save_for_Later: Indicator of whether the user has saved the product for later (0 or 1).
11. Personalized: Indicator of whether the user used personalized recommendations (0 or 1).
12. BUY: Indicator of whether the user purchased the product (0 or 1).
These metrics help to understand user behavior while browsing and purchasing on an e-commerce platform.
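One way to picture a collected record is as a dictionary keyed by the column names listed above. The sketch below shows how such a row could be checked before it is stored. The `validate_session` helper is hypothetical, and the example rows are made up for illustration; they are not taken from the case study data.

```python
# Column names as listed above; SESSION_ID identifies the session,
# every other column is a 0/1 indicator.
FLAG_COLUMNS = [
    "Click_Image", "Read_Review", "Category_View", "Read_Details",
    "Video_View", "Add_to_List", "Compare_Prc", "View_Similar",
    "Save_for_Later", "Personalized", "BUY",
]


def validate_session(row):
    """Return a list of problems found in one session record (empty if valid)."""
    problems = []
    if not row.get("SESSION_ID"):
        problems.append("missing SESSION_ID")
    for column in FLAG_COLUMNS:
        if row.get(column) not in (0, 1):
            problems.append(f"{column} must be 0 or 1")
    return problems


# Made-up example rows, for illustration only.
good_row = {"SESSION_ID": "1001", **{c: 0 for c in FLAG_COLUMNS}, "Read_Review": 1}
bad_row = {"SESSION_ID": "1002", **{c: 0 for c in FLAG_COLUMNS}, "BUY": 2}

print(validate_session(good_row))
print(validate_session(bad_row))
```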
In our case study, 1 million records were collected, i.e., 1 million sessions with user interaction. Below is part of the CSV table with the result of the collection:
Predictive Model
We are going to implement the first stage of this entire Digital Marketing process: the construction of a predictive model.
# Imports
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
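These lines import the libraries needed for data analysis and modeling: pandas for tabular data, NumPy for numerical arrays, matplotlib for plotting, and scikit-learn for the train/test split, the Gaussian Naive Bayes classifier, and the evaluation metrics.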
# Function to load data
def load_data(file_path):
    return pd.read_csv(file_path)
This function loads the data from a CSV file specified by the file path (file_path) and returns a pandas DataFrame.
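# Function to inspect data
def inspect_data(data):
    print(data.dtypes)
    print(data.head())
    print(data.describe())
    # numeric_only=True skips non-numeric columns, which would otherwise
    # raise an error in recent pandas versions
    print(data.corr(numeric_only=True)['BUY'])
This function prints the data type of each column, the first rows, summary statistics, and the correlation of each numeric column with BUY.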
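# Function to prepare data
def prepare_data(data):
    # Note: the review column is named 'Read_Review', not 'Read_Reviews'
    predictors = data[['Read_Review', 'Compare_Products', 'Add_to_List', 'Save_for_Later', 'Personalized', 'View_Similar']]
    targets = data['BUY']
    # Hold out 30% of the sessions for testing
    return train_test_split(predictors, targets, test_size=0.3)
This function selects six interaction columns as predictors, takes BUY as the target, and splits both into training (70%) and test (30%) sets.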
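The split in `prepare_data` is shuffled differently on every run, so results will vary slightly between executions. If reproducibility matters, a variation like the one below may help. It is a sketch rather than the author's code: `random_state` fixes the shuffle and `stratify` keeps the proportion of buyers the same in both partitions. The function name `prepare_data_reproducible` is hypothetical, and the small DataFrame is synthetic and uses only two of the predictor columns.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the session data (illustrative values only).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Read_Review": rng.integers(0, 2, size=1000),
    "Add_to_List": rng.integers(0, 2, size=1000),
    "BUY": rng.integers(0, 2, size=1000),
})


def prepare_data_reproducible(data, seed=42):
    """Hypothetical variant of prepare_data with a fixed seed and stratification."""
    predictors = data[["Read_Review", "Add_to_List"]]
    targets = data["BUY"]
    return train_test_split(
        predictors, targets,
        test_size=0.3,
        random_state=seed,   # same split on every run
        stratify=targets,    # preserve the BUY ratio in train and test
    )


X_train, X_test, y_train, y_test = prepare_data_reproducible(data)
print(len(X_train), len(X_test))
print(round(y_train.mean(), 3), round(y_test.mean(), 3))
```

The two printed means should be nearly identical, which is the effect of stratifying on the target.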
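# Function to train model
def train_model(X_train, y_train):
    model = GaussianNB()
    model.fit(X_train, y_train)
    return model
This function creates a Gaussian Naive Bayes classifier, fits it to the training data, and returns the trained model.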
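For readers who want the reasoning behind the classifier: naive Bayes applies Bayes' rule under the assumption that the features are conditionally independent given the class. In standard notation (the symbols are not from the original article), with x_1, ..., x_n the predictor values of a session:

```latex
P(\mathrm{BUY}=1 \mid x_1,\dots,x_n)
  = \frac{P(\mathrm{BUY}=1)\,\prod_{i=1}^{n} P(x_i \mid \mathrm{BUY}=1)}
         {\sum_{c\in\{0,1\}} P(\mathrm{BUY}=c)\,\prod_{i=1}^{n} P(x_i \mid \mathrm{BUY}=c)}
```

GaussianNB models each P(x_i | BUY = c) as a normal density whose mean and variance are estimated per class from the training data. Because the predictors here are 0/1 flags, scikit-learn's BernoulliNB would be a natural alternative to try; the results reported in this article come from GaussianNB.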
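# Function to evaluate model
def evaluate_model(model, X_test, y_test):
    predictions = model.predict(X_test)
    print(confusion_matrix(y_test, predictions))
    print(accuracy_score(y_test, predictions))
    return model.predict_proba(X_test)
This function predicts the class of each test session, prints the confusion matrix and the accuracy, and returns the predicted class probabilities.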
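Accuracy alone can be misleading when buyers are a minority of sessions. As a possible extension, not part of the original pipeline, the sketch below also prints per-class precision and recall via `classification_report` (which the article imports but does not call) and computes the ROC AUC of the predicted probabilities. The data is synthetic and the function name `evaluate_model_extended` is hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary features; BUY is made more likely by each interaction.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(2000, 3))
buy_probability = 0.1 + 0.25 * X.sum(axis=1)   # between 0.10 and 0.85
y = (rng.random(2000) < buy_probability).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
model = GaussianNB().fit(X_train, y_train)


def evaluate_model_extended(model, X_test, y_test):
    """Print per-class metrics and return the ROC AUC of the BUY=1 scores."""
    predictions = model.predict(X_test)
    print(classification_report(y_test, predictions))
    # Column 1 of predict_proba holds the probability of class 1 (BUY).
    scores = model.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)


auc = evaluate_model_extended(model, X_test, y_test)
print("ROC AUC:", round(auc, 3))
```

An AUC near 0.5 would mean the propensity scores rank buyers no better than chance, so it is a useful sanity check before acting on the probabilities.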
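# Function to predict propensity
def predict_propensity(model, data):
    # Reshape the list of flags into a single-row 2D array
    data = np.array(data).reshape(1, -1)
    # Column 1 holds the probability of BUY = 1
    return model.predict_proba(data)[:, 1]
This function takes the interaction flags of one session, in the same order as the predictor columns, and returns the estimated probability that the session ends in a purchase.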
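`predict_propensity` scores one session at a time from a bare list, so the caller has to remember the column order used in training. A possible alternative is to score a DataFrame with named columns, which handles many sessions at once and lets the columns be selected by name. The helper `score_sessions`, the two-column feature set, and the synthetic training data below are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import GaussianNB

FEATURES = ["Read_Review", "Add_to_List"]  # subset chosen for brevity

# Synthetic training data (illustrative only).
rng = np.random.default_rng(1)
train = pd.DataFrame(rng.integers(0, 2, size=(1000, 2)), columns=FEATURES)
buy_probability = 0.1 + 0.35 * train.sum(axis=1)
train["BUY"] = (rng.random(1000) < buy_probability).astype(int)

model = GaussianNB().fit(train[FEATURES], train["BUY"])


def score_sessions(model, sessions):
    """Return the BUY=1 probability for every row of a sessions DataFrame."""
    # Selecting by name fixes the column order to the one used in training.
    probabilities = model.predict_proba(sessions[FEATURES])[:, 1]
    return pd.Series(probabilities, index=sessions.index, name="propensity")


# Columns deliberately given in a different order than FEATURES.
new_sessions = pd.DataFrame(
    {"Add_to_List": [0, 1, 1], "Read_Review": [0, 0, 1]},
    index=["s1", "s2", "s3"],
)
print(score_sessions(model, new_sessions))
```

The result keeps the session index, so the scores can be joined back to the sessions they belong to.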
# Execution Pipeline
file_path = "/…/market_app_correlated.csv"
prospect_data = load_data(file_path)
inspect_data(prospect_data)
X_train, X_test, y_train, y_test = prepare_data(prospect_data)
model = train_model(X_train, y_train)
probabilities = evaluate_model(model, X_test, y_test)
These lines load the CSV file, inspect it, split it into training and test sets, train the model, and evaluate it on the test set.
Now let's do some simulations:
Predict propensity for new browsing data
new_browsing_data = [0, 0, 0, 0, 0, 0]
print("New User: propensity:", predict_propensity(model, new_browsing_data))
Result:
New User: propensity: [0.19087601]
That is, for a user who has simply entered and logged in to the website or application, the chance of buying is close to 19%.
Predict propensity after adding to list
add_to_list_data = [1, 1, 1, 0, 0, 0]
print("After Add_to_List: propensity:", predict_propensity(model, add_to_list_data))
Result:
After Add_to_List: propensity: [0.61234248]
Once the user has read a review, compared products, and added an item to the list, the chance of buying goes up to about 61%.
Predict propensity after multiple interactions
full_interaction_data = [1, 1, 1, 1, 1, 1]
print("Full Interaction: propensity:", predict_propensity(model, full_interaction_data))
Result:
Full Interaction: propensity: [0.80887743]
For those users who have made all possible interactions, the chance of purchase rises to about 81%.
To wrap up this exploration of consumer behavior analytics and its implementation within a large marketplace, it is evident that data science and machine learning can significantly enhance our understanding of consumer actions and preferences. By systematically collecting and analyzing user interaction data, we can estimate how likely a session is to end in a purchase, from about 19% for a newly logged-in user to about 81% after every tracked interaction in the simulations above. These estimates enable targeted marketing strategies that boost engagement and conversion rates. This case study demonstrates the practical application of these techniques and underscores their potential to drive business growth and competitive advantage in the dynamic landscape of e-commerce.