Unveiling Consumer Behavior: Analysis and Prediction with Data Science

Understanding consumer behavior is essential for the success of any business in today's market. With the vast amount of data available, data science provides powerful tools to analyze and predict consumer actions. In this article, we will explore a real case study from a large marketplace, demonstrating how data analysis and machine learning techniques can be applied to uncover behavior patterns, optimize marketing strategies, and increase customer loyalty. We will dive into the theoretical and practical concepts underlying this process, providing valuable insights for those looking to improve their e-commerce operations.

Consumer Behavior

In consumer marketing, the consumer lifecycle is a term used to describe the progression of steps a customer goes through when considering, buying, using, and maintaining loyalty to a product or service.

Why It's Important

These metrics can be tracked over time (e.g., quarter over quarter, year over year) and compared to industry-wide benchmarks. Comparing customer lifecycle metrics can help close competitive gaps in product or service offerings and, above all, predict which consumers will buy from the store.

Propensity to Purchase

We want to measure a consumer's propensity to buy. Why is this important? For several reasons: first, we avoid losing the customer; second, we can create touchpoints to drive the sale; and finally, interacting with the consumers most likely to buy significantly increases the chance of a sale.

  1. Consumer Behavior: Definition: The consumer lifecycle describes the steps a customer goes through when considering, purchasing, using, and maintaining loyalty to a product or service. Importance: Measuring these metrics over time helps to identify gaps in product offerings and predict future purchasing behaviors.
  2. Propensity to Buy: Measure: Important for not losing consumers, creating points of contact, and increasing the chances of a sale. Example: Identification of consumers with a higher propensity to buy (e.g., 73%) versus a lower propensity (e.g., 11%).
  3. Counterfactual Modeling: Description: Measures the impact of interventions on churn. Example: Churn lift ranging between 11% and 17%, with actual results at 16% (see the sketch after this list).
  4. Business Case - Target: Problem: Loss of millions of dollars due to potential buyers switching to competitors. Solution: Predict behavior and optimize interactions to retain customers.
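
As a toy illustration of the counterfactual idea in item 3, the sketch below compares churn between a group that received an intervention (e.g., a retention campaign) and a control group that did not. The arrays are made-up indicator data for illustration only, not results from the case study.

# Toy uplift estimate: difference in churn rate between control and treated groups.
import numpy as np

control_churn = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])  # 1 = churned, no intervention
treated_churn = np.array([0, 0, 1, 0, 0, 1, 0, 0, 0, 1])  # 1 = churned, after intervention

lift = control_churn.mean() - treated_churn.mean()
print(f"Estimated churn lift: {lift:.0%}")  # here: 0.60 - 0.30 = 30%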

From Data Lake

The architecture of a Data Lake is a data storage solution designed to manage large volumes of data coming from multiple sources. The Data Lake framework is made up of several layers, including the ingestion layer, where structured and unstructured data is collected from both streaming and batch sources. The data is then handled by the processing layer, where it is validated, cleaned, and transformed. After processing, the data is stored in the storage layer, which is divided into landing, clean, and curated zones. In addition, the cataloging and search layer allows for organization and easy retrieval of data. Finally, the processed data is made available at the consumption layer, where it can be used for analysis and visualization, facilitating data-driven decision-making.

Typical Data Lake Ecosystem

  1. Streaming and Batch: The source of the data can be from continuous streams (e.g. Twitter) or batch (e.g. spreadsheets).
  2. Ingestion Layer: Responsible for collecting structured and unstructured data from different sources.
  3. Storage Layer: Includes three zones: the Landing Zone, where the raw data is initially stored; the Clean Zone, where data is validated, cleaned, and standardized; and the Curated Zone, where data is transformed, enriched, and modeled.
  4. Processing Layer: Contains the data processing pipeline for validating, cleaning, standardizing, transforming, and enriching the data.
  5. Cataloging & Search Layer: Allows cataloging and searching of stored data.
  6. Consumption Layer: Where data is consumed through dashboards, analytics, and reports.

This architecture enables the efficient ingestion, storage, processing, and consumption of large volumes of data.
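
As a concrete, purely illustrative sketch of this flow, the PySpark snippet below moves session data from the landing zone through the clean zone to the curated zone. The storage paths, column names, and cleaning rules are assumptions for the example, not details from the case study.

# Hypothetical sketch of data flowing through Data Lake zones with PySpark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("datalake-zones").getOrCreate()

# Landing zone: raw files exactly as ingested (path is illustrative).
raw = spark.read.option("header", True).csv("s3://datalake/landing/sessions/")

# Clean zone: validate, deduplicate, and standardize.
clean = (raw
         .dropDuplicates(["SESSION_ID"])
         .filter(F.col("SESSION_ID").isNotNull()))
clean.write.mode("overwrite").parquet("s3://datalake/clean/sessions/")

# Curated zone: transformed and modeled for consumption (e.g., typed columns).
curated = clean.withColumn("BUY", F.col("BUY").cast("int"))
curated.write.mode("overwrite").parquet("s3://datalake/curated/sessions/")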

Possible Solutions with Data Science:

  1. Preliminary Models: Use machine learning to predict churn and propensity to buy. Tools: Python (Scikit-learn, TensorFlow), R (caret).
  2. Real-Time Data Analysis: Implement real-time analytics systems to monitor consumer behavior. Tools: Apache Kafka, Spark Streaming.
  3. Customer Segmentation: Develop targeted marketing strategies based on customer segmentation. Tools: K-means clustering, cluster analysis (see the sketch after this list).
  4. KPI Tracking: Monitor consumer behavior KPIs and adjust strategies as needed. Tools: Power BI, Tableau.
  5. Automation and Customization: Create automated, personalized campaigns to engage customers with a high propensity to buy. Tools: Marketing Automation Platforms (HubSpot, Marketo).
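
For item 3, a minimal customer-segmentation sketch with scikit-learn's KMeans is shown below. The per-customer features (orders per month, average ticket, days since the last purchase) and the sample values are illustrative assumptions.

# Minimal customer segmentation sketch with K-means (features are illustrative).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: [orders_per_month, avg_ticket, days_since_last_purchase]
X = np.array([[1,  50, 30],
              [8, 120,  3],
              [2,  45, 25],
              [9, 150,  2],
              [1,  40, 60]])

X_scaled = StandardScaler().fit_transform(X)  # K-means is sensitive to feature scale
segments = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X_scaled)
print(segments)  # one segment label per customer, e.g., [0 1 0 1 0]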

Implementation in Data Lake:

  1. Ingestion Layer: Collecting data from multiple sources (e.g., app interactions, social media).
  2. Processing Layer: Processing pipeline to validate, clean, transform, and enrich data.
  3. Storage Layer: Data stored in different zones (landing, clean, curated) for easy access and analysis.
  4. Consumption Layer: Data consumed by BI tools and dashboards for analysis and decision-making.

These solutions will enable a deeper understanding of consumer behavior and help improve retention and sales through actionable insights derived from data.

Our Case

Collection of Information

Within the shopping site or mobile app, we will collect some important metrics from users who are logged into a session.

Metrics collected:

  1. SESSION_ID: Unique identifier of the session.
  2. Click_Image: Indicator of whether the user clicked on an image (0 or 1).
  3. Read_Review: Indicator of whether the user read a review (0 or 1).
  4. Category_View: Indicator of whether the user viewed the category (0 or 1).
  5. Read_Details: Indicator of whether the user read the product details (0 or 1).
  6. Video_View: Indicator of whether the user watched a product video (0 or 1).
  7. Add_to_List: Indicator of whether the user added the product to a list (0 or 1).
  8. Compare_Prc: Indicator of whether the user compared prices (0 or 1).
  9. View_Similar: Indicator of whether the user viewed similar products (0 or 1).
  10. Save_for_Later: Indicator of whether the user saved the product for later (0 or 1).
  11. Personalized: Indicator of whether the user used personalized recommendations (0 or 1).
  12. BUY: Indicator of whether the user purchased the product (0 or 1).

These metrics help to understand user behavior while browsing and purchasing on an e-commerce platform.

In our case study, 1 million records were collected, i.e., 1 million sessions with user interaction, stored in a CSV file.
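
As an illustration of the file's structure only (the rows below are hypothetical and show the schema, not the collected data), the CSV looks like this:

SESSION_ID,Click_Image,Read_Review,Category_View,Read_Details,Video_View,Add_to_List,Compare_Prc,View_Similar,Save_for_Later,Personalized,BUY
1000001,1,0,1,1,0,0,1,0,0,1,0
1000002,1,1,1,1,1,1,1,1,0,1,1
1000003,0,0,1,0,0,0,0,1,1,0,0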

Predictive Model

We are going to implement the first stage of this entire digital marketing process: building a predictive model.

# Imports
from pandas import Series, DataFrame
import pandas as pd
import numpy as np
import os
import matplotlib.pylab as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score        

These lines import the libraries needed for data analysis and modeling. The imported libraries are:

  • pandas: for data manipulation and analysis.
  • numpy: for numerical operations.
  • os: for operations related to the operating system.
  • matplotlib.pylab: for data visualization.
  • scikit-learn: for machine learning modeling and evaluation.

# Function to load data

def load_data(file_path):
    return pd.read_csv(file_path)        

This function loads the data from a CSV file specified by the file path (file_path) and returns a pandas DataFrame.

# Function to inspect data

def inspect_data(data):
    print(data.dtypes)
    print(data.head())
    print(data.describe())
    print(data.corr()['BUY'])

This function prints:

  1. The data types for each column (data.dtypes).
  2. The first five rows of the DataFrame (data.head()).
  3. A statistical summary of the data (data.describe()).
  4. The correlation of each column with the 'BUY' column (data.corr()['BUY']).
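
One practical caveat when reproducing step 4: since pandas 2.0, DataFrame.corr() raises an error if non-numeric columns are present. If SESSION_ID is stored as a string, a safer call (the numeric_only parameter exists since pandas 1.5) is:

# Restrict the correlation to numeric columns only.
print(data.corr(numeric_only=True)['BUY'])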

# Function to prepare data

def prepare_data(data):
    # Note: the review column is named 'Read_Review' (singular), not 'Read_Reviews'.
    predictors = data[['Read_Review', 'Compare_Products', 'Add_to_List',
                       'Save_for_Later', 'Personalized', 'View_Similar']]
    targets = data.BUY
    return train_test_split(predictors, targets, test_size=0.3)

This function:

  1. Selects the predictor columns of interest (Read_Review, Compare_Products, Add_to_List, Save_for_Later, Personalized, View_Similar); these names must match the CSV header exactly (note that the collection step lists the price-comparison metric as Compare_Prc).
  2. Defines BUY as the target column.
  3. Splits the data into training and test sets using train_test_split, where 30% of the data is used for testing (test_size=0.3).
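
A small refinement worth considering (a suggestion on our part, not part of the original code): fixing random_state makes the split reproducible, and stratify keeps the proportion of buyers the same in the training and test sets. This variant relies on the same imports shown at the top of the article.

# Suggested variant: reproducible split, stratified by the target class.
def prepare_data_stratified(data):
    predictors = data[['Read_Review', 'Compare_Products', 'Add_to_List',
                       'Save_for_Later', 'Personalized', 'View_Similar']]
    targets = data.BUY
    return train_test_split(predictors, targets, test_size=0.3,
                            random_state=42, stratify=targets)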

# Function to train model

def train_model(X_train, y_train):
    model = GaussianNB()
    model.fit(X_train, y_train)
    return model        

This function:

  1. Creates a Gaussian Naïve Bayes (GaussianNB) model.
  2. Trains the model with the training data (X_train and y_train).
  3. Returns the trained model.

# Function to evaluate model

def evaluate_model(model, X_test, y_test):
    predictions = model.predict(X_test)
    print(confusion_matrix(y_test, predictions))
    print(accuracy_score(y_test, predictions))
    return model.predict_proba(X_test)        

This function:

  1. Uses the model to make predictions on the test set (X_test).
  2. Prints the confusion matrix (confusion_matrix) and the model's accuracy (accuracy_score).
  3. Returns the predicted probabilities for the test set (model.predict_proba(X_test)).
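
Note that classification_report is imported at the top but never called. Adding a line like the one below inside evaluate_model would also print per-class precision, recall, and F1-score:

# Optional extra inside evaluate_model: per-class precision, recall, and F1.
print(classification_report(y_test, predictions))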

# Function to predict propensity

def predict_propensity(model, data):
    data = np.array(data).reshape(1, -1)
    return model.predict_proba(data)[:,1]        

This function:

  1. Converts the input data to a numpy array and reshapes it into a single row, as expected by the model.
  2. Returns the predicted propensity, i.e., the probability of the positive class (BUY = 1).
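
One detail to watch: the values passed to predict_propensity must follow exactly the column order defined in prepare_data. A variant (a suggestion, not the original code) that ties each value to a named feature, and avoids scikit-learn's warning about missing feature names, is:

# Suggested variant: bind each value to its feature name via a one-row DataFrame.
import pandas as pd

FEATURES = ['Read_Review', 'Compare_Products', 'Add_to_List',
            'Save_for_Later', 'Personalized', 'View_Similar']

def predict_propensity_named(model, values):
    row = pd.DataFrame([values], columns=FEATURES)
    return model.predict_proba(row)[:, 1]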

# Execution Pipeline

file_path = "/…/market_app_correlated.csv"
prospect_data = load_data(file_path)
inspect_data(prospect_data)
X_train, X_test, y_train, y_test = prepare_data(prospect_data)
model = train_model(X_train, y_train)
probabilities = evaluate_model(model, X_test, y_test)

These lines:

  1. Set the path of the CSV file.
  2. Load the data from the CSV.
  3. Inspect the loaded data.
  4. Prepare the data for training and testing.
  5. Train the model with the training data.
  6. Evaluate the model with the test data.

Now let's do some simulations:

Predict propensity for new browsing data

# Feature order: [Read_Review, Compare_Products, Add_to_List, Save_for_Later, Personalized, View_Similar]
new_browsing_data = [0, 0, 0, 0, 0, 0]
print("New User: propensity:", predict_propensity(model, new_browsing_data))

Result:

New User: propensity: [0.19087601]

That is, simply by entering and logging in to the website or application, the chance that the user will buy is close to 19%.

Predict propensity after adding to list

add_to_list_data = [1, 1, 1, 0, 0, 0]
print("After Add_to_List: propensity:", predict_propensity(model, add_to_list_data))        

Result:

After Add_to_List: propensity: [0.61234248]        

After the user reads a review, compares products, and adds an item to the list, the chance of buying rises to about 61%.

Predict propensity after multiple interactions

full_interaction_data = [1, 1, 1, 1, 1, 1]
print("Full Interaction: propensity:", predict_propensity(model, full_interaction_data))        

Result:

Full Interaction: propensity: [0.80887743]        

For users who have performed all possible interactions, the chance of purchase rises to about 81%.
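
Beyond single-session simulations, the same model can score every session at once and rank users for the automated campaigns mentioned earlier. A minimal sketch follows; the 60% cutoff is an illustrative choice, not a figure from the case study.

# Score all test sessions and rank the most likely buyers for a campaign.
scored = X_test.copy()
scored['propensity'] = model.predict_proba(X_test)[:, 1]

# Illustrative cutoff: reach out first to users above 60% propensity.
campaign_targets = (scored[scored['propensity'] > 0.60]
                    .sort_values('propensity', ascending=False))
print(campaign_targets.head())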

To wrap up this comprehensive exploration of consumer behavior analytics and its implementation within a large marketplace, it is evident that leveraging data science and machine learning can significantly enhance our understanding of consumer actions and preferences. By systematically collecting and analyzing user interaction data, we can predict purchasing behaviors with remarkable accuracy, enabling the creation of targeted marketing strategies that boost engagement and conversion rates. This case study not only demonstrates the practical application of these techniques but also underscores their potential to drive substantial business growth and competitive advantage in the dynamic landscape of e-commerce.

