Decoding Customer Behavior: My Journey with RFM Analysis and K-Means Clustering
Venugopal Adep
AI Leader | General Manager at Reliance Jio | LLM & GenAI Pioneer | AI Evangelist
On my adventure through RFM analysis and K-Means clustering, I uncovered fascinating insights into customer behaviors, segmenting them into meaningful groups based on how recently, how often, and how much they purchase. This journey not only helped me understand customer patterns better but also paved the way for targeted marketing strategies. Next up, I plan to dive deeper into these clusters, tailoring specific approaches to engage each group effectively, enhancing customer satisfaction and loyalty.
Link to my code:
Import libraries
from datetime import datetime
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
RFM (Recency, Frequency, Monetary) analysis is an excellent method for understanding customer value in a retail context. This approach segments customers based on:
- Recency: how recently a customer made their last purchase
- Frequency: how often they purchase
- Monetary: how much they spend in total
To proceed with the RFM analysis and clustering, I'll first need to inspect the dataset to understand its structure and content. Let's start by loading the data and taking a look at the first few rows.
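A minimal sketch of that first look (using the same file path as the loading step further below); head() shows the first few rows and info() reports dtypes and non-null counts, flagging fields with missing values such as CustomerID:
import pandas as pd

# Load the raw transactions and peek at the first few rows
file_path = '/content/online_retail_data.csv'
retail_data = pd.read_csv(file_path)
print(retail_data.head())

# Summary of column dtypes and non-null counts
retail_data.info()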
The dataset contains transaction-level records; the columns used in this analysis are InvoiceNo, InvoiceDate, Quantity, UnitPrice, and CustomerID.
Before proceeding with the RFM analysis, I'll perform some data cleaning and preprocessing. This includes:
- Checking and handling missing values, especially in crucial fields like CustomerID.
- Ensuring the correctness of data types, especially for dates and numeric fields.
- Creating a new column for the total amount spent per transaction (Quantity * UnitPrice).
Let's start with these steps.
Efficient Data Preparation: The First Step in Retail Data Analysis
In this code, I begin by loading a dataset of retail transactions and then embark on cleaning and restructuring the data. This involves converting 'InvoiceDate' to a usable datetime format, removing transactions with negative quantities, calculating the total price of each transaction, and ensuring that each record has a valid 'CustomerID'.
# Load the dataset
file_path = '/content/online_retail_data.csv'
retail_data = pd.read_csv(file_path)
# Convert 'InvoiceDate' to datetime
retail_data['InvoiceDate'] = pd.to_datetime(retail_data['InvoiceDate'], format='%d/%m/%y %H:%M')
retail_data = retail_data[retail_data['Quantity'] > 0] # Remove negative quantities
retail_data['TotalPrice'] = retail_data['Quantity'] * retail_data['UnitPrice']  # total spend per transaction line
# Keep only rows with a valid CustomerID and cast it to an integer
retail_data = retail_data.dropna(subset=['CustomerID'])
retail_data['CustomerID'] = retail_data['CustomerID'].astype(int)
Unveiling Customer Insights: The Heart of RFM Analysis
In this part of my code, I calculated the RFM (Recency, Frequency, Monetary) metrics by first setting a reference date (one day after the latest purchase in the dataset) and then grouping the data by customer. For each customer, I found out how many days had passed since their last purchase (Recency), how many transactions they made (Frequency), and how much they spent in total (Monetary).
This process is key to understanding customer behavior in detail.
# RFM Calculation
reference_date = retail_data['InvoiceDate'].max() + pd.Timedelta(days=1)
rfm_data = retail_data.groupby('CustomerID').agg({
    'InvoiceDate': lambda x: (reference_date - x.max()).days,  # days since last purchase (Recency)
    'InvoiceNo': 'count',                                      # number of purchase records (Frequency)
    'TotalPrice': 'sum'                                        # total spend (Monetary)
}).rename(columns={'InvoiceDate': 'Recency', 'InvoiceNo': 'Frequency', 'TotalPrice': 'Monetary'})
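One design choice worth flagging: 'InvoiceNo': 'count' counts individual transaction rows, so an invoice with many line items inflates Frequency. If the intent is the number of distinct orders, a small variant (a sketch, not what the article ran) would swap in nunique:
# Variant: Frequency as the number of distinct invoices per customer (illustrative only)
rfm_alt = retail_data.groupby('CustomerID').agg(
    Recency=('InvoiceDate', lambda x: (reference_date - x.max()).days),
    Frequency=('InvoiceNo', 'nunique'),   # distinct orders rather than line items
    Monetary=('TotalPrice', 'sum')
)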
Finding the Perfect Balance: Normalization and Cluster Count in Data Analysis
In my code, I first normalized the Recency, Frequency, and Monetary values using StandardScaler, making sure they are all on a comparable scale for K-Means clustering. Then, to find the ideal number of clusters, I used the Elbow Method: I plotted the within-cluster sum of squares (WCSS) against different cluster counts and looked for the 'elbow' point where the WCSS starts to plateau.
# Normalizing the RFM data for K-Means
scaler = StandardScaler()
rfm_normalized = scaler.fit_transform(rfm_data[['Recency', 'Frequency', 'Monetary']])
# Determining the optimal number of clusters using the Elbow Method
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10, random_state=0)
    kmeans.fit(rfm_normalized)
    wcss.append(kmeans.inertia_)
# Plotting the results to find the 'elbow'
plt.figure(figsize=(10, 6))
plt.plot(range(1, 11), wcss, marker='o', linestyle='--')
plt.title('Elbow Method to Determine Optimal Number of Clusters')
plt.xlabel('Number of Clusters')
plt.ylabel('WCSS')
plt.show()
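Because the elbow can be ambiguous in practice, a complementary check (a small sketch I'm adding here, not part of the original run) is the silhouette score for each candidate cluster count; higher values indicate better-separated clusters:
from sklearn.metrics import silhouette_score

# Silhouette score for k = 2..10 as a second opinion on the elbow reading
for k in range(2, 11):
    labels = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit_predict(rfm_normalized)
    print(f'k={k}: silhouette = {silhouette_score(rfm_normalized, labels):.3f}')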
Delving into Data: My K-Means Clustering Experience
In this part of my data journey, I applied K-Means Clustering to segment customers into four distinct groups based on their purchasing behavior. After configuring and running the K-Means algorithm on the normalized RFM data, I tagged each customer with their respective cluster, revealing intriguing patterns and groupings within the dataset.
# K-Means Clustering
kmeans = KMeans(n_clusters=4, init='k-means++', max_iter=300, n_init=10, random_state=0)
clusters = kmeans.fit_predict(rfm_normalized)
rfm_data['Cluster'] = clusters
rfm_data
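To eyeball how the four groups separate, a quick scatter of two of the RFM dimensions coloured by cluster label helps (a minimal sketch; Recency vs. Monetary is just one possible pairing):
# Scatter of Recency vs. Monetary, coloured by cluster assignment
plt.figure(figsize=(10, 6))
plt.scatter(rfm_data['Recency'], rfm_data['Monetary'], c=rfm_data['Cluster'], cmap='viridis', alpha=0.6)
plt.title('Customer Segments by Recency and Monetary Value')
plt.xlabel('Recency (days since last purchase)')
plt.ylabel('Monetary (total spend)')
plt.colorbar(label='Cluster')
plt.show()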
Decoding Customer Clusters: My Analysis Breakdown
In this final part of my data exploration, I grouped the customers into clusters and calculated the average recency, frequency, and monetary values for each cluster. This step was like taking a closer look at each group, understanding their unique shopping patterns. Finally, I counted the number of customers in each cluster, giving me a complete picture of how these groups were distributed in my dataset.
# Analyzing the Clusters
cluster_analysis = rfm_data.groupby('Cluster').agg({
    'Recency': 'mean',
    'Frequency': 'mean',
    'Monetary': 'mean'
}).sort_values(by='Cluster', ascending=True)
cluster_analysis['Count'] = rfm_data.groupby('Cluster').size()
cluster_analysis
Cluster Analysis
Cluster 0:
Cluster 1:
Cluster 2:
Cluster 3:
Insights
This clustering provides a nuanced view of different customer behaviors, which can inform targeted marketing strategies and customer engagement initiatives.
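One way to act on these segments (a hypothetical sketch; the segment names below are mine, assigned purely by ranking clusters on average spend, not labels from the analysis above) is to attach a readable name to each cluster and export per-segment customer lists for campaign targeting:
# Rank clusters by average spend and attach illustrative names (hypothetical labels)
spend_rank = cluster_analysis['Monetary'].rank(ascending=False).astype(int)   # 1 = highest average spend
segment_names = {1: 'Top Spenders', 2: 'Steady Spenders', 3: 'Occasional Buyers', 4: 'Low Spenders'}
cluster_to_segment = spend_rank.map(segment_names)        # cluster label -> segment name
rfm_data['Segment'] = rfm_data['Cluster'].map(cluster_to_segment)

# One customer list per segment, ready for targeted campaigns
for segment, group in rfm_data.groupby('Segment'):
    group.to_csv(f"segment_{segment.lower().replace(' ', '_')}.csv")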