Customer Segmentation using RFM Analysis in Python
Capstone Project, General Assembly, London, Feb 2020
Dataset: https://archive.ics.uci.edu/ml/datasets/online+retail
Introduction
In Retail and E-Commerce (B2C), and more broadly in B2B, one of the key elements shaping a firm's business strategy is its understanding of customer behaviour. More specifically, understanding its customers through different business metrics: how much they spend (revenue), how often they spend (frequency), whether they are new or existing customers, what their favourite products are, etc... Such understanding in turn helps marketing, sales, account management and product teams better support these customers and improve the product offering.
The objective of this project has been to:
Identify customer segments based on the overall buying behaviour of the client base.
Stakeholders
Typically, the stakeholder groups within the firm that would benefit from such analysis and insights are:
- Product/Services: Knowing which products sell more than others presents an opportunity to evaluate the product offering or improve specific product features.
- Operations/Logistics: From stock management perspective, understanding which products are in demand would reduce storage costs and improve delivery/logistics operations.
- Marketing: Understanding the customer segments allows for more effective and targeted marketing to specific customer groups, by creating a base campaign with core content for the broader client base and specific variations depending on the segment.
- Sales/Account Management: Identifying which customers are the most valuable and understanding their trends would go a long way towards building a genuine relationship, thus retaining existing customers and attracting new ones with the ideal customer profile.
Dataset
Online Retail Data Set https://archive.ics.uci.edu/ml/datasets/online+retail
The dataset used is fairly popular and contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retail e-commerce business. The company mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.
Outcome
The outcome of this project would be:
Create an unsupervised model that generates the optimum number of segments for the customer base.
Success Criteria
In order to make sure we understand the impact and success of the project, we've set a measurable criterion for our analysis:
Segments generated can be interpreted and transposed into business actions.
Exploratory Data Analysis - EDA
Load Data
Let's start by importing the key libraries and the dataset, which has been downloaded from the link above and stored locally in an offline directory.
# Import key libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from datetime import datetime

# Load data from local repository
retail = pd.read_excel('../offline-datasets/Online Retail.xlsx')
Data Overview
Once we've loaded the dataset into the "retail" dataframe, we can start reviewing the data:
Following a quick inspection and scan of our data set, there are a number of issues that have been identified:
InvoiceNo
- Most Invoices appear as normal transactions with positive quantity and prices, but there are some prefixed with "C" or "A" which denote different transaction types.
- InvoiceNo needs to be broken down to Number and Code (New Feature).
- Codes will need to be renamed where appropriate (Data Processing).
CustomerID
- Empty or null values in ~135k records.
- As such, we need to identify the active population and drop the empty records (Data Processing).
Description
- Empty or null values in ~1.5k records.
- Again, we need to identify the active population and drop the empty records (Data Processing).
Quantity
- Negative values in ~10k records.
- Here, we also need to identify the active population and drop records based on specific rules (Data Processing).
UnitPrice
- Negative values in ~2 records.
- As per above, identify the active population and drop records based on specific rules (Data Processing).
Dates
- Need to extract the date from InvoiceDate as a new feature, plus month and year as separate new features (New Feature).
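For reference, a quick inspection along these lines would surface the issues summarised above (a sketch using standard pandas calls; the exact notebook output is not reproduced here):

# Shape, dtypes and non-null counts per column
retail.info()

# Null values per column (CustomerID ~135k, Description ~1.5k)
print(retail.isnull().sum())

# Negative quantities and prices
print((retail.Quantity < 0).sum())
print((retail.UnitPrice < 0).sum())

# Non-numeric invoice prefixes such as "C" or "A"
print(retail.InvoiceNo.astype(str).str[0].value_counts().head())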
Feature Engineering
Before proceeding with more EDA and processing, let's summarise the new features identified in the initial data review above, which can be derived quite straightforwardly from the existing dataset:
1) Break down InvoiceNo to
- InvoiceNumber - new feature
- InvoiceCode - new feature
C – Cancellation
A – Adjustment
Null – Rename to “N” – Normal
2) Break down InvoiceDate to
- InvoiceYearMonth - new feature
- InvoiceYear - new feature
- InvoiceMonth - new feature
- Date - Use as Index
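A minimal sketch of how these features could be derived (assuming the raw InvoiceNo and InvoiceDate columns from the dataset; the original notebook may differ in details):

# Split InvoiceNo into a numeric part and a one-letter transaction code
retail['InvoiceNo'] = retail['InvoiceNo'].astype(str)
retail['InvoiceNumber'] = retail['InvoiceNo'].str.extract(r'(\d+)', expand=False)
retail['InvoiceCode'] = retail['InvoiceNo'].str.extract(r'^([A-Za-z])', expand=False).fillna('N')

# Derive date features from InvoiceDate and use the date itself as the index
retail['InvoiceYearMonth'] = retail['InvoiceDate'].dt.year * 100 + retail['InvoiceDate'].dt.month
retail['InvoiceYear'] = retail['InvoiceDate'].dt.year
retail['InvoiceMonth'] = retail['InvoiceDate'].dt.month
retail['Date'] = pd.to_datetime(retail['InvoiceDate'].dt.date)
retail.set_index('Date', inplace=True)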
The rest of the Data Analysis and pre-processing will involve:
- Defining Active population based on feature rules
- Creating Masks for filtering out bad data (empty or Null Values, negative prices, etc...) that don't contribute towards the active population
- Visualising some of our data to get a better idea of the population
Active Population
Now that new features are in place and based on the initial data review summary, we need to derive an active population for our modelling stage:
- A tree based rule set has been defined using the features from the dataset.
- Each feature has one or more rules that define its active population.
- Rules are represented by different masks.
Masks for data processing
Creating various Masks to implement the rules per feature for narrowing down the active population:
# Valid Price
valid_price = retail.UnitPrice >= 0

# Valid Description
valid_desc = retail.Description.notnull()

# Valid CustomerID
valid_CID = retail.CustomerID.notnull()

# Invoice type-N (Normal)
inv_N = retail.InvoiceCode == "N"

# Invoice type-C (Cancellation)
inv_C = retail.InvoiceCode == "C"

# Invoice type-A (Adjustment)
inv_A = retail.InvoiceCode == "A"

# Negative Quantity
q_neg = retail.Quantity < 0

# Positive Quantity
q_pos = retail.Quantity >= 0
Finally, we apply the masks to the original dataset, navigating down the tree to arrive at the leaf nodes, which form the active population.
A new dataframe is created to save separately as "retail_pp".
# Path 1 - Filter population down to include all
# valid Customer IDs with valid Price and Description
p1 = valid_price & valid_desc & valid_CID

# Path 2 - Filter population down to include all
# Normal (type-N) transactions with positive Quantities
p2 = inv_N & q_pos

# Path 3 - Filter population down to include all
# Cancellation (type-C) or Adjustment (type-A) transactions
# with negative Quantities
p3 = (inv_A | inv_C) & q_neg

# Path to leaves: combine Paths 1, 2 and 3
# *************** CREATE A COPY ************
retail_pp = retail.loc[p1 & (p2 | p3)].copy()

# Inspect new pre-processed and clean dataset
retail_pp.head()
Number of Orders (Invoices) Visualisations
There is a separate section below on Key Performance Indicators, where we capture, analyse and visualise most of the data. In this section, we would like to present two visuals: the number of orders per country and the number of products per invoice.
As we can see, most orders are from the United Kingdom. While we could have kept only the UK data, we decided to retain all active data within the active population rather than dropping data from other countries.
In the bottom graph, there is a small tail of data around the 50-100 mark that we would like to explore further. As such, we created the top graph, which zooms in on the 100 largest orders to visualise this better.
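For reference, the two visuals can be produced along these lines (a sketch assuming the Country and StockCode columns from the dataset; the original styling may differ):

# Orders per country: unique invoices per Country
retail_pp.groupby('Country')['InvoiceNo'].nunique().sort_values().plot.barh(
    figsize=(8, 8), title='Number of Orders per Country');

# Products (line items) per invoice
items_per_invoice = retail_pp.groupby('InvoiceNo')['StockCode'].count()

# Bottom graph: full distribution of products per invoice
plt.figure()
items_per_invoice.hist(bins=50);

# Top graph: zoom into the 100 largest orders
plt.figure()
items_per_invoice.nlargest(100).reset_index(drop=True).plot.bar(title='Top 100 largest orders');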
Orders summary
Top graph
There is a small number of invoices (fewer than 200) that each contain a large number of products, e.g.:
- A single invoice has 1100 products
- Another single invoice has 800 products
- Four invoices have 700 products
- In total, up to 200 invoices have more than 100 products each
Bottom graph
Most invoices (24,000+) contain up to 100 products each, with the exception of a small population (<200) that have more than 100 products each.
Key Performance Indicators
Transaction KPIs
This involves analysis around customer transaction data grouped by month. i.e. Revenue, Revenue growth, Order numbers, revenue per order, etc...
Monthly Revenue
The formula for calculating the revenue on a monthly basis is shown below:
- Calculate Order Value for each row by multiplying Unit Price with Quantity
- Group by Year and Month using the respective elements of the index of the dataframe
- Sum Order Values grouped by Year and Month
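The first step, creating the Revenue column, is a one-liner, as shown below; the grouping and summing follow in the KPI dataframe code further down. (This Revenue column is reused throughout the KPI and RFM sections.)

# Order value (Revenue) per row = Unit Price x Quantity
retail_pp['Revenue'] = retail_pp['UnitPrice'] * retail_pp['Quantity']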
Monthly Revenue Growth - Month on month percentage comparison of Revenue change.
Monthly Order Count - Total quantity of items ordered per grouping.
Average revenue per order - Average order value per monthly grouping.
Customer KPIs
This involves analysis around number of active customers and segregation between new and old customers from a Revenue perspective:
Active Customers - Unique count of Customer IDs per grouping.
Revenue of New vs Old Customer - Define new feature to distinguish Revenue from New or Existing Customers.
KPI Dataframes
In order to capture the key monthly KPIs, the main retail_pp DataFrame has been used. The new dataframe was generated as a multi-index one by:
- Grouping retail_pp based on Month and Year and creating an index based on these
- Renaming the index levels to Year and Month
- Generating and appending each new KPI on the new dataframe
- Plotting the KPIs using Line or Graph bars as required
# 1. Revenue - Generate Monthly KPIs DataFrame by aggregating Revenue
m_kpis = pd.DataFrame(retail_pp.groupby([retail_pp.index.year, retail_pp.index.month])['Revenue'].sum())

# 2. Generate Monthly Growth rate based on previous month's revenue
m_kpis['MonthlyGrowth'] = m_kpis['Revenue'].pct_change()

# 3. Generate Active Customers
m_kpis['ActiveCustomers'] = pd.DataFrame(retail_pp.groupby([retail_pp.index.year, retail_pp.index.month])['CustomerID'].nunique())

# 4. Generate Monthly Order Count (Quantity)
m_kpis['MonthlyOrderCount'] = pd.DataFrame(retail_pp.groupby([retail_pp.index.year, retail_pp.index.month])['Quantity'].sum())

# 5. Generate Monthly Order Average
m_kpis['MonthlyOrderAverage'] = pd.DataFrame(retail_pp.groupby([retail_pp.index.year, retail_pp.index.month])['Revenue'].mean())

# Rename index levels to capture Year and Month
m_kpis.index.set_names(['Year', 'Month'], inplace=True)
m_kpis
The new multi-index dataframe generated for the monthly KPIs looks like below:
For the Customer KPIs dataframe, in order to differentiate between existing and new customers, we:
- Created a new dataframe that captures the first purchase date of each customer grouped by CustomerID
- Merged the new feature (First Purchase Date) to the existing retail_pp dataframe based on CustomerID
- Created a new feature (User Type) in the retail_pp dataframe based on whether the Invoice Date of the current month falls before or after the First Purchase Date
- Created the Customer KPIs dataframe by grouping by Year, Month and UserType and summing the Revenue per month
# Generate new dataframe based on CustomerID and its first purchase date
customer_first_purchase = retail_pp.groupby('CustomerID').InvoiceDate.min().reset_index()
customer_first_purchase.columns = ['CustomerID', 'FirstPurchaseDate']
customer_first_purchase['FirstPurchaseYearMonth'] = customer_first_purchase['FirstPurchaseDate'].map(lambda date: 100*date.year + date.month)

# Add first purchase date column to the new Customer dataframe by merging with the original retail_pp
retail_ppp = pd.merge(retail_pp, customer_first_purchase, on='CustomerID')
retail_ppp['Date'] = pd.to_datetime(retail_ppp.InvoiceDate.dt.date)
retail_ppp.set_index('Date', inplace=True)
retail_ppp.head()

# Create new column "UserType" defaulting to "New". Assign "Existing" if the
# user's FirstPurchaseYearMonth was before the current InvoiceYearMonth
retail_ppp['UserType'] = 'New'
retail_ppp.loc[retail_ppp['InvoiceYearMonth'] > retail_ppp['FirstPurchaseYearMonth'], 'UserType'] = 'Existing'

# Calculate the Revenue per month for each user type
customer_kpis = pd.DataFrame(retail_ppp.groupby([retail_ppp.index.year, retail_ppp.index.month, retail_ppp.UserType])['Revenue'].sum())
customer_kpis.index.set_names(['Year', 'Month', 'UserType'], inplace=True)
customer_kpis
Our new dataframe looks like below:
Visualisations
Structuring our data in a multi-index dataframe makes it easy to plot in various forms, such as the line plots and bar plots shown below.
As per the diagram below, during April there was a significant drop in Revenue, beyond the typical variation seen during June or July of the same year. Also worth noting is that the sudden drop at the end of the graph is due to a lack of data in December rather than an actual business trend.
# Line graph to capture the monthly revenue
m_kpis.plot.line(y='Revenue', title='Revenue per month', color='r', marker='h',
                 markersize=10, linewidth=2, markerfacecolor='white');
Reviewing the rate-of-growth bar graph confirms the drop during April as the biggest spike of negative growth.
# Horizontal bar graph to capture monthly revenue growth
m_kpis.plot.barh(y='MonthlyGrowth', title='Monthly Growth', rot=0);
Again, the monthly order count graph shows a consistent picture of the sales of goods: during April there was a significant drop in the quantities ordered, while the rest of the graph shows the organic growth of the business.
# Line graph to capture the number of quantities ordered per month
m_kpis.plot.line(y='MonthlyOrderCount', title='Quantities ordered per Month', color='b',
                 marker='h', markersize=10, linewidth=2, markerfacecolor='white');
Finally, the monthly order average graph explores the average value of an order and its variation during the research window. There seems to be: a) small variation between orders, b) a slight increase in the average order value over time, and c) a significant drop in order value during April compared with the average value in the neighbouring months.
# Horizontal bar graph to capture the average order value per month
m_kpis.plot.barh(y='MonthlyOrderAverage', title='Monthly Order Average value');
On the Customer KPIs, it's interesting to confirm the same picture: organic growth of the customer base, with a similar drop in April.
# Line graph to capture the active customers per month
m_kpis.plot.line(y='ActiveCustomers', title='Active customers per month', color='g',
                 marker='h', markersize=10, linewidth=2, markerfacecolor='white');
In the diagram below we can clearly see a healthy increase in the existing customer base. Regarding new customers, following a natural decrease after the initial customer acquisition, revenue stabilises at a healthy, steady amount (circa $100,000).
# Plot new vs existing customers
customer_kpis.unstack().plot.bar(y='Revenue', title='New vs Existing Customer Monthly Revenue', rot=0);
Modelling using RFM Analysis
In order to do Customer Segmentation, the RFM modelling technique has been used. RFM stands for Recency - Frequency - Monetary Value with the following definitions:
- Recency - Given an observation date (current or a specific date in the past), Recency captures how long ago the customer last made a transaction.
- Frequency - Given a specific time window, Frequency captures the number of transactions the customer did during that window.
- Monetary Value or Revenue - Given a specific window, Revenue captures the amount spent by the customer.
Based on the values above, one can construct segments that classify customers into:
Low value Segment
- R - Less active
- F - Not very frequent
- M - Low spend
Mid value Segment
- R - Medium or inconsistent activity
- F - Infrequent, or frequent only at specific times
- M - varied spending but overall in the mid of the scale
High value Segment
- R - Very active
- F - Very frequent
- M - Great spending profile
Modelling Approach
Going forward the approach for deriving customer segments would be:
- Calculate RFM scores individually and plot them to get an idea of their RFM values and ranges
- Use the K-Means unsupervised clustering algorithm to create the 3 segments above
- Use techniques to optimise the number of clusters based on Silhouette and Inertia scores
Recency
To calculate Recency, we first start by creating a new data frame to hold all the RFM scores per customer by grouping per CustomerID.
Then we create another dataframe based on the maximum Invoice Date grouped by CustomerID and fix an observation point (here, the last invoice day in the dataset) as a line-in-the-sand point.
Finally, we generate the Recency in days by subtracting each customer's Last Purchase Date from the Last Invoice Date (the line-in-the-sand point):
As per code below, we merge the two dataframes on CustomerID to create the Recency feature in our new Customer Dataframe.
# Generate new dataframe based on unique CustomerID to keep track of RFM scores
customer = pd.DataFrame(retail_pp['CustomerID'].unique())
customer.columns = ['CustomerID']

# Generate new dataframe based on the latest Invoice date per customer (groupby CustomerID)
recency = retail_ppp.groupby('CustomerID').InvoiceDate.max().reset_index()
recency.columns = ['CustomerID', 'LastPurchaseDate']

# Set observation point as the last invoice date in the dataset
LastInvoiceDate = recency['LastPurchaseDate'].max()

# Generate Recency in days by subtracting each customer's Last Purchase Date from the Last Invoice Date
recency['Recency'] = (LastInvoiceDate - recency['LastPurchaseDate']).dt.days

# Consolidate to customer DataFrame
customer = pd.merge(customer, recency[['CustomerID', 'Recency']], on='CustomerID')
customer.head()
Frequency
The Frequency metric reflects the number of orders per customer, so a simple count of the invoices grouped by Customer ID will do:
# Count number of invoices per CustomerID and store in new frequency DataFrame
frequency = retail_ppp.groupby('CustomerID').InvoiceDate.count().reset_index()
frequency.columns = ['CustomerID', 'Frequency']

# Consolidate Frequency to existing Customer DataFrame
customer = pd.merge(customer, frequency, on='CustomerID')
Monetary Value (Revenue)
For Revenue, the grouping goes by CustomerID, and the aggregation function is Sum.
# Revenue per transaction has already been calculated as per the KPIs section
# Grouping revenue per Customer ID
revenue = retail_ppp.groupby('CustomerID').Revenue.sum().reset_index()

# Consolidate Revenue to existing Customer DataFrame
customer = pd.merge(customer, revenue, on='CustomerID')
customer.head()
The dataframe now looks like below:
If we plot the histograms for each of the R, F and M values we get the following:
Note that there are some negative values in Revenue; these are explained by returns or cancellations of products and have been intentionally kept within the active population, as long as they meet the rest of the criteria set by the masks created earlier.
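A sketch of how these histograms can be produced from the customer dataframe built above:

# Histograms of the three RFM features
customer[['Recency', 'Frequency', 'Revenue']].hist(bins=50, figsize=(12, 4), layout=(1, 3));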
If we create a Scatter plot of the pairs we can visualise the relationship between the R, F, M variables and see if there are any obvious segments:
There's clearly some distinction between high value segments and low value ones, but the difficulty in inspecting such scatter plots visually is finding the ideal boundaries and also identifying the middle segment.
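A sketch of the pairwise scatter plots, here using seaborn's pairplot (one of several ways to produce them):

# Pairwise scatter plots of the R, F, M variables
sns.pairplot(customer[['Recency', 'Frequency', 'Revenue']]);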
K-Means Clustering
For that purpose, we will use the power of K-Means clustering, an unsupervised learning method that can help us identify a specific number of clusters, even without labels.
The steps followed are:
- Set (arbitrarily) the number of clusters: here we'll aim for K=3
- Normalise the R, F, M values of data using StandardScaler method
- Train the model and review the results (centroids and assigned clusters)
- Visualise results
- Calculate evaluation metrics (Inertia & Silhouette score)
- Optimise K based on multiple runs for a range of Ks and evaluate using the Elbow graph
- Review final customer segments
Here's the code that implements steps 1-3:
# Number of clusters (configurable) - initially we're attempting the 3 segments (Low, Mid, High)
k = 3

# Input features: the R, F, M values from the customer dataframe
X = customer[['Recency', 'Frequency', 'Revenue']]

# Scaling input using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
Xstd = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

# Run and train main K-Means algorithm based on all input features
from sklearn.cluster import KMeans
model = KMeans(n_clusters=k, random_state=0)
model.fit(Xstd)

# Review output cluster labels
cluster_labels = model.labels_
print("Assigned cluster labels: \n", cluster_labels)

# Review centroids of clusters
centroids = model.cluster_centers_
print("Location of centroids: ")
print(centroids)

# Append clusters to input features table
Xstd['clusters'] = cluster_labels
Xstd.head()
Now, we can visualise the results and get an initial view of the segments:
We can now clearly see:
- Yellow - High value segment
- Green - Mid value segment
- Purple - Low value segment
To evaluate the performance of K-Means clustering algorithm we use the following metrics:
- Inertia - an absolute value; the rule of thumb is: the smaller the number, the better the fit.
- Silhouette Score - ranges between -1 and 1. Typically, a high score indicates that the population is well matched to its cluster.
Onto step 6, we'll try to optimise the results by running k-means for a range of Ks:
from sklearn import metrics

# Re-use the scaled R, F, M features, excluding the cluster labels appended earlier
X_input = Xstd[['Recency', 'Frequency', 'Revenue']]

inertias = []
ss = []

for k in range(2, 11):
    kmeans = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10, random_state=42)
    kmeans.fit(X_input)
    inertias.append(kmeans.inertia_)
    ss.append(metrics.silhouette_score(X_input, kmeans.labels_, metric='euclidean'))
The elbow graph shows that we're just about right with 3 clusters, though you can also choose 4 or 5 (as a stretch).
Plotting the Silhouette score for a range of Ks also confirms that, while beyond 5 we get more accurate matching, the model probably over-fits and is thus not optimal:
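For reference, a sketch of how the Elbow and Silhouette graphs can be plotted from the scores collected in the loop above:

# Elbow graph: inertia against number of clusters
ks = range(2, 11)
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(ks, inertias, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Graph')

# Silhouette score against number of clusters
plt.subplot(1, 2, 2)
plt.plot(ks, ss, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Scores')
plt.tight_layout();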
Finally, onto step 7: the new Customer Dataframe with the assigned segments and labels looks like below.
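A minimal sketch of how the cluster labels can be mapped back to named segments on the customer dataframe (the label-to-segment mapping below is hypothetical; cluster numbering is arbitrary per K-Means run, so check it against the centroids first):

# Append the cluster labels from the k=3 model to the customer dataframe
customer['Cluster'] = model.labels_

# Map cluster ids to named segments. NOTE: this mapping is illustrative only;
# inspect the centroids to decide which cluster is Low/Mid/High
segment_map = {0: 'Low Value', 1: 'Mid Value', 2: 'High Value'}
customer['Segment'] = customer['Cluster'].map(segment_map)
customer.head()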
If we plot the 3 different segments we get the following bar plot:
Business Actions
Once we have a clear view of the segments above, specific actions can and should be planned across the business (Operations, Marketing, Product, etc...) to address each group:
- High Value: There's a small number of these clients, and a key priority for the business would be to improve Retention of these customers, as they are its most valuable asset.
- Mid Value: There's a larger proportion of these customers, who should be treated as potentially High Value ones. With the right actions, aiming to increase Retention and Frequency and bring them closer to the brand and the product, they can eventually become High Value.
- Low Value: the vast majority will be in this category and the first step in addressing this group would be to aim to increase Frequency and understand if there are any potential issues around the product or service.
Summary
To summarise our journey so far in this project, we've followed the steps below:
- Data Review & Understanding - Explored the dataset to understand the features and missing values.
- Feature Engineering & Processing - Created new features to help us uncover better insights.
- KPIs & Visualizations - Created a broad set of KPIs to get a better business view of the data.
- RFM analysis & new features - Used RFM analysis to model the data.
- Unsupervised learning K-Means clustering - Used unsupervised learning to tell us about the various data clusters.
- Optimization & Evaluation - Ran K-Means clustering for a range of Ks to optimise the algorithm.
- Business Actions & Segments Mapping - Translated the results of Machine Learning into Business Actions
Next Steps
As a next step, the work being done here can be leveraged to predict Customer Lifetime Value (CLTV).
This can be done using the RFM features created here as input to a Supervised Machine Learning model with output being the total amount spent by the customer in their lifetime (Revenue).
I hope you've found this article useful, it was certainly a pleasure working through this project.
GitHub
The full Jupyter Notebook for this project can be found here.
References
https://towardsdatascience.com/data-driven-growth-with-python-part-1-know-your-metrics-812781e66a5b
https://www.datacamp.com/community/tutorials/introduction-customer-segmentation-python
About Angelo
I’m a freelance Principal Management Consultant and applied Data Scientist with broad experience implementing Data Transformation and delivering Technology Change on behalf of Vendors and Global Banks in Financial Services and Government.
Coming from a solid academic background in Robotics Engineering, skills like mathematics, statistics and programming (luckily) came off the shelf for me, coupled with the genuinely curious, analytical and inquisitive thinking of an Engineer.
This unique blend of technical skills and industry experience has helped me work with a broad range of business and technical teams in Technology and Data, helping them bridge gaps and bring critical projects over the line, delivering great value to their organisations.
My interest going forward is to explore how to use data to drive better insights for sales and digital marketing teams, in order to help them retain their existing client base and also expand further to new ones.
Feel free to drop me a line at:
LinkedIn - https://www.dhirubhai.net/in/etzimopoulos/
GitHub - https://github.com/etzimopoulos
Medium - https://medium.com/@etzimopoulos
#customersegmentation #rfmanalysis #datascience #customerretention #digitalmarketing #capstone #machinelearning #managementconsulting #businessconsulting