Predicting Customer Behaviour in 5 Steps (using Data Science)

Tim Paris

CEO at Dataro / Lover of fundraising data

Published: 10 August 2017

Prediction is the keyword on everybody’s lips. Whether it is Nate Silver predicting the US election (or not), or Facebook investing heavily in VR, business decisions should be based on the world of tomorrow, not on the world of today.

Predictive analytics is taking off in the business world, and every business wants to know what its customers will do next. But how is this done? We looked at the top 20 Google results for “Predicting customer behaviour” and found only vague tips and complex academic papers. In this article we provide a concrete, 5-step guide to predicting customer behaviour using data science methods.

Step 1 — Define a clear goal

For any prediction question, the most important step is to start with a concrete goal. Your goal must be able to produce testable predictions. Saying “Retail purchases will increase” is too vague. When will they increase? By how much? Your predictions will have to pass the “Clairvoyance Test”: they must be specific enough that someone who could see the future would be able to say, without asking for clarification, whether they came true. A better goal is to “identify which customers will make a retail purchase within the next 14 days with 90% accuracy”. Of course, there are many other behaviours you may want to predict, such as customer churn, lifetime value (LTV) or the response to a particular campaign.
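A goal phrased this way can be turned directly into a label that is computed and checked. Here is a minimal sketch, assuming the purchase log is a list of (customer_id, purchase_date) pairs; the function name `label_purchasers` and the sample data are hypothetical and only for illustration.

```python
from datetime import date, timedelta

def label_purchasers(purchases, customers, cutoff, horizon_days=14):
    """Return {customer_id: bool}: did the customer buy within
    `horizon_days` after `cutoff`? (Hypothetical helper, for illustration.)"""
    window_end = cutoff + timedelta(days=horizon_days)
    buyers = {cid for cid, day in purchases if cutoff < day <= window_end}
    return {cid: cid in buyers for cid in customers}

# Illustrative data only.
purchases = [
    ("alice", date(2017, 8, 3)),   # inside the 14-day window
    ("bob", date(2017, 8, 20)),    # after the window closes
    ("carol", date(2017, 7, 30)),  # before the cutoff
]
labels = label_purchasers(purchases, ["alice", "bob", "carol"], cutoff=date(2017, 8, 1))
print(labels)
```

Because the window and the cutoff are explicit, anyone can later check each label against what actually happened, which is the point of a testable goal.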

Step 2 — Collect the right data

Now that you have your goal, what data will you need to achieve it? A good way to find out is to work backwards. We want to predict purchase intent, so it is going to be very handy to know about historical purchases. It might also help to know how many items customers typically buy, how often they make transactions, and whether they buy during sales, after receiving rewards or before their birthday. The most logical predictors are usually the most informative, but there is no guarantee, and choosing the right data (known as feature selection) can often be more art than science.
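Working backwards from purchase history might look like the sketch below. It assumes each purchase is a (customer_id, purchase_date, n_items) tuple; the function `build_features` and the three feature names are our own illustrative choices, not a prescribed set.

```python
from datetime import date

def build_features(purchases, cutoff):
    """Summarise each customer's purchase history before `cutoff`."""
    summary = {}
    for cid, day, n_items in purchases:
        if day >= cutoff:
            continue  # skip anything from the period we are trying to predict
        s = summary.setdefault(cid, {"n_orders": 0, "n_items": 0, "last": day})
        s["n_orders"] += 1
        s["n_items"] += n_items
        s["last"] = max(s["last"], day)
    return {
        cid: {
            "recency_days": (cutoff - s["last"]).days,  # days since last order
            "n_orders": s["n_orders"],                  # how often they buy
            "avg_items": s["n_items"] / s["n_orders"],  # typical basket size
        }
        for cid, s in summary.items()
    }

# Illustrative data only.
history = [
    ("alice", date(2017, 6, 1), 2),
    ("alice", date(2017, 7, 20), 4),
    ("bob", date(2017, 3, 15), 1),
    ("bob", date(2017, 8, 5), 3),  # on or after the cutoff, so ignored
]
features = build_features(history, cutoff=date(2017, 8, 1))
print(features)
```

Excluding everything on or after the cutoff matters: features built from the prediction window would leak the answer into the model.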

How much data will you need? Unfortunately there is no clear rule for the amount of data you will need, but in the case of retail trends, you will want at least 2 years of historical data to be able to incorporate seasonal trends into your model.

Step 3 — Build a model but start simple

Next, open up your modelling tool of choice. If you use Python, try Scikit-Learn, or for R we like Max Kuhn’s caret package. Both can implement a large variety of complex machine learning algorithms, and while it is tempting to go fancy, the most important thing is to start with a simple model. Not just because simple models are the easiest to fit and the most interpretable, but because you can turn them around quickly. This is critical when building predictive models, because modelling is an iterative process and the biggest gains are made early. The risk of starting with a complicated model is that you don’t have time to improve on it, or worse, that the sophisticated model is no better than the simple one, and it is less interpretable.
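As one example of starting simple, here is a sketch of a logistic regression baseline in Scikit-Learn. The two features (days since last purchase, number of past orders) and the toy numbers are assumptions made up for illustration; logistic regression is our choice of simple model, not the only one.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy training data (illustrative): [recency_days, n_orders] per customer.
X = [[2, 9], [5, 7], [8, 6], [10, 8], [60, 1], [75, 2], [90, 1], [120, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = purchased in the following 14 days

# Scale the features, then fit a plain logistic regression.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

recent_regular = [[3, 8]]    # bought 3 days ago, 8 past orders
lapsed_one_off = [[100, 1]]  # bought 100 days ago, 1 past order
print(model.predict(recent_regular), model.predict(lapsed_one_off))

# The coefficients show the direction of each feature's effect.
coefs = model.named_steps["logisticregression"].coef_[0]
print(dict(zip(["recency_days", "n_orders"], coefs)))
```

A model like this fits in well under a second, so you can try a new feature, refit and compare many times in one sitting.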

Step 4 — Test your model

Congratulations, you now have a model that makes predictions. In our case, we wanted to know which customers are likely to make a purchase in the next 14 days. The output of our model is simply a new column or variable in our data, with the label ‘Purchase’ or ‘No Purchase’ for each customer in our database. It is now important to practise good data hygiene. Make sure you test your predictions on an independent dataset that has not been used in training the model. In most cases this will be a ‘hold-out’ sample of the data, typically 20%. This will give you the most reliable estimate of how your model will perform in the ‘real world’.
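The hold-out idea can be sketched in a few lines of plain Python. The helper names `holdout_split` and `accuracy` are hypothetical, and the labels are invented for illustration; in practice a library routine such as Scikit-Learn's `train_test_split` does the same job.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of `rows` and set aside `test_fraction` for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(actual, predicted):
    """Share of predictions that match what actually happened."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

customers = list(range(100))  # stand-in for 100 customer records
train, test = holdout_split(customers)
print(len(train), len(test))

# Compare predictions with reality on the hold-out customers only.
actual = ["Purchase", "No Purchase", "Purchase", "No Purchase", "No Purchase"]
predicted = ["Purchase", "No Purchase", "No Purchase", "No Purchase", "No Purchase"]
print(accuracy(actual, predicted))  # 4 of 5 labels match
```

The model is fitted on `train` only; `test` is touched once, at the end, to produce the figure you compare against the goal set in Step 1.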

Step 5 — Set (but don’t forget) your model

To make use of your model you need to give it a place to work for you. These are your deployment options:

Keep it local: The simplest method is to run the model on the machine that generated it. This requires the least amount of work to set up.
Batch it: Set up a cron job, or another automated service, to run the model every hour/day/week and add the predictions to your database.
Let it REST: By creating a RESTful API, anyone in your organisation can easily access the model via HTTP and generate predictions in real time.
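To make the batch option concrete, here is a minimal sketch of a scoring job that writes predictions back to a database. It assumes a SQLite table named `customers` with `recency_days` and `prediction` columns, and it uses a stand-in rule (`rule_of_thumb`) in place of a trained model; all of these names are hypothetical.

```python
import sqlite3

def score_batch(conn, predict):
    """Score every customer and store the label in the `prediction` column."""
    rows = conn.execute("SELECT id, recency_days FROM customers").fetchall()
    updates = [(predict(recency), cid) for cid, recency in rows]
    conn.executemany("UPDATE customers SET prediction = ? WHERE id = ?", updates)
    conn.commit()
    return len(updates)

def rule_of_thumb(recency_days):
    """Stand-in for a trained model's predict call (illustration only)."""
    return "Purchase" if recency_days <= 30 else "No Purchase"

# In-memory database with two illustrative customers.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, recency_days INTEGER, prediction TEXT)"
)
conn.executemany("INSERT INTO customers (id, recency_days) VALUES (?, ?)", [(1, 5), (2, 90)])

n_scored = score_batch(conn, rule_of_thumb)
result = conn.execute("SELECT id, prediction FROM customers ORDER BY id").fetchall()
print(n_scored, result)

# A cron entry such as `0 2 * * * python score_customers.py` (hypothetical
# script name) would run a job like this every night at 02:00.
```

The same `predict` function could sit behind a REST endpoint instead; only the wrapper around it changes.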

Whatever your deployment option, keep in mind that a predictive model is only as good as the data it was built on. If you let the model sit for too long, the data it was trained on will become more and more out of date, and its predictions will get steadily worse. Always remember to keep your models trained on the most recent data, or else they will quickly lose their value without you realising.
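One way to avoid forgetting the model is to automate the check. The sketch below flags a model for retraining when it is too old or when its accuracy on recent data has slipped; the function name and the thresholds (90 days, 5 percentage points) are arbitrary assumptions that you would tune to your own business.

```python
from datetime import date

def needs_retraining(trained_on, today, baseline_accuracy, recent_accuracy,
                     max_age_days=90, max_drop=0.05):
    """Flag a model that is too old or whose accuracy has degraded."""
    too_old = (today - trained_on).days > max_age_days
    degraded = (baseline_accuracy - recent_accuracy) > max_drop
    return too_old or degraded

today = date(2017, 8, 10)
stale = needs_retraining(date(2017, 1, 1), today, 0.90, 0.90)    # old model
healthy = needs_retraining(date(2017, 7, 1), today, 0.90, 0.88)  # recent, stable
drifted = needs_retraining(date(2017, 7, 1), today, 0.90, 0.80)  # recent, but worse
print(stale, healthy, drifted)
```

Run alongside the scheduled scoring job, a check like this turns "without you realising" into an alert.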

Last Word

Predictive modelling is an iterative process. Follow these 5 steps and you will be able to quickly generate predictions, but in order to stand the test of time, revisit your model frequently.
