Data Mining for Marketing – Simple K-Means Clustering Algorithm


Data mining is not just for technical people.

And you might have to cluster your data even if you’re just segmenting your clients for your next marketing campaign. Or maybe you’re just a student who’d like to find out the basics of Weka (data mining software).

Here’s a brief data mining tutorial for non-techies to help you get started with clustering:

Where can you get Weka?

The safest option is the official website. Download Weka from there (note that it doesn’t work without Java).

And it’s free.

Where do you find the right database?

Weka doesn’t work with just any database. And the algorithms you’re going to choose won’t fit all datasets.

So, if you want to use a specific algorithm, it’s best to create your own dataset, over which you have full control. Aim for more than 1,000 rows so your results are more reliable.

But here are three sources where you could find some decent datasets:

data.imf.org

catalog.data.gov

tomslee.net/airbnb-data-collection-get-the-data

(Drop me a line if you know more.)

And if you’re looking for a case study (in plain English) with few technical elements so you can get an idea of how clustering really works, here’s one:

Case study – Bank clients segmentation through clustering

Disclaimer: part of the case study is missing because I did it for a college project and the results can’t be disclosed.

Study objectives

  • Highlight the use of Weka for basic data mining processes
  • Discover the most representative segment of a bank’s (fictional) clients
  • Find out how a bank’s (fictional) services can be improved starting with the data regarding clients’ age, job, marital status, education, account balance, housing, and loans through an online marketing campaign that could bring new clients

Introduction

Data mining is the process through which valid and previously unknown information is extracted from a specific set of data and is then used to make an important business decision.

Briefly put, data mining is a method that allows YOU to find similar behavioral patterns, trends, or tendencies from an existing data set.

The main goal of the entire process is DISCOVERY.

From this point of view, I’ve chosen to find out the most significant clients of a bank (fictional) through clustering.

For this study, I picked a type of application often used in marketing and retail: identifying significant client profiles and behavior patterns.

As a field of applicability, I’ve chosen banking. In this case, the main goal was to identify relevant clients (who are also loyal) and use their profile to create new digital marketing campaigns.

Data mining can also be used to identify loyal clients or errors in the use of banking services, to discover new behavior, to predict how a service will be used, or to estimate possible client administration costs.

The main target (and result) was to attract new clients based on analyzed profiles and behavior patterns. Thus, the desired profile of the bank’s possible clients will be created from the data on existing loyal clients.

As a result, we’ll be able to create a digital marketing campaign that will target exactly this market segment. And you might be looking to create alternative campaigns for the other significant client segments as well.

The link between objectives and strategic marketing

  • Highlight the use of Weka for basic data mining processes

This lets a marketing department apply an innovative method to a dataset it already owns and create new marketing campaigns faster and more efficiently than with traditional methods. Using such data mining tools or methods for marketing operations can offer a competitive advantage.

  • Discover the most representative segment of a bank’s (fictional) clients

Using data mining software (like Weka), we can extract the profile of a significant or loyal client/customer. From this profile, we’ll build the online marketing campaigns.

Starting with the information offered by clients, personalized campaigns can be created. Clients are likely to respond positively to these and to show more interest in them than in a general, non-personalized campaign.

The success rate of such a campaign is thus likely to be higher than if we had used a traditional method of segmentation.

Consequently, the marketing strategy chosen for this case study is to use an innovative method that reduces costs (since Weka is a free tool) and the time spent on segmentation, and that increases the success rate of marketing campaigns built with it.

  • Find out how a bank’s (fictional) services can be improved starting with the data regarding clients’ age, job, marital status, education, account balance, housing, and loans through an online marketing campaign that could bring new clients

In the case of companies or marketing departments that are using data mining or the client/market segmentation strategy for the first time, a reorientation of the general marketing strategy is needed.

Therefore, data mining is an easy way of determining which of a client’s attributes can be used to create and start a new digital marketing campaign.

You’ll also find out through which of these attributes you’ll get more success and a better response from your audience.

For example, through the dataset chosen in this case, you can test whether a campaign based on the clients’ job is more efficient than one that targets their age (or the other way around).

There are multiple opportunities and they can be diversified and tested until the right campaign model is found.

Work methodology

Dataset

Undisclosable .csv database.

Criteria for selecting a set of data

Any Weka project must start with a correctly built and error-free dataset.

Missing information would cause serious mistakes in the final results and thus jeopardize the marketing campaign we want to create.

For a closer data analysis, all information can be sorted and checked in Excel (or any other editor) before you load it into Weka.

For instance, you can sort the data by age to verify how diverse your list is in terms of the ages of the people in it. A diverse list helps keep the Weka analysis objective, so the final campaigns aren’t skewed towards one group.
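If you’d rather not hunt for gaps by eye, the check for missing information described above can be sketched in a few lines of Python. This is only a minimal illustration, not part of the original study: the function name `find_incomplete_rows`, the column names, and the sample rows are all made up, and I’m assuming that empty cells and `?` (the marker Weka’s ARFF format uses for a missing value) are what count as “missing”.

```python
import csv
import io

def find_incomplete_rows(csv_text, missing_markers=("", "?")):
    """Return (row_number, column_name) pairs for every missing cell."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            if column is None:
                # The row has more cells than the header has columns.
                problems.append((row_number, "<extra cells>"))
            elif value is None or value.strip() in missing_markers:
                # None means the row is shorter than the header.
                problems.append((row_number, column))
    return problems

# Made-up sample rows, only to show the check in action.
sample = """age,job,marital,balance
34,technician,married,1200
,student,single,50
58,retired,?,0
"""

issues = find_incomplete_rows(sample)
for row_number, column in issues:
    print(f"row {row_number}: missing value in '{column}'")
```

For a real file you would read its contents and pass the text to the same function. Whether you then fix or drop the flagged rows is a judgment call that depends on your dataset.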

After choosing a database, analyze it to see if it matches your project’s requirements and your objectives.

For this study, the right database had to contain a large number of people and relevant data on them that could be used for a marketing campaign. Among the necessary data were demographic characteristics, personal interests, and the relationship between the client and the seller (in this case the fictional bank).

The profile of the chosen dataset

The database I used contains attributes such as age, job, marital status, education level, account balance, and other info regarding their housing and bank loans.

This way, I ensured that the people in the database have diverse profiles/characteristics. Their ages range from 18 to 95. They include everyone from students to retired people; single, married, or divorced clients; people with primary, secondary, tertiary, or unknown education; and accounts with varied balances, with debts, or with no money in them at all.

Process and algorithm

The process

Data mining is the process of extracting, transforming, and analyzing the data in a dataset, regardless of its size.

For this case study, the data mining process was used to gather info regarding a fictional bank’s clients. This type of analysis will then be used to plan a digital marketing campaign and facilitate other general business decisions.

Data mining can help identify errors, patterns, and data correlations to predict approximate but effective results. This information can then be used to generate new results, profit, and other benefits, to reduce costs and risks, or to improve the seller-client relation.

Using exact client data we can customize campaigns that will allow us to increase our profit, satisfy our clients, and avoid losing large sums of money on useless marketing campaigns that don’t target a specific buyer persona.

The data mining algorithm

I used Simple K-Means Clustering as an unsupervised learning algorithm that allows us to discover new data correlations. (Note: It does so much more than just that. But I’ll stick to the basics for now.)

After choosing the algorithm, I set the number of clusters I wanted (3; use the number you need if you have a specific target in mind), the maximum number of iterations (500), and the distance metric (EuclideanDistance).
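To get a feel for what those three settings control, here is a minimal, plain-Python sketch of the general k-means idea. It is not Weka’s implementation and not the code behind this case study. The function names, the `seed` parameter, the random choice of starting centroids, and the sample clients are my own assumptions for illustration. It also handles numeric attributes only, so columns like job or marital status would need converting first.

```python
import math
import random

def euclidean(a, b):
    """Straight-line distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, k=3, max_iterations=500, seed=0):
    """Return (centroids, assignments) after at most max_iterations passes."""
    rng = random.Random(seed)            # seed is an assumption, for repeatable runs
    centroids = rng.sample(points, k)    # start from k randomly chosen points
    assignments = None
    for _ in range(max_iterations):
        # Step 1: attach every point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:   # nothing moved, so we are done
            break
        assignments = new_assignments
        # Step 2: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:                      # leave an empty cluster where it is
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignments

# Made-up clients as (age, balance in thousands); not the case-study data.
clients = [(22, 0.5), (25, 1.0), (24, 0.8),
           (41, 5.0), (45, 6.0), (43, 5.5),
           (67, 12.0), (70, 11.0), (69, 13.0)]

centroids, assignments = k_means(clients)
for c, centroid in enumerate(centroids):
    print(f"cluster {c}: centroid={centroid}, clients={assignments.count(c)}")
```

Two things to keep in mind: the result depends on the starting centroids, so a different seed can give different clusters, and this sketch does no scaling, so an attribute with large numbers can dominate the distance unless you bring the columns to comparable ranges first.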

Note: Again, clustering is so much more than just these metrics. And this is a good thing. If you’re looking into learning data mining on an advanced level you’ll see how these functions, classifiers (etc.) can help you get more accurate results.

The clustering results were then shown in a table whose rows correspond to the attributes and whose columns correspond to the final cluster centroids.
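Since the real results can’t be shared, here is a rough sketch of how such a centroid table can be put together: one row per attribute, one column per cluster, each cell holding the average of that attribute within the cluster. The function name `centroid_table`, the layout, and the numbers are made up for illustration and only loosely mimic a typical clustering output. They are not the study’s results.

```python
def centroid_table(rows, assignments, attributes, k):
    """Build text lines: one row per attribute, one column per cluster centroid."""
    sizes = [assignments.count(c) for c in range(k)]
    header = f"{'Attribute':<12}" + "".join(
        f"{'Cluster ' + str(c):>12}" for c in range(k)
    )
    lines = [header, f"{'(size)':<12}" + "".join(f"{n:>12}" for n in sizes)]
    for i, name in enumerate(attributes):
        cells = []
        for c in range(k):
            # Average this attribute over the rows assigned to cluster c.
            values = [row[i] for row, a in zip(rows, assignments) if a == c]
            mean = sum(values) / len(values) if values else float("nan")
            cells.append(f"{mean:>12.2f}")
        lines.append(f"{name:<12}" + "".join(cells))
    return lines

# Made-up rows as (age, balance in thousands) with made-up cluster labels.
rows = [(23, 0.8), (25, 1.0), (44, 5.0), (42, 6.0), (68, 12.0), (70, 11.0)]
labels = [0, 0, 1, 1, 2, 2]

table = centroid_table(rows, labels, attributes=["age", "balance"], k=3)
print("\n".join(table))
```

Reading a table like this column by column gives you one “average client” per cluster, which is the profile you would then build a campaign around.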

Read the full post on my blog.

