Bank marketing campaigns analysis with Machine Learning.

Abstract

This data-set describes the results of marketing campaigns run by a Portuguese bank. The campaigns were based mostly on direct phone calls offering the bank's clients a term deposit. If, after all marketing efforts, the client agreed to place a deposit, the target variable is marked 'yes', otherwise 'no'.

Source of the data https://archive.ics.uci.edu/ml/datasets/bank+marketing

Data-set description https://www.kaggle.com/volodymyrgavrysh/bank-marketing-campaigns-data-set-description

Citation Request:

This data-set is publicly available for research. The details are described in S. Moro, P. Cortez and P. Rita. "A Data-Driven Approach to Predict the Success of Bank Telemarketing." Decision Support Systems, Elsevier, 62:22-31, June 2014.

Task

  • predicting the future results of marketing campaigns based on available statistics and, accordingly, formulating recommendations for such campaigns in the future.
  • building a profile of a consumer of banking services (deposits).
  • making recommendations for future campaigns.

Approach

The following steps will be performed to complete the task:

  1. Loading the data and running a short Exploratory Data Analysis (EDA).
  2. Formulating hypotheses regarding individual factors (features) for conducting correct data cleaning and data preparation for modeling.
  3. Choosing the evaluation metrics.
  4. Building a pipeline for Cross Validation and Grid Search procedures (search for optimal parameters of the model).
  5. Choosing the most effective model** and building its learning curve.
  6. Formulating conclusions.
** We intentionally use the most basic machine learning models to make the solution easier to understand.

Feature description

Bank client data:

  • 1 - age (numeric)
  • 2 - job : type of job (categorical: 'admin.','blue-collar','entrepreneur','housemaid','management','retired','self-employed','services','student','technician','unemployed','unknown')
  • 3 - marital : marital status (categorical: 'divorced','married','single','unknown'; note: 'divorced' means divorced or widowed)
  • 4 - education (categorical: 'basic.4y','basic.6y','basic.9y','high.school','illiterate','professional.course','university.degree','unknown')
  • 5 - default: has credit in default? (categorical: 'no','yes','unknown')
  • 6 - housing: has housing loan? (categorical: 'no','yes','unknown')
  • 7 - loan: has personal loan? (categorical: 'no','yes','unknown')

Related with the last contact of the current campaign:

  • 8 - contact: contact communication type (categorical: 'cellular','telephone')
  • 9 - month: last contact month of year (categorical: 'jan', 'feb', 'mar', ..., 'nov', 'dec')
  • 10 - day_of_week: last contact day of the week (categorical: 'mon','tue','wed','thu','fri')
  • 11 - duration: last contact duration, in seconds (numeric). Important note: this attribute highly affects the output target (e.g., if duration=0 then y='no'). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.

Other attributes:

  • 12 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
  • 13 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric; 999 means client was not previously contacted)
  • 14 - previous: number of contacts performed before this campaign and for this client (numeric)
  • 15 - poutcome: outcome of the previous marketing campaign (categorical: 'failure','nonexistent','success')

Social and economic context attributes:

  • 16 - emp.var.rate: employment variation rate - quarterly indicator (numeric)
  • 17 - cons.price.idx: consumer price index - monthly indicator (numeric)
  • 18 - cons.conf.idx: consumer confidence index - monthly indicator (numeric)
  • 19 - euribor3m: euribor 3 month rate - daily indicator (numeric)
  • 20 - nr.employed: number of employees - quarterly indicator (numeric)

Output variable (desired target):

  • 21 - y - has the client subscribed a term deposit? (binary: 'yes','no')
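
The notes on `duration` and `pdays` above translate into two small preparation steps. Below is a minimal sketch, assuming the data is already loaded into a pandas DataFrame with the column names listed above. The tiny sample frame, the helper name `prepare_realistic_features` and the new column `previously_contacted` are made up for illustration. The sketch follows the data-set note about `duration`, not necessarily the preparation used later in this analysis.

```python
import pandas as pd

# Tiny hand-made sample using the column names from the feature description.
sample = pd.DataFrame({
    "age": [30, 45, 58],
    "duration": [0, 210, 640],
    "pdays": [999, 999, 6],
    "y": ["no", "no", "yes"],
})


def prepare_realistic_features(df: pd.DataFrame) -> pd.DataFrame:
    """Drop `duration` and make the 999 sentinel in `pdays` explicit."""
    out = df.copy()
    # `duration` is only known after the call, so it leaks the outcome.
    out = out.drop(columns=["duration"])
    # 999 means "not previously contacted"; expose that as a 0/1 flag
    # (hypothetical column name).
    out["previously_contacted"] = (out["pdays"] != 999).astype(int)
    # Encode the target as 0/1 for modeling.
    out["y"] = out["y"].map({"no": 0, "yes": 1})
    return out


prepared = prepare_realistic_features(sample)
print(prepared)
```

Keeping `duration` remains reasonable for benchmark runs, as the note says; the flag for `pdays` simply stops a model from reading 999 as a real number of days.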


1. Explore categorical features (EDA)

Primary analysis of several categorical features reveals:

  1. Administrative staff and technicians opened deposits most often in absolute terms. In relative terms, the high proportion of retirees and students is also worth noting.
  2. Although married clients agreed to the service more often in absolute terms, single clients responded better in relative terms.
  3. The best communication channel is the mobile phone.
  4. There is an evident difference between clients who already use the bank's services and have received a loan and those who have not.
  5. Having a housing loan does not greatly affect marketing campaign performance.
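
The difference between absolute and relative response in point 2 can be checked with a simple group-by. This is only a sketch on a made-up mini-sample (the counts are not taken from the real data-set); the column `subscribed` and the result names `contacts`, `deposits` and `rate` are our own.

```python
import pandas as pd

# Hypothetical mini-sample: 8 married and 4 single clients.
df = pd.DataFrame({
    "marital": ["married"] * 8 + ["single"] * 4,
    "y": ["yes"] * 3 + ["no"] * 5 + ["yes"] * 2 + ["no"] * 2,
})

summary = (
    df.assign(subscribed=df["y"].eq("yes").astype(int))
      .groupby("marital")["subscribed"]
      # contacts: clients reached, deposits: absolute 'yes' count,
      # rate: share of 'yes' within the group (relative response).
      .agg(contacts="count", deposits="sum", rate="mean")
)
print(summary)
```

In this toy sample married clients open more deposits in absolute terms, while single clients have the higher rate, which is the pattern described above.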

Explore numerical features (EDA)

From the correlation matrix we observe the following:

  • the feature most correlated with the target is call duration, so we need to transform it to reduce its influence.
  • the highly correlated features (employment rate, consumer confidence index, consumer price index) may describe the clients' situation from different socio-economic angles. Their variance might support the model's capacity for generalization.
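
A correlation check of this kind can be sketched as follows. The data here is synthetic and generated only so the example runs on its own: we assume a target that depends on call duration and two macro indicators that move together. None of the numbers come from the real data-set.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for three numeric columns (assumed relationships).
duration = rng.exponential(scale=250, size=n)
euribor3m = rng.uniform(0.6, 5.0, size=n)
nr_employed = 5000 + 40 * euribor3m + rng.normal(0, 10, size=n)
# The target is made more likely for long calls.
y = (duration + rng.normal(0, 100, size=n) > 400).astype(int)

df = pd.DataFrame({
    "duration": duration,
    "euribor3m": euribor3m,
    "nr.employed": nr_employed,
    "y": y,
})

corr = df.corr()  # Pearson correlation matrix
# Correlation of every feature with the target, strongest first.
target_corr = corr["y"].drop("y")
order = target_corr.abs().sort_values(ascending=False).index
target_corr = target_corr.loc[order]
print(target_corr)
print(corr.loc["euribor3m", "nr.employed"])
```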

2. Formulating hypotheses regarding individual factors (features) for conducting correct data cleaning and data preparation for modeling

Data cleaning strategy

Since categorical variables dominate the data-set and there are no more than 4 weakly correlated numeric variables, we need to transform the categorical variables to increase the model's ability to generalize (we cannot drop them).

Particular attention should be paid to the duration feature and to categories that can be treated as binary. We suggest binning for the former and a simple 0/1 transformation for the latter.
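
Both transformations can be sketched in a few lines of pandas. The bin edges (60 and 300 seconds) and the new column names `duration_bin` and `contact_cellular` are illustrative assumptions, not the values used in the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "duration": [0, 45, 180, 320, 900],
    "contact": ["cellular", "telephone", "cellular", "cellular", "telephone"],
})

# Binning: replace raw seconds with a coarse ordinal bucket.
# Intervals are (-1, 60], (60, 300], (300, inf); the edges are assumptions.
bins = [-1, 60, 300, float("inf")]
df["duration_bin"] = pd.cut(df["duration"], bins=bins, labels=[0, 1, 2]).astype(int)

# Binary category: two possible values map directly to 0 and 1.
df["contact_cellular"] = (df["contact"] == "cellular").astype(int)

print(df)
```

Binning limits how much a single very long call can dominate the model, which is one way to reduce the influence of duration.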

For categories with more than 3 possible values (marital and education), target encoding is proposed: it relates each value to the target variable and lets us use these categories in numerical form.
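
A minimal sketch of target encoding, assuming a 0/1 target: each category is replaced by a smoothed mean of the target computed on the training data. The helper `fit_target_encoding`, the `smoothing` parameter and the sample rows are hypothetical; the original notebook may use a library encoder instead.

```python
import pandas as pd

train = pd.DataFrame({
    "education": ["university.degree", "university.degree",
                  "high.school", "high.school",
                  "basic.9y", "basic.9y"],
    "y": [1, 0, 0, 0, 1, 1],
})


def fit_target_encoding(df, column, target, smoothing=1.0):
    """Return ({category: smoothed target mean}, global target mean)."""
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["sum", "count"])
    # Smoothing pulls categories with few rows toward the global mean.
    smoothed = (stats["sum"] + smoothing * global_mean) / (stats["count"] + smoothing)
    return smoothed.to_dict(), global_mean


mapping, global_mean = fit_target_encoding(train, "education", "y")

# Apply to unseen rows; categories not seen in training get the global mean.
new_clients = pd.DataFrame({"education": ["high.school", "illiterate"]})
new_clients["education_te"] = new_clients["education"].map(mapping).fillna(global_mean)
print(new_clients)
```

The mapping should be fitted on training folds only; fitting it on the full data lets the target leak into the features.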

In some cases, re-scaling is proposed to normalize the data.
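
As an illustration of re-scaling, the sketch below standardizes two columns that live on very different scales. We assume scikit-learn's `StandardScaler` here; the values are made up and only resemble `euribor3m` and `nr.employed`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up rows: [euribor3m-like, nr.employed-like].
X_train = np.array([
    [4.86, 5191.0],
    [1.31, 5099.1],
    [0.72, 4991.6],
    [4.96, 5228.1],
])
X_new = np.array([[1.0, 5000.0]])

# Fit on training data only, then reuse the same mean/std for new rows.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_new_scaled = scaler.transform(X_new)

print(X_train_scaled.mean(axis=0), X_train_scaled.std(axis=0))
```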

3. Choosing the evaluation metrics

We propose using the roc_auc* metric to evaluate the different models, with additional monitoring of the accuracy metric.

This approach will allow us to explore models from different angles.
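
A small example shows why the two metrics are worth watching together. We assume an imbalanced target (90 'no' and 10 'yes', chosen for illustration) and a useless model that gives every client the same score.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Assumed class balance for illustration: 90 negatives, 10 positives.
y_true = np.array([0] * 90 + [1] * 10)

# A model that cannot rank clients at all: identical score for everyone.
constant_scores = np.full(100, 0.1)
constant_labels = (constant_scores >= 0.5).astype(int)  # always predicts 'no'

acc = accuracy_score(y_true, constant_labels)
auc_value = roc_auc_score(y_true, constant_scores)
print(acc, auc_value)
```

Accuracy looks high simply because 'no' is the majority class, while roc_auc stays at the chance level of 0.5 because the scores do not rank clients at all.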

4. Building a pipeline for Cross Validation and Grid Search procedures (search for optimal parameters of the model)

See code here https://www.kaggle.com/volodymyrgavrysh/bank-marketing-campaigns-dataset-analysis#4.-Building-a-pipline-for-Cross-Validation-and-Grid-Search-procedures-(search-for-optimal-parameters-of-the-model)
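
For readers who do not want to open the notebook, here is a rough sketch of what such a pipeline can look like in scikit-learn. It is not the author's code: the synthetic data from `make_classification`, the step names `scale` and `model`, and the small parameter grid are all assumptions made to keep the example short and runnable.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the prepared bank data.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.85, 0.15], random_state=0,
)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=0)),
])

# Illustrative grid; parameters are addressed as <step name>__<parameter>.
param_grid = {
    "model__n_estimators": [20, 40],
    "model__max_depth": [3, None],
}

cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring={"roc_auc": "roc_auc", "accuracy": "accuracy"},
    refit="roc_auc",  # choose the best model by roc_auc, still report accuracy
    cv=cv,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is re-fitted on the training part of every fold, so no information from the validation part leaks into preprocessing.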

5. The choice of the most effective model

Our best-performing model by the roc_auc* metric (0.9269) is Random Forest. This classifier achieves an accuracy of 0.903, which is close to the average accuracy across all classifiers (0.904).

[Figure: performance results of all models]

We can plot the Random Forest classifier's performance against the OOB** score to check that the critical hyper-parameter was selected correctly during Grid Search. The results are almost the same: 80 estimators give the best roc_auc score and 90 estimators give the maximum OOB score.
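
Such a check can be sketched as a loop over the number of estimators. The data is synthetic and the candidate values are assumptions, so the winning value here says nothing about the real data-set. Note that for a classifier `oob_score_` is an accuracy, not a roc_auc.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the prepared data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4, random_state=0)

oob_scores = {}
for n_estimators in [70, 80, 90, 100]:
    forest = RandomForestClassifier(
        n_estimators=n_estimators,
        oob_score=True,  # score each sample with trees that did not see it
        random_state=0,
    )
    forest.fit(X, y)
    oob_scores[n_estimators] = forest.oob_score_

best_n = max(oob_scores, key=oob_scores.get)
print(oob_scores)
print(best_n)
```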

* https://en.wikipedia.org/wiki/Receiver_operating_characteristic

** https://en.wikipedia.org/wiki/Out-of-bag_error

Let's look at the ROC curve.

The curve is well shaped and bends toward low False Positive Rates. A roc_auc of 0.9269 for the best model is high enough to support later assumptions about the data.
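
For reference, the points of a ROC curve and the area under it can be computed as below. The four labels and scores are a toy example, not model output from this analysis.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve

# Toy labels and predicted scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each threshold yields one (False Positive Rate, True Positive Rate) point.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)  # area under the curve by the trapezoidal rule

print(fpr, tpr)
print(roc_auc)
```

Plotting `tpr` against `fpr` gives the curve itself; the closer it runs to the top-left corner, the closer roc_auc is to 1.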

We can also plot the feature importances of the Random Forest classifier with the best roc_auc score.
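
Extracting the importances from a fitted forest can be sketched like this. The data and the feature names `feature_0` to `feature_5` are synthetic placeholders; 80 estimators is taken from the text above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

feature_names = [f"feature_{i}" for i in range(6)]  # placeholder names
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0,
)

forest = RandomForestClassifier(n_estimators=80, random_state=0).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = (
    pd.Series(forest.feature_importances_, index=feature_names)
      .sort_values(ascending=False)
)
print(importances)
```

These importances are impurity-based, so they describe how the forest used each feature on the training data rather than a causal effect on the client's decision.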

6. Conclusions and recommendations.

This analysis can be carried out at the level of individual bank branches, as it does not require significant resources or special knowledge (the model itself can be run automatically at regular intervals).

Such micro-targeting can potentially increase the overall effectiveness of the entire marketing campaign.

What general recommendations can be offered for a successful marketing campaign in the future?

1. Take the timing of the campaign into account (May is the most effective month).

2. Increase the contact time with customers (perhaps by formulating the goal of the campaign differently). Other means of communication may also be used.

3. Focus on specific categories. The model shows that students and senior citizens respond better to this proposal.

4. It is imperative to form target groups based on socio-economic categories. Age, income level (not always high) and profession can accurately determine the marketing profile of a potential client.

Given these factors, it is recommended to concentrate on those consumer groups that are potentially more promising.

Concentrating the bank's efforts in this way directs its resources to the main factor, the bank's contact time with the client, which affects conversion most of all.

--------------------------------------------------------------------------------------------------------------

The continuation of such a study may be the formation of a clear client profile - by age, gender, income and other factors, as well as the adaptation of the product itself (deposit) to a specific category.

------------------------------------------------------------------------------------------------------------

See all code on Kaggle https://www.kaggle.com/volodymyrgavrysh/bank-marketing-campaigns-dataset-analysis


