AI/ML : Customers Behavior Analysis To Increase Subscriptions for Financial App...From Scratch [Part 1]


AI/ML is without a doubt the hot topic in town, and that makes sense once we start to understand its capabilities in detail. This kind of analysis can help business owners predict outcomes that are hard to judge by intuition alone. Here I would like to walk through a common problem using financial application data: predicting whether a customer will subscribe to the premium version of an app. Based on that prediction, the company can decide which customers should receive an offer. The data contains customer behavior, and our job is to find insights in it.

So let's begin...

Problem Statement :- A fintech company has launched an application that can be used for multiple purposes such as loans, savings, payments etc. The application has two versions (Free and Paid). The goal of the company is to sell the paid version to the right target market so that it saves on extra marketing cost. For that reason, it made the premium features available in the free version of the app for 24 hours to collect data on customer behavior. After that, the company hired a Machine Learning Engineer to find insights from the collected data (customer behavior).

If a customer will buy the product anyway, there is no need to give that customer an offer. Offers should go only to those customers who are interested in using the paid version of the app but cannot afford its cost.

We will use Python with a Jupyter notebook to perform the analysis.

  1. Get your tools ready :- So let's import the essential libraries
import numpy as np                # for numeric calculation
import pandas as pd                # for data analysis and manipulation
import matplotlib.pyplot as plt       # for data visualization
import seaborn as sns                    # for data visualization
from dateutil import parser                # convert time in date time data type

2. Let's explore the data :-

fineTech_appData = pd.read_csv("FineTech_appData.csv")  # Load the data
fineTech_appData.shape                               # get shape of dataset

Output :- (50000, 12)

So the dataset holds 50000 rows and 12 columns. Let's see what it looks like.

fineTech_appData.head(5)   # show first 5 rows of fineTech_appData DataFrame

The full content of the 6th column (screen_list) is not visible, so let's see what this column holds in detail.

for i in [1,2,3,4,5]:
    print(fineTech_appData.loc[i,'screen_list'],'\n')  

#We print only 5 rows from index 1 to 5 from the screen_list


Great... Now let's not forget the first rule of data validation: identify the null values.

fineTech_appData.isnull().sum() # To check the null values
fineTech_appData.info()         # To get the information about the dataset

So we can see that enrolled_date has 18926 null values and all the other columns have none.
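
One quick sanity check is whether the missing enrolled_date values simply belong to users who never enrolled. Here is a minimal sketch on a toy frame. It assumes (this is not confirmed above) that enrolled is a 0/1 flag and that enrolled_date is empty exactly when enrolled is 0; all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration.
toy = pd.DataFrame({
    "enrolled": [1, 0, 1, 0],
    "enrolled_date": ["2013-01-02 10:00:00", np.nan, "2013-01-05 08:30:00", np.nan],
})

# Cross-tabulate the enrolled flag against "is the date missing?"
check = pd.crosstab(toy["enrolled"], toy["enrolled_date"].isnull())
print(check)

# True only if a date is missing exactly for the non-enrolled users
consistent = bool((toy["enrolled_date"].isnull() == (toy["enrolled"] == 0)).all())
print(consistent)
```

If the same check comes back True on the real data, the nulls are not a data quality problem and do not need to be imputed.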

fineTech_appData.describe()  # To see the distribution 

Okay, now let's keep only the important columns and remove the rest.

fineTech_appData2 = fineTech_appData.drop(['user', 'first_open', 'screen_list', 'enrolled_date'], axis = 1)             #To Drop non usable columns

Oh yeah, one more thing before we forget: let's quickly change the data type of the hour column (from 02:00:00 to 2), which makes more sense for data analysis.

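fineTech_appData['hour'] = fineTech_appData.hour.str.slice(1,3).astype(int)   # change the hour from a time string to an integer
fineTech_appData2['hour'] = fineTech_appData['hour']   # keep the reduced DataFrame in sync, since drop() returned a copy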
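
To see what the slice-based conversion above does, here is a small sketch on hypothetical sample values. The leading space in each string is an assumption about the raw format (it would explain why the slice starts at position 1); the second variant is an alternative that does not depend on character positions.

```python
import pandas as pd

# Hypothetical sample values; the leading space is an assumption about the raw data.
hours = pd.Series([" 02:00:00", " 13:00:00", " 23:00:00"])

# Same approach as in the article: take the characters at positions 1-2, cast to int
by_slice = hours.str.slice(1, 3).astype(int)

# Alternative: strip whitespace, split on ':' and keep the first part
by_split = hours.str.strip().str.split(":").str[0].astype(int)

print(by_slice.tolist())
print(by_split.tolist())
```

Both variants give the same integers here; the split-based one keeps working if the padding in the raw strings ever changes.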

Hmmm, looks better now... So it's time to see some visualizations, as the mind understands them better.

3. Data visualization :-

1. Heatmap with correlation matrix :-

plt.figure(figsize=(14,6))  # Plot figure with given size
sns.heatmap(fineTech_appData2.corr(), annot = True, cmap ='coolwarm')  # Correlation heatmap with the values written in each cell

annot – if True, writes the data value in each cell. An array of the same shape as the data can be passed instead to annotate the heatmap.

cmap – a matplotlib colormap name or object. This maps the data values to the color space.


In the fineTech_appData2 dataset, there is no strong correlation between any features. There is a slight correlation between ‘numscreens’ and ‘enrolled’, which suggests that customers who saw more screens are more likely to take the premium app.
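
Reading one column off a heatmap can be hard on the eyes, so it may help to list the correlation of each feature with the target directly. A minimal sketch on made-up numbers (the column names follow the article, the values do not come from the real data):

```python
import pandas as pd

# Made-up numbers for illustration only
toy = pd.DataFrame({
    "numscreens": [5, 12, 30, 8, 25, 40],
    "age": [22, 35, 28, 41, 30, 26],
    "enrolled": [0, 0, 1, 0, 1, 1],
})

# Correlation of every numeric column with the target, strongest first
corr_with_target = (
    toy.select_dtypes("number")   # ignore any non-numeric columns
       .corr()["enrolled"]        # keep only the 'enrolled' column of the matrix
       .drop("enrolled")          # a column always correlates 1.0 with itself
       .sort_values(key=abs, ascending=False)
)
print(corr_with_target)
```

Sorting by absolute value keeps strong negative correlations near the top as well.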

2. Count Plot :-

sns.countplot(x = fineTech_appData.enrolled)   # pass the column as x; newer seaborn versions expect a keyword here

This shows that our data set has more enrolled users than non-enrolled users.
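
The count plot gives the picture; the exact numbers behind it can be read with value_counts. A small sketch on an illustrative series (not the real data):

```python
import pandas as pd

# Illustrative 0/1 flags, not the real column
enrolled = pd.Series([1, 1, 0, 1, 0, 1, 1, 0], name="enrolled")

counts = enrolled.value_counts()                 # absolute number per class
shares = enrolled.value_counts(normalize=True)   # fraction per class

print(counts)
print(shares)
```

Knowing the class shares up front is useful later, because accuracy on an imbalanced target can look better than it really is.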

3. Histogram of each feature:-

plt.figure(figsize = (14,7))
features = fineTech_appData2.columns # list of column names
for i,j in enumerate(features):
    plt.subplot(3,3,i+1)                               # create subplot for histogram
    plt.title("Histogram of {}".format(j), fontsize = 12)    # title of histogram
    bins = len(fineTech_appData2[j].unique())                   # one bin per distinct value
    plt.hist(fineTech_appData2[j], bins = bins, rwidth = 0.8, edgecolor = "y", linewidth = 2)    # plot histogram
plt.subplots_adjust(hspace=0.5)        # vertical space between the subplot rows

The histograms show that minigame, used_premium_feature, enrolled, and like each take only two values.

The histogram of ‘dayofweek’ shows that slightly fewer customers registered on the app on Tuesday and Wednesday.

The histogram of ‘hour’ shows that fewer customers register on the app around 10 AM.

The ‘age’ histogram shows that most customers are young.

The ‘numscreens’ histogram shows that only a few customers saw more than 40 screens.

Now let's find out how much time a customer takes to enroll in the premium app after registration. For that we need the first open date and the enrolled date (which we dropped from fineTech_appData2 earlier but still have in fineTech_appData).

Parse the first open date and the enrolled date into datetime format for easy subtraction.

fineTech_appData['first_open'] =[parser.parse(i) for i in fineTech_appData['first_open']]            # Creating a list
fineTech_appData['enrolled_date'] =[parser.parse(i) if isinstance(i, str) else i for i in fineTech_appData['enrolled_date']]            

Now subtract and plot to see the time...

fineTech_appData['time_to_enrolled']  = (fineTech_appData.enrolled_date - fineTech_appData.first_open).dt.total_seconds() / 3600   # difference in hours; newer pandas versions reject astype('timedelta64[h]')
plt.hist(fineTech_appData['time_to_enrolled'].dropna(), range = (0,100))

So this shows that most customers who enroll do so within 0 to 10 hours of registration.
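
The subtraction step can be tried on a toy frame to see what the hour values look like, including for users who never enrolled. This is a sketch with invented timestamps; the 10-hour threshold only mirrors the observation above and is not a rule from the data.

```python
import pandas as pd

# Invented timestamps; None stands for a user who never enrolled
toy = pd.DataFrame({
    "first_open": pd.to_datetime(
        ["2013-01-01 08:00:00", "2013-01-02 09:00:00", "2013-01-03 10:00:00"]
    ),
    "enrolled_date": pd.to_datetime(
        ["2013-01-01 11:30:00", None, "2013-01-05 10:00:00"]
    ),
})

# Timedelta -> seconds -> hours; missing dates stay NaN
hours = (toy["enrolled_date"] - toy["first_open"]).dt.total_seconds() / 3600
print(hours.tolist())

# Share of enrolled users who enrolled within 10 hours of first opening the app
within_10h = (hours.dropna() <= 10).mean()
print(within_10h)
```

Non-enrolled users keep a missing value, which is why the histogram call above uses dropna().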

We have enough information for our analysis, but we missed one important piece that would be very helpful here: the "Screen List". It will help us identify the user's behavior and which screens he/she spends time on.

We have collected all the distinct screen names in another CSV file. For each of these screen names we append a column to the data, then check whether the name appears in ‘screen_list’: if it does, the appended column gets the value 1, otherwise 0.

# convert the screen_list string into numeric (0/1) columns

for screen_name in fineTech_app_screen_data:
    fineTech_appData[screen_name] = fineTech_appData.screen_list.str.contains(screen_name).astype(int)
    fineTech_appData['screen_list'] = fineTech_appData.screen_list.str.replace(screen_name+",", "")
fineTech_appData.shape

Output 

(50000, 68)
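
The screen-flag step above can be tried on a toy frame. The sketch below uses hypothetical screen names (they are not taken from the real data) and shows one thing to watch for: a substring test also fires for longer names that start the same way, while splitting on the comma matches whole names only. Series.str.get_dummies is used here as an alternative, not as the method of the article.

```python
import pandas as pd

# Hypothetical screen names, chosen so that one name is a prefix of another
toy = pd.DataFrame({"screen_list": ["Loan,Home,Loan2", "Home,Settings", "Loan2"]})

# One 0/1 column per distinct screen name, matched as whole comma-separated tokens
flags = toy["screen_list"].str.get_dummies(sep=",")
print(flags)

# Substring test for comparison: "Loan" is also found inside "Loan2"
substring_hit = toy["screen_list"].str.contains("Loan", regex=False).astype(int)
print(substring_hit.tolist())
```

In the third row the substring test reports "Loan" even though only "Loan2" was visited. Whether this matters depends on the actual screen names in the CSV file.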

So after all this cleaning, the data is now in its final, usable form.


Now this data can be used to run machine learning algorithms by splitting the data set into training and testing sets. Most of an ML data analyst's time goes into correcting the data set to make sure the data gives a clear picture without any problems.
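
As a preview of that split, here is a minimal sketch using only pandas on a toy frame. The 80/20 ratio and the random_state are assumptions for illustration; the next part of this series may well use a library helper instead.

```python
import pandas as pd

# Toy frame: one feature and the 0/1 target
toy = pd.DataFrame({"numscreens": range(10), "enrolled": [0, 1] * 5})

# Randomly pick 80% of the rows for training; the rest becomes the test set
train = toy.sample(frac=0.8, random_state=0)
test = toy.drop(train.index)

# Separate the features (X) from the target (y)
X_train, y_train = train.drop(columns="enrolled"), train["enrolled"]
X_test, y_test = test.drop(columns="enrolled"), test["enrolled"]

print(len(train), len(test))
```

The test rows are never seen during training, so the score on them is a fairer estimate of how a model behaves on new customers.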

In the next part of this blog we will try the following ML models to see which one gives the most accurate results.

  1. Decision Tree Classifier
  2. Nearest Neighbor Classifier
  3. Naive Bayes Classifier
  4. Random Forest Classifier
  5. Logistic Regression
  6. Support Vector Classifier
  7. XGBoost Classifier

So Stay Tuned.......
