
AI/ML : Customers Behavior Analysis To Increase Subscriptions for Financial App...From Scratch [Part 1]

Prateek Sharma

Manager Risk & Fraud BMS || Revenue Assurance Consultant || Thinker | Solving Problems with AI ML || Ex- Airtel || Ex- Subexian || eMDP IIMK

Published: December 4, 2020

AI/ML is without a doubt the hot topic of the town, and that makes sense once we start to understand its capabilities in detail. Such analysis can help business owners predict outcomes that are hard to judge by intuition alone. Here I would like to walk through a common problem: using financial application data to predict whether a customer will subscribe to the premium version of the app. The company can then decide which customers should receive an offer. The data contains the customers' behavior, and our job is to find insights in it.

So let's begin...

Problem Statement :- A fintech company has launched an application that can be used for multiple purposes such as loans, savings, payments etc. The application has two versions (Free and Paid). The goal of the company is to sell the paid version to the right target market and save the extra cost of marketing. For that reason they provided the premium features in the free version of the app for 24 hours to collect the customers' behavior. After that, the company hired a Machine Learning Engineer to find insights in the collected data (customers' behavior).

If a customer will buy the product anyway, there is no need to give that customer an offer. Only give offers to those customers who are interested in the paid version of the app but can't afford its cost.

We will use Python in a Jupyter notebook to perform the analysis.

1. Get your tools ready :- So let's import the essential libraries

import numpy as np                # for numeric calculation
import pandas as pd                # for data analysis and manipulation
import matplotlib.pyplot as plt       # for data visualization
import seaborn as sns                    # for data visualization
from dateutil import parser                # convert time in date time data type

2. Let's explore the data :-

fineTech_appData = pd.read_csv("FineTech_appData.csv")  # Load the data
fineTech_appData.shape                               # get shape of dataset

Output :- (50000, 12)

So the dataset holds 50000 rows and 12 columns. Let's see what it looks like.

fineTech_appData.head(5)   # show first 5 rows of fineTech_appData DataFrame

The full content of the 6th column (screen_list) is not visible, so let's look at this column in detail.

for i in [1,2,3,4,5]:
    print(fineTech_appData.loc[i,'screen_list'],'\n')  

#We print only 5 rows from index 1 to 5 from the screen_list

Great... Now let's not forget the first rule of data validation: identify the null values.

fineTech_appData.isnull().sum() # To check the null values
fineTech_appData.info()         # To get the information about the dataset

So we can see that enrolled_date has 18926 null values and all the other columns are complete.
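One way to sanity-check that null count is to confirm that a missing enrolled_date simply means the user never enrolled. Here is a minimal sketch on a toy frame; the values are made up for illustration, only the column names 'enrolled' and 'enrolled_date' come from the dataset, and whether the real data passes this check is an assumption you would need to verify.

```python
import numpy as np
import pandas as pd

# Toy stand-in for fineTech_appData; the values are illustrative only.
toy = pd.DataFrame({
    "enrolled": [1, 0, 1, 0],
    "enrolled_date": ["2013-01-02 10:00:00", np.nan, "2013-01-05 08:30:00", np.nan],
})

# The data is consistent when enrolled_date is missing exactly for non-enrolled users.
missing_date = toy["enrolled_date"].isnull()
consistent = (missing_date == (toy["enrolled"] == 0)).all()
print(consistent)
```

If the check fails on the real data, the null dates would need a closer look before the column is dropped.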

fineTech_appData.describe()  # To see the distribution

Okay, now let's keep only the important columns and remove the rest.

fineTech_appData2 = fineTech_appData.drop(['user', 'first_open', 'screen_list', 'enrolled_date'], axis = 1)             #To Drop non usable columns

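Oh yes, one more thing before we forget: let's quickly change the data type of the hour column (from 02:00:00 to 2), which makes more sense for data analysis. The same conversion is copied into fineTech_appData2 so that the plots below see numeric hours.

fineTech_appData['hour'] = fineTech_appData.hour.str.slice(1,3).astype(int)   # characters 1-2 hold the hour (this assumes a leading space in the raw value)
fineTech_appData2['hour'] = fineTech_appData['hour']                          # keep the reduced DataFrame in sync

Hmm, looks better now... So it's time to see some visualizations, as the mind understands them better.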
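The slice above depends on the exact position of the digits in the string. A slightly more forgiving variant, sketched below on a toy Series, strips whitespace first and then takes everything before the first colon. The raw format (a leading space followed by HH:MM:SS) is an assumption inferred from the slice indices, not something confirmed by the data.

```python
import pandas as pd

# Assumed raw format: a leading space followed by HH:MM:SS.
raw_hour = pd.Series([" 02:00:00", " 13:00:00", " 23:00:00"])

# Strip whitespace, split on ':' and keep the first part as an integer hour.
hour_int = raw_hour.str.strip().str.split(":").str[0].astype(int)
print(hour_int.tolist())  # [2, 13, 23]
```

This version keeps working if the leading space is ever missing from the raw values.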

3. Data visualization :-

1. Heatmap with correlation matrix :-

plt.figure(figsize=(14,6))  # Plot figure with given size
sns.heatmap(fineTech_appData2.corr(), annot = True, cmap ='coolwarm')  # Correlation matrix with the value written in each cell

annot – if True, the data value is written in each cell; an array of the same shape as the data can also be passed and is then used to annotate the heatmap

cmap – a matplotlib colormap name or object. This maps the data values to the color space.

In the fineTech_appData2 dataset, there is no strong correlation between any of the features. There is a slight correlation between ‘numscreens’ and ‘enrolled’, which suggests that customers who saw more screens are more likely to take the premium app.
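When the heatmap gets crowded, it can help to look at just the target column of the correlation matrix. The sketch below does that on a small made-up frame; the numbers are illustrative and only the column names come from the dataset.

```python
import pandas as pd

# Toy stand-in for fineTech_appData2; the values are illustrative only.
toy = pd.DataFrame({
    "numscreens": [5, 12, 30, 8, 25, 40],
    "age": [22, 35, 28, 41, 30, 26],
    "enrolled": [0, 0, 1, 0, 1, 1],
})

# Correlation of every feature with the target, strongest positive first.
target_corr = toy.corr()["enrolled"].drop("enrolled").sort_values(ascending=False)
print(target_corr)
```

On the real data the same two lines would run on fineTech_appData2 once all of its columns are numeric.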

2. Count Plot :-

sns.countplot(x = 'enrolled', data = fineTech_appData)   # number of rows per class

This suggests there are more enrolled users than non-enrolled users in our data set.
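To put numbers on the bars, the class shares can be read off with value_counts. A minimal sketch on a toy Series (the values are made up; on the real data you would pass fineTech_appData['enrolled']):

```python
import pandas as pd

# Toy target column; the values are illustrative only.
enrolled = pd.Series([1, 1, 0, 1, 0], name="enrolled")

# Fraction of rows in each class (1 = enrolled, 0 = not enrolled).
share = enrolled.value_counts(normalize=True)
print(share)
```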

3. Histogram of each feature:-

plt.figure(figsize = (14,7))
features = fineTech_appData2.columns # list of column names
for i,j in enumerate(features):
    plt.subplot(3,3,i+1)                               # create subplot for histogram
    plt.title("Histogram of {}".format(j), fontsize = 12)    # title of histogram
    bins = len(fineTech_appData2[j].unique())                   # one bin per distinct value
    plt.hist(fineTech_appData2[j], bins = bins, rwidth = 0.8, edgecolor = "y", linewidth = 2)   # plot histogram
plt.subplots_adjust(hspace=0.5)        # vertical space between the rows of subplots

The histograms show that ‘minigame’, ‘used_premium_feature’, ‘enrolled’, and ‘liked’ have only two values.

The histogram of ‘dayofweek’ shows that slightly fewer customers registered on the app on Tuesday and Wednesday.

The histogram of ‘hour’ shows that fewer customers register on the app around 10 AM.

The ‘age’ histogram shows that most customers are young.

The ‘numscreens’ histogram shows that few customers saw more than 40 screens.

Now let's find out how much time a customer takes to enroll in the premium app after registration. For that we need the first open date and the enrolled date (which we dropped from fineTech_appData2 but still have in fineTech_appData).

Parse the first open date and the enrolled date into datetime format for easy subtraction.

fineTech_appData['first_open'] =[parser.parse(i) for i in fineTech_appData['first_open']]            # Creating a list
fineTech_appData['enrolled_date'] =[parser.parse(i) if isinstance(i, str) else i for i in fineTech_appData['enrolled_date']]

Now subtract and plot to see the time...

fineTech_appData['time_to_enrolled']  = (fineTech_appData.enrolled_date - fineTech_appData.first_open).dt.total_seconds() / 3600   # elapsed time in hours
plt.hist(fineTech_appData['time_to_enrolled'].dropna(), range = (0,100))

So this shows that most customers enroll within 0 to 10 hours of registration.

We have enough information for our analysis, but we missed one important piece that would be very helpful here: the "Screen List". It will help us identify the user's behavior and how much time he/she is spending on the screens.

We have collected all the distinct screen names in another CSV file. For each of these screen names we append a column, then check whether the name appears in ‘screen_list’: if it does, the value in the new column is 1, otherwise 0.
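The same reading can be backed with a number: the share of enrolled users whose time to enroll falls under a chosen cutoff. The sketch below uses four made-up users and a 10 hour cutoff; both the dates and the cutoff are assumptions for illustration, not values taken from the article.

```python
import pandas as pd

# Toy timestamps; None marks a user who never enrolled.
first_open = pd.to_datetime(pd.Series([
    "2013-01-01 08:00:00", "2013-01-01 09:00:00",
    "2013-01-02 10:00:00", "2013-01-03 12:00:00",
]))
enrolled_date = pd.to_datetime(pd.Series([
    "2013-01-01 10:00:00", None,
    "2013-01-04 10:00:00", "2013-01-03 12:30:00",
]))

# Elapsed time in hours; users who never enrolled stay NaN.
hours = (enrolled_date - first_open).dt.total_seconds() / 3600

# Share of enrolled users who enrolled within the (hypothetical) 10 hour cutoff.
within_10h = (hours.dropna() <= 10).mean()
print(hours.tolist(), within_10h)
```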

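# turn screen names into 0/1 columns (fineTech_app_screen_data holds the screen names from the other CSV file)

for screen_name in fineTech_app_screen_data:
    fineTech_appData[screen_name] = fineTech_appData.screen_list.str.contains(screen_name, regex=False).astype(int)        # 1 if the user visited this screen
    fineTech_appData['screen_list'] = fineTech_appData.screen_list.str.replace(screen_name+",", "", regex=False)   # remove the screen that now has its own column
fineTech_appData.shape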

Output :- (50000, 68)
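To see the idea on something small, here is a sketch that turns a comma-separated screen_list into 0/1 columns on a toy frame. It uses pandas' str.get_dummies instead of the loop, which is a swapped-in shortcut: it creates a column for every screen it finds rather than only for the names in a curated list. The screen names below are made up for illustration.

```python
import pandas as pd

# Toy frame; the screen names are illustrative only.
toy = pd.DataFrame({"screen_list": ["Loan,Home,Credit", "Home", "Credit,Loan"]})

# One 0/1 column per distinct screen name found in the comma-separated string.
screen_flags = toy["screen_list"].str.get_dummies(sep=",")
toy = pd.concat([toy, screen_flags], axis=1)
print(toy)
```

The loop in the article is the better fit when only a fixed set of screens should become columns.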

So after all the cleaning, this is what the usable final data looks like.

Now this data can be used to run machine learning algorithms by splitting the data set into training and testing sets. The majority of an ML data analyst's time goes into cleaning the data set so that the data gives a clear picture without any problems.

In the next part of this blog we will try the following ML models to see which one gives the most accurate results.
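As a rough idea of what that split looks like, here is a pandas-only sketch on a toy frame with an 80/20 split. The ratio, the random seed, and the toy values are all assumptions for illustration; scikit-learn's train_test_split is the more common tool for this step.

```python
import pandas as pd

# Toy stand-in for the cleaned data; the values are illustrative only.
toy = pd.DataFrame({
    "numscreens": range(10),
    "enrolled": [0, 1] * 5,
})

# Randomly pick 80% of the rows for training; the remaining rows form the test set.
train_df = toy.sample(frac=0.8, random_state=0)
test_df = toy.drop(train_df.index)

# Separate the features from the target column.
X_train, y_train = train_df.drop(columns="enrolled"), train_df["enrolled"]
X_test, y_test = test_df.drop(columns="enrolled"), test_df["enrolled"]
print(len(train_df), len(test_df))
```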

Decision Tree Classifier
Nearest Neighbor Classifier
Naive Bayes Classifier
Random Forest Classifier
Logistic Regression
Support Vector Classifier
XGBoost Classifier

So Stay Tuned.......
