Advances in the Fintech industry by efficient use of "Data"- Part 2

I started the most challenging journey of my career as a data science professional with PayNearby in 2018, and this was when I got extensive exposure to the BFSI business.

PayNearby operates on a B2B2C model: it partners with neighborhood retail stores and equips them with the tools to provide assisted financial and digital commerce services to their local communities across 17,600+ PIN codes. These communities can visit PayNearby partner stores to avail a range of services, including cash withdrawal, cash deposit, money transfer, savings, insurance, travel, digital payments, access to government benefits, and many more.

Below are a few stats to give you a feel for the volume of data I got to manage as their Data Science head :)

[Image: data volume statistics]

In this newsletter, I will cover the most crucial project: setting up the entire data science practice for India's leading branchless financial services provider serving people at the bottom of the pyramid.

I will briefly cover the data platform and its different components.

Architecture Diagram

[Image: data platform architecture diagram]

At PayNearby, we centralized data from multiple sources into cloud data warehouses on AWS and GCP.

AWS was used mostly where we needed real-time data or where some of our data models had to be integrated for real-time use, since the development team was on AWS. GCP was used for all other analytics and data science use cases.

There were two reasons why we opted for this structure:

  1. GCP's BigQuery (BQ) is an economical option for data analytics and has very low query latency
  2. BQ is so user-friendly that even the Fin-Ops team started using it for their data requirements; our goal of making data accessible to everyone so they could work better and more efficiently was easily achieved

PS: Redshift had lower latency but was way too expensive, and I believe it's good to be frugal when you can see long-term benefits. Athena, on the other hand, was not something we could roll out to everyone across the org the way we did with BigQuery.
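Since BigQuery charges per byte scanned, we found it useful to sanity-check the cost of a query before running it. Below is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are placeholders for illustration, not the actual PayNearby schema.

```python
# Minimal sketch: estimate BigQuery query cost with a dry run before executing.
# Project, dataset, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project id

sql = """
    SELECT retailer_id, txn_amount
    FROM `my-analytics-project.warehouse.transactions`
    WHERE txn_date = '2021-01-15'
"""

# dry_run validates the query and reports bytes scanned without executing it
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)

bytes_scanned = job.total_bytes_processed
# On-demand pricing has been roughly $5 per TiB scanned (check current GCP pricing)
estimated_cost = bytes_scanned / 1024 ** 4 * 5
print(f"Query would scan {bytes_scanned:,} bytes (~${estimated_cost:.4f})")
```

A quick check like this made the pay-per-scan model predictable rather than a source of billing surprises.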

Let's cover each block of the data platform briefly:

Source:

There were more than 200 data sources! So you can imagine the challenge of putting all the data in one place and creating a single source of record, which is just the first step toward building a full-fledged data science stack.

Data sources/types: MySQL, MS SQL Server, PostgreSQL, operational file systems, HR systems, open-source data, image files, etc.

ETL Pipelines:

At PayNearby, we were ingesting ~200 million transactional entries a month through over 200 active pipelines!

Data security was at the core of Nearby Technologies, and hence we had to create our own ETL platform on GCP. This not only helped us secure data but also saved us significantly, compared to what any third-party ETL tool would have cost.
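To give a flavour of what one of those pipelines did, here is a heavily simplified batch ingestion step that pulls incremental rows from a MySQL source and appends them to BigQuery. The connection string, table names, and watermark logic are hypothetical; the real platform handled scheduling, retries, schema checks, and encryption on top of this.

```python
# Simplified sketch of one batch ETL step: MySQL source -> BigQuery sink.
# Connection details, table names, and the watermark column are hypothetical.
import pandas as pd
import sqlalchemy
from google.cloud import bigquery

SOURCE_URI = "mysql+pymysql://etl_user:secret@10.0.0.5/transactions_db"  # placeholder
TARGET_TABLE = "my-analytics-project.warehouse.transactions"             # placeholder


def run_batch(last_loaded_id: int) -> int:
    """Pull rows newer than the last watermark and append them to BigQuery."""
    engine = sqlalchemy.create_engine(SOURCE_URI)
    query = sqlalchemy.text(
        "SELECT id, retailer_id, service_type, txn_amount, txn_date "
        "FROM transactions WHERE id > :last_id"
    )
    with engine.connect() as conn:
        df = pd.read_sql(query, conn, params={"last_id": last_loaded_id})

    if df.empty:
        return last_loaded_id  # nothing new since the last run

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    client.load_table_from_dataframe(df, TARGET_TABLE, job_config=job_config).result()
    return int(df["id"].max())  # new watermark for the next run
```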

Data Warehouse and Data Lake:

We used two types of storage to solve different use cases:

  1. Google BigQuery - a columnar data warehouse based on a pay-per-scan pricing model
  2. Google Cloud Storage - where we stored cold data, cleaning it up frequently

Important pointers you should take into consideration while using BigQuery:

  • Avoid using SELECT *; instead, specify only the columns you need. This drastically reduces cost
  • Use partitioning and clustering while storing data; this lowers query costs even when you have to scan the heaviest tables (see the sketch after this list)
  • Keep cold data in a separate table, as active and long-term (cold) storage are priced differently in GCP
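As a concrete example of the partitioning and clustering pointer above, the snippet below creates a date-partitioned, clustered BigQuery table with the Python client, so that queries filtering on the partition column scan only the relevant blocks. The schema, project, and field names are made up for illustration.

```python
# Sketch: create a date-partitioned, clustered BigQuery table so queries that
# filter on txn_date / retailer_id scan only the relevant blocks.
# Project, dataset, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("retailer_id", "STRING"),
    bigquery.SchemaField("service_type", "STRING"),
    bigquery.SchemaField("txn_amount", "NUMERIC"),
    bigquery.SchemaField("txn_date", "DATE"),
]

table = bigquery.Table("my-analytics-project.warehouse.transactions", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="txn_date",          # partition pruning on date filters
)
table.clustering_fields = ["retailer_id", "service_type"]  # co-locate related rows

client.create_table(table, exists_ok=True)
```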

Data Visualisation, Dashboards & Reporting:

We used PowerBI for dashboarding, visualization, and reporting. This was the one-stop source for all product and business metrics and was used extensively across the organization to track product performance and business health. It enabled senior management and CXOs to stay aware of all the key metrics and also gave them a UI from which, based on filters, they could extract the underlying data.

We also used automailers built with R/Python to send many reports directly to stakeholders' mailboxes.
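A minimal sketch of such an automailer in Python is shown below. The SMTP host, addresses, credentials, and report query are all hypothetical placeholders; the production version also handled scheduling, templating, and failure alerts.

```python
# Minimal automailer sketch: pull a report from BigQuery and email it as a CSV.
# SMTP host, addresses, credentials, and the report query are hypothetical.
import smtplib
from email.message import EmailMessage

from google.cloud import bigquery


def send_daily_report() -> None:
    # Pull the report data (query and table are illustrative)
    client = bigquery.Client()
    df = client.query(
        "SELECT service_type, SUM(txn_amount) AS gtv "
        "FROM `my-analytics-project.warehouse.transactions` "
        "WHERE txn_date = CURRENT_DATE() GROUP BY service_type"
    ).to_dataframe()

    msg = EmailMessage()
    msg["Subject"] = "Daily GTV report"
    msg["From"] = "reports@example.com"
    msg["To"] = "business-team@example.com"
    msg.set_content("Please find attached today's GTV report.")
    msg.add_attachment(
        df.to_csv(index=False).encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename="daily_gtv.csv",
    )

    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("reports@example.com", "app-password")  # placeholder credentials
        server.send_message(msg)
```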

Machine Learning:

We had a dedicated team of data scientists who worked together to build robust systems that helped us achieve our business goals.

Here are a few systems built by our data scientists:

  • Product cross-sell system based on product usage, location, and line of business
  • In-house PAN and Aadhaar verification system for user identity verification - with this system, we could validate data directly from images, reducing customer onboarding time and improving customer experience
  • Churn prediction model - we were able to predict which agents would leave the system in the next 15 days, allowing retention strategies to be planned well in advance; this helped us achieve ~85%-90% second-month retention of agents (a rough sketch follows after this list)
  • Price optimization
  • Market estimation using a stacked ML model - this helped us understand where new markets could be created, which helped the business strategize its sales functions
  • Anomaly detection to flag fraudulent transactions before they are processed
  • Sales gamification: the use of characteristic game elements in typically non-game contexts to create gameful experiences that enhance value creation by users. We created a gameful experience for our customers to increase engagement. We named it "Leaderboard": a platform that lets our customers earn more and gives them an adrenaline rush to get to the top in their locality by instilling a feeling of competition.
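To give a flavour of the churn prediction model mentioned above, here is a highly simplified sketch using scikit-learn. The feature names, label definition, and data source are hypothetical; the production model used far richer behavioural features and a proper time-based validation split.

```python
# Rough churn-model sketch: score which agents are likely to go inactive in the
# next 15 days, based on recent activity features.
# Feature names, label definition, and data loading are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical agent-level feature table: one row per agent
df = pd.read_csv("agent_features.csv")

FEATURES = [
    "txns_last_7d", "txns_last_30d", "days_since_last_txn",
    "avg_txn_amount", "distinct_services_used", "complaints_last_30d",
]
TARGET = "churned_next_15d"  # 1 if the agent did no transactions in the next 15 days

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df[TARGET], test_size=0.2, stratify=df[TARGET], random_state=42
)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Score agents by churn risk so the retention team can prioritise outreach
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")
```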

Business Analytics:

"One size fits all" is something that does not apply to any Data science team as in most cases a good machine learning guy might not be good at quick business analysis but would be a great fit to solve complicated business problems which can not be solved by traditional methods. Keeping this in mind we had a team of business analytics ninjas who are quick like ninjas and helped us to be on our toes always.

Here are a few items built by our BA ninjas:

  • The entire PowerBI dashboarding for each service (we had ~35 dashboards, each tracking ~50 product/business metrics customized for each LOB)
  • Data governance - ensuring the data is in the right format so users can get it at any time
  • Custodianship of all the reports delivered to stakeholders' mailboxes, on time and in the right format
  • Enabling marketing campaigns - they provide the data and insights for the campaigns and then measure them too!
  • Helping business/product teams see the likely business impact of any change before actually making a strategy/feature change



Data blog post: https://www.dqindia.com/digging-new-oil-well-data

Follow me on LinkedIn: www.dhirubhai.net/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=geetanjali-prasad

