Advances in the Fintech industry by efficient use of "Data"- Part 2

I started the most challenging journey of my career as a data science professional with PayNearby in 2018, and this was when I got extensive exposure to the BFSI business.

PayNearby operates on a B2B2C model: it partners with neighborhood retail stores and equips them with the tools to provide assisted financial and digital commerce services to their local communities across 17,600+ PIN codes. These communities can visit PayNearby partner stores to avail a range of services, including cash withdrawal, cash deposit, money transfer, savings, insurance, travel, digital payments, access to government benefits, and many more.

Below are a few stats to give you a feel for the volume of data I got to manage as their Data Science head :)

[Image: data volume statistics]

In this newsletter, I will cover the most crucial project: setting up the entire data science practice for India's leading branchless financial services provider serving people at the bottom of the pyramid.

I will briefly cover the data platform and its different components.

Architecture Diagram

[Image: data platform architecture diagram]

At PayNearby, we centralized data from multiple sources into cloud data warehouses on AWS and GCP.

AWS was used mostly where we needed real-time data or where some of our data models had to be integrated for real-time use, since the development team was on AWS. GCP was used for all other analytics and data science use cases.

There were two reasons why we opted for this structure:

  1. GCP's BigQuery (BQ) is an economical option for data analytics and has very low query latency
  2. BQ is so user-friendly that even the Fin-Ops team started using it for their data requirements; our goal of making data accessible to everyone so they could work better and more efficiently was easily achieved

PS: Redshift had lower latency but was way too expensive, and I believe it's good to be frugal when you can see long-term benefits. Athena, on the other hand, was not something we could roll out to everyone across the org the way we did with BigQuery.
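Since BigQuery charges per byte scanned, we found it useful to sanity-check the cost of a query before running it. Below is a minimal sketch using the google-cloud-bigquery Python client; the project, dataset, table, and column names are placeholders for illustration, not the actual PayNearby schema.

```python
# Minimal sketch: estimate BigQuery query cost with a dry run before executing.
# Project, dataset, table, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # hypothetical project id

sql = """
    SELECT retailer_id, txn_amount
    FROM `my-analytics-project.warehouse.transactions`
    WHERE txn_date = '2021-01-15'
"""

# dry_run validates the query and reports bytes scanned without executing it
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
job = client.query(sql, job_config=job_config)

bytes_scanned = job.total_bytes_processed
# On-demand pricing has been roughly $5 per TiB scanned (check current GCP pricing)
estimated_cost = bytes_scanned / 1024 ** 4 * 5
print(f"Query would scan {bytes_scanned:,} bytes (~${estimated_cost:.4f})")
```

A quick check like this made the pay-per-scan model predictable rather than a source of billing surprises.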

Let's cover each block of the data platform briefly:

Source:

There were more than 200 data sources! So you can imagine the challenge of putting all the data in one place and creating a single source of record, which is just the first step toward building a full-fledged data science stack.

Data sources/types: MySQL, MS SQL Server, PostgreSQL, operational file systems, HR systems, open-source data, image files, etc.

ETL Pipelines:

At PayNearby, we were ingesting ~200 million transactional entries a month through over 200 active pipelines!

Data security was at the core of Nearby Technologies, and hence we had to create our own ETL platform on GCP. This not only helped us secure data but also saved us significantly, compared to what any third-party ETL tool would have cost.
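To give a flavour of what one of those pipelines did, here is a heavily simplified batch ingestion step that pulls incremental rows from a MySQL source and appends them to BigQuery. The connection string, table names, and watermark logic are hypothetical; the real platform handled scheduling, retries, schema checks, and encryption on top of this.

```python
# Simplified sketch of one batch ETL step: MySQL source -> BigQuery sink.
# Connection details, table names, and the watermark column are hypothetical.
import pandas as pd
import sqlalchemy
from google.cloud import bigquery

SOURCE_URI = "mysql+pymysql://etl_user:secret@10.0.0.5/transactions_db"  # placeholder
TARGET_TABLE = "my-analytics-project.warehouse.transactions"             # placeholder


def run_batch(last_loaded_id: int) -> int:
    """Pull rows newer than the last watermark and append them to BigQuery."""
    engine = sqlalchemy.create_engine(SOURCE_URI)
    query = sqlalchemy.text(
        "SELECT id, retailer_id, service_type, txn_amount, txn_date "
        "FROM transactions WHERE id > :last_id"
    )
    with engine.connect() as conn:
        df = pd.read_sql(query, conn, params={"last_id": last_loaded_id})

    if df.empty:
        return last_loaded_id  # nothing new since the last run

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
    client.load_table_from_dataframe(df, TARGET_TABLE, job_config=job_config).result()
    return int(df["id"].max())  # new watermark for the next run
```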

Data Warehouse and Data Lake:

We used two types of storage to solve different use cases:

  1. Google BigQuery - a columnar data warehouse based on a pay-per-scan pricing model
  2. Google Cloud Storage - where we stored cold data, cleaning it up frequently

Important pointers you should take into consideration while using BigQuery:

  • Avoid using SELECT *; instead, specify only the columns you need. This drastically reduces cost
  • Use partitioning and clustering while storing data; this lowers query costs even when you have to scan the heaviest tables (see the sketch after this list)
  • Keep cold data in a separate table, as active and long-term (cold) storage are priced differently in GCP
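As a concrete example of the partitioning and clustering pointer above, the snippet below creates a date-partitioned, clustered BigQuery table with the Python client, so that queries filtering on the partition column scan only the relevant blocks. The schema, project, and field names are made up for illustration.

```python
# Sketch: create a date-partitioned, clustered BigQuery table so queries that
# filter on txn_date / retailer_id scan only the relevant blocks.
# Project, dataset, and column names are illustrative placeholders.
from google.cloud import bigquery

client = bigquery.Client()

schema = [
    bigquery.SchemaField("retailer_id", "STRING"),
    bigquery.SchemaField("service_type", "STRING"),
    bigquery.SchemaField("txn_amount", "NUMERIC"),
    bigquery.SchemaField("txn_date", "DATE"),
]

table = bigquery.Table("my-analytics-project.warehouse.transactions", schema=schema)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="txn_date",          # partition pruning on date filters
)
table.clustering_fields = ["retailer_id", "service_type"]  # co-locate related rows

client.create_table(table, exists_ok=True)
```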

Data Visualisation, Dashboards & Reporting:

We used PowerBI for dashboarding, visualization, and reporting. This was the one-stop source for all product and business metrics and was used extensively across the organization to track product performance and business health. It enabled senior management and CXOs to stay aware of all the key metrics and also gave them a UI from which, based on filters, they could extract the underlying data.

We also used automailers built with R/Python to send many reports directly to stakeholders' mailboxes.
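A minimal sketch of such an automailer in Python is shown below. The SMTP host, addresses, credentials, and report query are all hypothetical placeholders; the production version also handled scheduling, templating, and failure alerts.

```python
# Minimal automailer sketch: pull a report from BigQuery and email it as a CSV.
# SMTP host, addresses, credentials, and the report query are hypothetical.
import smtplib
from email.message import EmailMessage

from google.cloud import bigquery


def send_daily_report() -> None:
    # Pull the report data (query and table are illustrative)
    client = bigquery.Client()
    df = client.query(
        "SELECT service_type, SUM(txn_amount) AS gtv "
        "FROM `my-analytics-project.warehouse.transactions` "
        "WHERE txn_date = CURRENT_DATE() GROUP BY service_type"
    ).to_dataframe()

    msg = EmailMessage()
    msg["Subject"] = "Daily GTV report"
    msg["From"] = "reports@example.com"
    msg["To"] = "business-team@example.com"
    msg.set_content("Please find attached today's GTV report.")
    msg.add_attachment(
        df.to_csv(index=False).encode("utf-8"),
        maintype="text",
        subtype="csv",
        filename="daily_gtv.csv",
    )

    with smtplib.SMTP("smtp.example.com", 587) as server:
        server.starttls()
        server.login("reports@example.com", "app-password")  # placeholder credentials
        server.send_message(msg)
```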

Machine Learning:

We had a dedicated team of data scientists who worked together to build robust systems that helped us achieve our business goals.

Here are a few systems built by our data scientists:

  • Product cross-sell system based on product usage, location, and line of business
  • In-house PAN and Aadhaar verification system for user identity verification - with this system, we could validate data directly from images, reducing customer onboarding time and improving customer experience
  • Churn prediction model - we were able to predict which agents would leave the system in the next 15 days, allowing retention strategies to be planned well in advance; this helped us achieve ~85%-90% second-month retention of agents (a rough sketch follows after this list)
  • Price optimization
  • Market estimation using a stacked ML model - this helped us understand where new markets could be created, which helped the business strategize its sales functions
  • Anomaly detection to flag fraudulent transactions before they are processed
  • Sales gamification: the use of characteristic game elements in typically non-game contexts to create gameful experiences that enhance value creation by users. We created a gameful experience for our customers to increase engagement. We named it "Leaderboard": a platform that lets our customers earn more and gives them an adrenaline rush to get to the top in their locality by instilling a feeling of competition.
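To give a flavour of the churn prediction model mentioned above, here is a highly simplified sketch using scikit-learn. The feature names, label definition, and data source are hypothetical; the production model used far richer behavioural features and a proper time-based validation split.

```python
# Rough churn-model sketch: score which agents are likely to go inactive in the
# next 15 days, based on recent activity features.
# Feature names, label definition, and data loading are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical agent-level feature table: one row per agent
df = pd.read_csv("agent_features.csv")

FEATURES = [
    "txns_last_7d", "txns_last_30d", "days_since_last_txn",
    "avg_txn_amount", "distinct_services_used", "complaints_last_30d",
]
TARGET = "churned_next_15d"  # 1 if the agent did no transactions in the next 15 days

X_train, X_test, y_train, y_test = train_test_split(
    df[FEATURES], df[TARGET], test_size=0.2, stratify=df[TARGET], random_state=42
)

model = GradientBoostingClassifier()
model.fit(X_train, y_train)

# Score agents by churn risk so the retention team can prioritise outreach
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"Hold-out AUC: {auc:.3f}")
```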

Business Analytics:

"One size fits all" is something that does not apply to any Data science team as in most cases a good machine learning guy might not be good at quick business analysis but would be a great fit to solve complicated business problems which can not be solved by traditional methods. Keeping this in mind we had a team of business analytics ninjas who are quick like ninjas and helped us to be on our toes always.

Here are a few items built by our BA ninjas:

  • The entire PowerBI dashboarding for each service (we had ~35 dashboards, each tracking ~50 product/business metrics customized for each LOB)
  • Data governance - ensuring the data is in the right format so users can get it at any time
  • Custodianship of all the reports delivered to stakeholders' mailboxes, on time and in the right format
  • Enabling marketing campaigns - they provide the data and insights for the campaigns and then measure them too!
  • Helping business/product teams see the likely business impact of any change before actually making a strategy/feature change



Data blog post: https://www.dqindia.com/digging-new-oil-well-data

Follow me on LinkedIn: www.dhirubhai.net/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=geetanjali-prasad

