
Discover or Learn Patterns?

Aditya Khandekar, CFA

Chief Revenue Officer | Analytics & Strategy Leader | 3AI Thought Leader | Fintech Enthusiast

Published: March 5, 2018

Just think about it for a sec.... the question is non-trivial!

For a business leader who wants to leverage analytics in a new initiative, this question always comes up, bluntly or subtly. The dilemma is:

1. Do I have enough scenarios of patterns already identified from my business knowledge or processes which I can amplify using machine learning? OR

2. Do I have a broad definition of the problem (which is real) but very few scenarios concretely available to drive analytical solutions?

In case “1”, supervised methods are applicable; in case “2”, you will need unsupervised methods to understand the patterns and build decisioning on top of them.

So what’s the big deal? Let’s try to explain this with an example. Let’s say I am looking for fraud patterns in the traditional channel (in-store purchases) and in digital channels (like mobile) for a retailer.

Discovering versus Learning Patterns for Digital Fraud

Let’s say the retailer recently (6 months ago) launched the digital channel for selling pet food. The retailer doesn't have a lot of experience with fraud in this channel.

Issues

a. Since digital fraud scenarios are new, I might use clustering or outlier detection techniques to find customer purchase patterns that might be considered outliers. I might also use time-series event modelling, like Markov chains or recurrent neural nets, to understand customer behavior over time and spot anomalous behavior. The issue is that I don’t know whether the outliers identified are really outliers.

b. The analytics team then needs to go back to the business SMEs (domain experts) and ask them to manually verify and “tag” these outliers, which is an unexpected additional burden, especially if the volumes to analyze are large. Why is this important?

c. Tagging becomes important because the business is nervous about putting such systems into production when the risk of False Positives is high, given their adverse impact on customer experience.

d. Essentially, the business and analytics teams are flying blind and have to make a “leap of faith” that some mouse-trap is better than none! The unsupervised approach then needs constant refinement and re-learning, based on fraud data collected post deployment, to make it more effective both at capturing the fraud universe (sensitivity of the model) and at quality of detection (precision of the model).
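To make the loop above concrete, here is a minimal sketch of one way it could look, not a description of any production system. It flags outliers in order amounts with a robust z-score (median and MAD, one simple outlier technique among many), then scores the flags against SME tags to get precision and sensitivity. The order amounts, the 3.5 cutoff, and the SME tags are all made-up assumptions for illustration.

```python
from statistics import median


def mad_outliers(amounts, threshold=3.5):
    """Return indices whose robust z-score (median/MAD based) exceeds threshold."""
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        return []  # no spread around the median, nothing can be ranked as unusual
    # 0.6745 rescales MAD so the score is comparable to a standard z-score
    return [i for i, x in enumerate(amounts)
            if 0.6745 * abs(x - med) / mad > threshold]


def precision_and_sensitivity(flagged, confirmed_fraud):
    """Compare flagged indices with SME-confirmed fraud indices."""
    flagged, confirmed_fraud = set(flagged), set(confirmed_fraud)
    true_positives = len(flagged & confirmed_fraud)
    precision = true_positives / len(flagged) if flagged else 0.0
    sensitivity = true_positives / len(confirmed_fraud) if confirmed_fraud else 0.0
    return precision, sensitivity


# Hypothetical order amounts from the new digital channel
amounts = [20, 22, 19, 25, 21, 23, 400, 18, 24, 350]
flagged = mad_outliers(amounts)

# Hypothetical SME tags: order 6 was fraud, order 9 was a legitimate bulk buy,
# and order 3 was fraud even though its amount looked normal.
sme_confirmed_fraud = [6, 3]
precision, sensitivity = precision_and_sensitivity(flagged, sme_confirmed_fraud)
print(flagged, precision, sensitivity)
```

In this toy data one flagged order is a false positive and one fraud case is missed, which is exactly why the tagging step in “b” matters: without SME tags, neither number can be computed.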

Resolution

a. Challenge the business team and the analytics team to see if you can break the problem down into a series of narrow-footprint analytical problems for which you have a reasonable understanding of fraud behavior (even if it is by proxy). You might not catch all the fraud, but it's better to machine-learn from patterns in existing data than to try to discover them. In our example, there might be some cross-over fraud patterns from the in-store world (like payment fraud or item return fraud) which are applicable to digital channels. Build supervised models to capture this behavior and get immediate business impact. Manage False Positives carefully through descriptive analysis of non-fraud cases, and build business rules that overlay on top of model scores to reduce False Positives with minimal impact on fraud detection.

b. Go out and collect data from controlled experiments and observe/analyze fraud behavior. Yes, that means you might need to wait 3-4 months until some patterns start to emerge, but that might help create a better mouse-trap downstream.

c. See if you can purchase external data at the point of digital purchase (for example, ID Vision from TransUnion provides a device risk score) to augment your feature set for prediction.
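The idea in “a” of overlaying business rules on model scores, combined with an external signal as in “c”, could be sketched roughly as follows. Everything here is hypothetical: the `Order` fields, the 0.7 score cutoff, and the single suppression rule are illustrative assumptions, and `device_risk` is a stand-in for whatever external score is purchased, not the actual output format of any vendor product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    model_score: float         # fraud score from a supervised model, assumed 0..1
    amount: float              # order value
    customer_tenure_days: int  # how long the customer has been known
    device_risk: float         # hypothetical external device risk score, 0..1


def decide(order, score_cutoff=0.7):
    """Return 'review' or 'approve' for one order."""
    if order.model_score < score_cutoff:
        return "approve"
    # Overlay rule (illustrative): a high score on a small order from a
    # long-standing customer on a low-risk device is treated as a likely
    # False Positive and is not sent to review.
    if (order.customer_tenure_days >= 365
            and order.amount < 50
            and order.device_risk < 0.2):
        return "approve"
    return "review"


orders = [
    Order("A1", model_score=0.30, amount=40.0, customer_tenure_days=20, device_risk=0.1),
    Order("A2", model_score=0.85, amount=35.0, customer_tenure_days=900, device_risk=0.05),
    Order("A3", model_score=0.85, amount=600.0, customer_tenure_days=900, device_risk=0.05),
    Order("A4", model_score=0.90, amount=30.0, customer_tenure_days=5, device_risk=0.8),
]
decisions = {o.order_id: decide(o) for o in orders}
print(decisions)
```

Keeping the rules separate from the model means the business can tighten or relax them without retraining, but each rule should be checked against tagged fraud to confirm it does not suppress real cases.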

Broadly speaking, I see the unsupervised approach as being "transient" in nature: you will eventually migrate to a supervised approach once you have sufficient tagged data and understand the fraud patterns well. We have also built semi-supervised models which sequence clustering with supervised models to drive a higher detection rate and fewer False Positives.
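One simple way to sequence clustering with a supervised step is "cluster-then-label": cluster all transactions, then use the small tagged subset to estimate a fraud rate per cluster and score untagged transactions with it. The sketch below is an assumption about what such a pipeline could look like, not the models described above; the data, the starting centroids, and the tags are made up, and a real system would use richer features and a proper classifier.

```python
from math import dist


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # keep the old centroid if a cluster ends up empty
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else old)
        centroids = updated
    return centroids


def cluster_fraud_rates(labels, tags, n_clusters):
    """Supervised step: Laplace-smoothed fraud rate per cluster from tagged rows."""
    rates = []
    for c in range(n_clusters):
        tagged = [tags[i] for i, lab in enumerate(labels) if lab == c and i in tags]
        rates.append((sum(tagged) + 1) / (len(tagged) + 2))
    return rates


# Hypothetical features per transaction: (order amount, orders per day).
# Unscaled on purpose for brevity, so amount dominates the distance.
points = [(20, 1), (22, 1), (25, 2), (21, 1), (300, 8), (320, 9), (310, 7)]
centroids = kmeans(points, centroids=[points[0], points[4]])
labels = assign(points, centroids)

# Only four transactions have SME tags (True means confirmed fraud)
tags = {0: False, 1: False, 4: True, 5: True}
rates = cluster_fraud_rates(labels, tags, n_clusters=2)

# Score every transaction, tagged or not, by its cluster's fraud rate
scores = [rates[lab] for lab in labels]
print(labels, rates, scores)
```

The untagged transaction at index 6 inherits the high rate of its cluster, which is how a handful of SME tags can be stretched over a much larger untagged population.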

At Scienaptic, we are working closely with clients and helping them navigate such issues to deliver real business impact.

I would appreciate your feedback and comments on how you are dealing with such issues in your analytical journeys.

Joe Burns

PI Data & Analytics at Travelers

7 years ago

Really insightful and relevant post Aditya. We were just discussing the challenges of this with a team the other day and while third-party data sets are useful, it still adds even more data into an already complex situation. I would add too that some business teams are somewhat frightened by knowing what they don't know - i.e., will the discovery you're proposing have an adverse impact on my results in the future? Am I signing up for something that may well paint me into a corner? Regardless, I like the path you're proposing and see many applications for that direction. As always, great insights and love to hear more.


Blake Arnold

Strategy & Analytics at Faire

7 years ago

Great post, Aditya. I like your description of an unsupervised -> data annotation -> supervised journey. Lack of labeled data combined with vast quantities of data continues to be a challenge in many contexts. I see Active Learning as a promising area of innovation to reduce human time required to comb through and annotate cases, many of which will be FPs https://drive.google.com/file/d/1Mx45sFHG5cOPMHmEF6u_CmxVG-_7TPKe/view

Pradeep Chaturvedi

Managing Director at JPMorgan Chase & Co., Wholesale Payments

7 years ago

Aditya - Great question. In my experience you have to use both - supervised and unsupervised. I agree with you that supervised models and known patterns are easier to implement and easily accepted within the organization. And businesses/operations teams are quite reluctant to add overhead from the unsupervised but sometimes all it takes is one event to bring broad change in the organization and that's when you present data analyzed by your unsupervised models and get them accepted as the new normal. Pradeep.

