Machine Learning in Consumer Fraud Detection
Mohan Jayaraman
Partner at Bain & Company, Gen AI / AI expert with significant implementation experience across industries
It’s difficult to go through the news these days without a mention of fraud or financial crime popping up. By any measure, fraud is a multi-billion-dollar industry (some reports put it in the trillions), and it is growing in step with the rapid adoption of digital processes across businesses. Ever-larger payoffs are turning fraud into an industry that attracts human ingenuity and cutting-edge technology. Consumer industries are the most affected: an estimated 35% of fraud losses globally happen directly through consumer fraud (PwC Global Economic Crime Survey 2020), and the indirect impact is perhaps even larger. Fraudsters are growing in sophistication, and global digital pathways are multiplying the vulnerabilities. Fortunately, the same technology that is driving this growth in fraud is also giving us new tools to fight it. Advances in machine learning have been particularly promising, both in their ability to use cheaper computing power effectively and in their agility in learning new fraud types as they occur.
Traditional Consumer Fraud Management
Traditional fraud management in the consumer space has largely depended on analytical models that produce a fraud score: the likelihood that a transaction or credit application is fraudulent. Scoring usually follows a check of the applications or transactions against known fraud markers held in negative databases. Organizational loss appetite and customer service objectives set a cutoff on these scores, which surfaces a specific number of cases to human investigators, who then approve or decline them. Cases subjected to these enhanced checks take longer to process, so customer service considerations discourage escalating too many of them. On the other side of the scale are the losses the organization could suffer if fraudulent transactions or applications go through. Past advances in analytics have improved detection accuracy and reduced processing time, but industry processes still involve a high degree of human touch, with the cost and time overheads that come with it. Some organizations have moved to auto-decline cases with a high fraud probability to reduce the load on the manual referral system. This basic system has evolved over the years, but it remains the bedrock of consumer fraud management across the financial industry.
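The routing logic of this traditional set-up can be sketched in a few lines. This is a minimal illustration, not any organization's production system: the negative-database contents, the `Case` fields and the two cutoff values are hypothetical, and the fraud score is assumed to come from an existing statistical model.

```python
from dataclasses import dataclass, field

# Hypothetical negative database of identifiers tied to confirmed fraud.
NEGATIVE_DB = {"phone:5550001", "device:ab12"}


@dataclass
class Case:
    case_id: str
    fraud_score: float                         # from an existing scoring model, 0..1
    markers: set = field(default_factory=set)  # identifiers to match against NEGATIVE_DB


def route(case: Case, refer_cutoff: float, decline_cutoff: float) -> str:
    """Return 'refer', 'decline' or 'approve' for one application or transaction."""
    # In this sketch, a hit on a known fraud marker goes to a human investigator.
    if case.markers & NEGATIVE_DB:
        return "refer"
    # High scores are auto-declined to reduce the manual referral load.
    if case.fraud_score >= decline_cutoff:
        return "decline"
    # Mid-range scores are surfaced to investigators; the cutoff controls the volume.
    if case.fraud_score >= refer_cutoff:
        return "refer"
    return "approve"


cases = [
    Case("A1", 0.10, {"device:ab12"}),
    Case("A2", 0.95),
    Case("A3", 0.60),
    Case("A4", 0.05),
]
decisions = {c.case_id: route(c, refer_cutoff=0.5, decline_cutoff=0.9) for c in cases}
print(decisions)  # {'A1': 'refer', 'A2': 'decline', 'A3': 'refer', 'A4': 'approve'}
```

Lowering `refer_cutoff` sends more cases to investigators, which is exactly the trade-off between customer service and potential losses described above.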
ML: a quick introduction
Machine learning offers a fundamentally different way to look at and handle consumer fraud, so it is worth reviewing some ML basics before examining its application in this space. Wikipedia defines ML as “the study of computer algorithms that improve automatically through experience.” In practice, ML trains a model, often an artificial neural network (ANN), on data sets. Several basic learning paradigms can be applied, depending on the type of problem to be solved.
Supervised learning refers to ML implementations where the input data has already been labelled into the right classes (say, good and bad) from past experience, as part of the modeling data set. The model then learns from this labelled data to recognize similar patterns in new data. Unsupervised learning typically works on large data sets that have not been labelled or classified; the ML implementation discovers structure in the data on its own, such as clusters or anomalies, and applies what it finds to incoming data. Reinforcement learning (of AlphaGo fame) trains an agent to act within an environment based on the feedback it receives, and is typically used in contained environments like games, where the end state (win or lose) gives clear feedback. There are other ways to categorize learning paradigms in ML, such as semi-supervised learning, self-supervised learning and active learning, depending on the kind of problem, the statistical inference or the learning technique.
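To make the first two paradigms concrete, here is a deliberately tiny sketch in plain Python. The nearest-centroid classifier (a simple supervised method) and the median-absolute-deviation outlier test (a simple unsupervised method) are illustrative stand-ins chosen for brevity, not ANNs, and the feature values are invented for the example.

```python
from statistics import mean, median


# Supervised: labelled examples teach the model what each class looks like.
def train_nearest_centroid(rows):
    """rows: list of (feature_vector, label). Returns the mean vector of each label."""
    groups = {}
    for x, y in rows:
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


# Unsupervised: no labels; flag values far from the bulk of the data.
def robust_outliers(values, threshold=3.5):
    """Modified z-score using the median absolute deviation (MAD)."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median; treat any deviation as anomalous.
        return [v != med for v in values]
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


# Hypothetical features: (amount in hundreds, 1 if the device is new else 0).
history = [([0.5, 0], "good"), ([1.0, 0], "good"), ([0.8, 1], "good"),
           ([9.0, 1], "bad"), ([7.5, 1], "bad")]
model = train_nearest_centroid(history)
print(predict(model, [8.0, 1]), predict(model, [0.6, 0]))  # bad good

amounts = [20, 25, 22, 30, 27, 24, 950]
print(robust_outliers(amounts))  # [False, False, False, False, False, False, True]
```

The supervised model can only recognize classes it was shown; the unsupervised test needs no labels but says only "unusual", not "fraudulent".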
Why is ML suited for Consumer fraud detection?
The fundamental reason ML seems a natural fit for consumer fraud detection lies in its ability to work with large data sets to deliver sharper classification than typical static statistical models, and in its ability to learn as fraud patterns change. Better classification addresses the classic problem fraud management systems have long struggled with: increasing fraud catch rates while minimizing false positives, that is, good customers being declined or inconvenienced. Typical transitions from current statistical models to ML models in the consumer application fraud space have seen about a 15-18% increase in catch efficiency while reducing false positive rates significantly, often by close to 50-60%. These figures come from the first few test cases we have seen published internally within Experian, and more positive data is coming in by the day. These are significant improvements over existing models.
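Catch rate and false positive rate can be computed from a set of confirmed outcomes, and a figure like a "15-18% increase" is most naturally read as a relative change. The sketch below assumes that reading; the labels and flags are invented purely to show the arithmetic and are not the Experian results quoted above.

```python
def catch_and_fp_rate(is_fraud, flagged):
    """is_fraud / flagged: parallel lists of 0/1. Returns (catch rate, false positive rate)."""
    pairs = list(zip(is_fraud, flagged))
    tp = sum(1 for y, f in pairs if y and f)
    fn = sum(1 for y, f in pairs if y and not f)
    fp = sum(1 for y, f in pairs if not y and f)
    tn = sum(1 for y, f in pairs if not y and not f)
    catch_rate = tp / (tp + fn) if tp + fn else 0.0  # share of frauds caught
    fp_rate = fp / (fp + tn) if fp + tn else 0.0     # share of good customers flagged
    return catch_rate, fp_rate


def relative_change(old, new):
    """(new - old) / old, so 0.15 means a 15% increase."""
    return (new - old) / old


is_fraud = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_flags = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]  # hypothetical statistical model
new_flags = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # hypothetical ML model

old_catch, old_fpr = catch_and_fp_rate(is_fraud, old_flags)
new_catch, new_fpr = catch_and_fp_rate(is_fraud, new_flags)
print(f"catch rate {old_catch:.2f} -> {new_catch:.2f} "
      f"({relative_change(old_catch, new_catch):+.0%})")
print(f"FP rate    {old_fpr:.2f} -> {new_fpr:.2f} "
      f"({relative_change(old_fpr, new_fpr):+.0%})")
```

For this toy data the script reports a 50% relative increase in catch rate and a 50% relative drop in false positive rate; both metrics have to be tracked together, since either one alone can be improved by moving the cutoff.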
The second, and more significant, reason is ML's ability to improve itself through a virtuous cycle driven by incoming data. One advantage of current fraud management systems is that organizations record the flow-through losses that result from wrong decisions. This information can feed a loop that lets the ML system improve itself and ‘learn’ about new types of fraud entering the process. This feedback loop is invaluable in consumer fraud management because fraudsters continually change their modus operandi. The current manual system learns with a lag: there is a natural gap between new types of fraud appearing and the model changes that will catch them. ML can use continuous learning to narrow that gap, and the learning cycle can improve further when unsupervised learning is used to look for anomalies in the data.
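One way to realize this feedback loop is online learning: each confirmed outcome (a flow-through loss or a cleared referral) updates the model as soon as it is known, instead of waiting for the next rebuild. The sketch below uses logistic regression trained by stochastic gradient descent as a stand-in for the production model; the class name, the feature layout and the learning rate are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class OnlineFraudModel:
    """Logistic regression updated one labelled outcome at a time (SGD on log-loss)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def learn(self, x, is_fraud):
        # For log-loss, the gradient with respect to the linear score is (p - y).
        err = self.predict_proba(x) - is_fraud
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


# Hypothetical binary features: [new device, unusual merchant category].
model = OnlineFraudModel(n_features=2)
new_pattern, normal = [1.0, 1.0], [0.0, 0.0]
before = model.predict_proba(new_pattern)  # 0.5: the model has no view yet

# Feedback loop: confirmed outcomes arrive and are fed straight back in.
for _ in range(500):
    model.learn(new_pattern, 1)  # confirmed fraud from the new modus operandi
    model.learn(normal, 0)       # confirmed genuine activity
after = model.predict_proba(new_pattern)
print(f"{before:.2f} -> {after:.2f}")
```

A constant learning rate keeps giving weight to recent outcomes, which is what lets the model follow a shifting modus operandi.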
Typical ML implementations in Consumer fraud detection
The first area to consider is consumer application fraud detection. Application fraud refers to the losses financial institutions incur from fraudulent applications by consumers. These range from the extreme of non-existent consumers or man-of-straw applications to misstated facts in credit applications that can influence decisions.
Because the feedback loop here is well established, this area is best suited to supervised learning. Data collection draws on all available consumer demographic and behavior variables, and applications are classified as good or bad based on past experience. Given the increase in digital sourcing, variables such as device fingerprints are useful at this stage. Knowing which variables to use takes organizational learning and expert input, and forms the ‘feature engineering’ phase of the build. The ANN is then trained on this data. After standard tests on the resulting model to check for bias and fairness, the ML model is ready for deployment. These tests can be largely automated, so they add little time to the process. A learning loop is then typically set up in parallel: incoming data is used to train a challenger model, and the implementation monitors the lift of the challenger over the champion to identify when it is significant enough to warrant action. Before the challenger is promoted into production, the same base tests are run. Existing organizational manual referral processes can be used in this set-up, and existing negative databases can feed into the process as usual.
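The champion/challenger step can be sketched as follows. This is an illustrative assumption about how promotion might be decided, not a description of any specific platform: lift is measured here as the relative gain in catch rate on recent labelled outcomes, the challenger must not raise the false positive rate, `min_lift` is an arbitrary example threshold, and `base_tests` stands in for the automated bias and fairness checks.

```python
def evaluate(model, rows):
    """rows: list of (features, is_fraud); model(features) -> True if flagged.
    Returns (catch rate, false positive rate)."""
    tp = fn = fp = tn = 0
    for x, is_fraud in rows:
        flagged = model(x)
        if is_fraud and flagged:
            tp += 1
        elif is_fraud:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    catch = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return catch, fpr


def should_promote(champion, challenger, recent_rows, base_tests, min_lift=0.05):
    """Promote only if the lift is large enough, false positives do not rise,
    and the challenger passes every base test (bias, fairness, ...)."""
    champ_catch, champ_fpr = evaluate(champion, recent_rows)
    chall_catch, chall_fpr = evaluate(challenger, recent_rows)
    if champ_catch == 0:
        lift_ok = chall_catch > 0
    else:
        lift_ok = (chall_catch - champ_catch) / champ_catch >= min_lift
    return lift_ok and chall_fpr <= champ_fpr and all(t(challenger) for t in base_tests)


# Hypothetical features: [model score, 1 if the device fingerprint appears on other applications].
def champion(x):
    return x[0] > 0.7


def challenger(x):
    return x[0] > 0.7 or x[1] == 1


recent = [([0.9, 0], True), ([0.5, 1], True), ([0.4, 0], True),
          ([0.8, 0], False), ([0.2, 0], False), ([0.3, 0], False)]
passes_all = [lambda model: True]  # placeholder for automated bias/fairness tests
print(should_promote(champion, challenger, recent, passes_all))  # True
```

Keeping the base tests inside the promotion gate means a challenger with better lift still cannot reach production if it fails the bias or fairness checks.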
Post-acquisition fraud management is a slightly different game. It covers all consumer interactions that happen after the customer has been onboarded: payments, remittances or withdrawals on a savings or checking account or a credit card, additional products sold to the customer, and requests to increase or decrease limits.
The transaction space is by far the most complex, as its scope includes external networks, merchants and more. This is where ML can have the most impact. ML algorithms recognize patterns that differentiate fraudsters from legitimate consumers using a wide range of information, some of which may seem completely unrelated to a human analyst. They can combine device information and a consumer's typical behavior pattern with the context of use to detect anomalous transactions. For data sets of this size, unsupervised learning has proved very useful. Semi-supervised learning, where the training data contains very few labeled examples with outcomes and a large number of unlabeled examples, has been found especially useful in this space.
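Self-training is one common semi-supervised approach: a model fitted on the few labeled cases pseudo-labels the unlabeled cases it is confident about, then refits. The sketch below uses a nearest-centroid classifier and a fixed distance `margin` as the confidence rule; both are simplifying assumptions for illustration, as are the two-dimensional feature values (think of them as hypothetical behavior and device-context scores).

```python
from math import dist
from statistics import mean


def fit_centroids(labelled):
    """labelled: list of (feature_vector, label) -> mean vector per label."""
    groups = {}
    for x, y in labelled:
        groups.setdefault(y, []).append(x)
    return {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}


def classify(centroids, x):
    """Label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(x, centroids[y]))


def self_train(labelled, unlabelled, margin=1.0, max_rounds=5):
    """Repeatedly pseudo-label unlabelled points whose nearest centroid is at
    least `margin` closer than the runner-up. Returns (centroids, still_unlabelled)."""
    labelled, pool = list(labelled), list(unlabelled)
    for _ in range(max_rounds):
        centroids = fit_centroids(labelled)
        confident, rest = [], []
        for x in pool:
            ranked = sorted((dist(x, c), y) for y, c in centroids.items())
            if len(ranked) > 1 and ranked[1][0] - ranked[0][0] >= margin:
                confident.append((x, ranked[0][1]))
            else:
                rest.append(x)
        if not confident:
            break  # nothing else is confident enough; stop early
        labelled += confident
        pool = rest
    return fit_centroids(labelled), pool


seed = [([0.0, 0.0], "genuine"), ([10.0, 10.0], "fraud")]  # very few labels
unlabelled = [[1, 1], [0.5, 0], [9, 9], [11, 10], [5, 5]]  # many unlabelled cases
centroids, undecided = self_train(seed, unlabelled)
print(undecided)  # [[5, 5]]: too ambiguous to pseudo-label, left for review
```

The ambiguous case is deliberately left unlabeled rather than guessed, which is where an existing manual referral process could still add value.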
The next steps
ML has been used in fraud management for a while, but its mainstream adoption is now producing strong results in both the customer acquisition and post-acquisition phases. With the increased digital focus of the post-COVID-19 period, we are certain to see an increase in fraud rates, and adopting the best available tools now seems the most prudent course. For organisations looking at ML, the first step is to review existing processes and decide where ML can create the most impact. The next step is to bring the right data and ML modelers into the mix. Making the best use of the feedback loop to improve predictions requires a clear design choice, best made upfront.
Beyond the current possibilities, there are exciting new use cases to look forward to in this space. One opportunity is that fraud detection is an area where organisations can collaborate against fraudsters. In the past this has been difficult because of privacy and competitive concerns. With developments in ML such as federated learning and transfer learning, knowledge can now be shared across models without sharing actual customer data, opening up interesting collaboration possibilities between organisations. There are also interesting possibilities for ML to partly replace referral and case management systems. In all, ML is here to stay as a fraud management tool, and organisations will do well to add it to their repertoire of skills for tackling the growing menace of fraud.
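Federated learning can be illustrated with federated averaging (FedAvg), one widely used scheme: each organisation trains on its own customer data, and only model weights are pooled, weighted by each party's number of training examples. In the sketch below the two organisations (`bank_a`, `bank_b`), their data and the logistic-regression local model are hypothetical, and protections such as secure aggregation are not shown.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def local_update(weights, data, lr=0.5, epochs=20):
    """Train logistic regression locally, starting from the shared weights.
    data: list of (features, label); the raw data never leaves this function."""
    w = list(weights)
    for _ in range(epochs):
        for x, y in data:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_average(updates):
    """updates: list of (weights, n_examples). Example-weighted mean of the weights."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]


def predict(weights, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(weights, x)))


# Hypothetical features: [risk signal, constant 1.0 acting as the bias term].
bank_a = [([1.0, 1.0], 1), ([0.0, 1.0], 0), ([0.1, 1.0], 0)]
bank_b = [([0.9, 1.0], 1), ([0.8, 1.0], 1), ([0.0, 1.0], 0)]

global_w = [0.0, 0.0]
for _ in range(10):  # communication rounds
    updates = [(local_update(global_w, d), len(d)) for d in (bank_a, bank_b)]
    global_w = federated_average(updates)

print(round(predict(global_w, [1.0, 1.0]), 2), round(predict(global_w, [0.0, 1.0]), 2))
```

Only `global_w` and the example counts cross organisational boundaries in this sketch; the rows in `bank_a` and `bank_b` stay with their owners.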
About the author
In my role as the Regional MD for Decision Analytics and Business Information - APac, my team and I work in partnership with clients to deliver responsible innovation, advanced analytics solutions and global Experian products and solutions into the APac market. Our team runs the APac innovation hub, our X-Labs, which builds new products and solutions for APac. In addition, our analytics COEs across the region harness cutting-edge analytics and AI to solve consumer problems. Please feel free to contact me at [email protected] for details.