The Dream of the 90s: FICO Delivers XAI with Fraud Detection Models
In 1992, FICO introduced FICO® Falcon, a neural network-based fraud detection system that detects fraudulent payment card transactions in real time. Since then, despite the proliferation of fraud types, there has been a dramatic reduction in payment card fraud across the globe.
One of the reasons we developed scoring technology was to help analysts take action on the transactions most likely to be fraudulent. We introduced the Reason Reporter, which, as the name suggests, provides reasons associated with the neural network scores Falcon produces. This capability is, in fact, Explainable Artificial Intelligence (XAI), a topic that has recently become a hot one in light of the European Union's General Data Protection Regulation (GDPR) and society's increasing reliance on AI systems.
In 1996, FICO filed a patent for Reason Reporter, which shows just how long FICO has, in fact, been working on XAI. In the context of Falcon, the algorithm provides explanations for high-scoring cases, where a high score indicates suspected fraudulent activity on the payment card.
How XAI works in Reason Reporter
During model training, the Reason Reporter algorithm "bins" (sorts) each input variable and then computes moments of the historical scores for each bin. In production, when a transaction is scored, each variable's value assigns the transaction to one of that variable's bins. The variables whose bin score moments deviate least from the observed score are estimated to be the ones driving the score. Reason codes are then associated with a variable or group of variables. The schematic below of Falcon Case Manager, based on synthetic data used for demo purposes, shows a list of reasons (in the yellow box) generated by Reason Reporter for the highlighted transaction.
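To make the mechanics concrete, here is a hypothetical sketch of the binning idea described above. It is not FICO's proprietary implementation: the article describes computing score moments per bin, while this sketch keeps only the per-bin mean, and the function and variable names are my own.

```python
import numpy as np

def fit_bin_moments(X, scores, n_bins=10):
    """Training step: bin each input variable by quantile and record the
    mean historical score per bin. (A hypothetical sketch only; the full
    Reason Reporter method computes moments of the historical scores,
    whereas this sketch keeps just the per-bin mean.)"""
    edges, bin_means = [], []
    overall = scores.mean()
    for j in range(X.shape[1]):
        # Interior quantile edges define n_bins bins for variable j
        e = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1))[1:-1]
        idx = np.digitize(X[:, j], e)  # bin index 0..n_bins-1 per record
        means = np.array([scores[idx == b].mean() if np.any(idx == b) else overall
                          for b in range(n_bins)])
        edges.append(e)
        bin_means.append(means)
    return edges, bin_means

def top_reasons(x, score, edges, bin_means, k=3):
    """Scoring step: the variables whose bin's historical mean score
    deviates least from the observed score are reported as the likely
    drivers of that score."""
    devs = [abs(score - means[np.digitize(x[j], e)])
            for j, (e, means) in enumerate(zip(edges, bin_means))]
    return list(np.argsort(devs)[:k])  # indices of the top-k reason variables
```

The intuition: if a transaction's score is high and one variable sits in a bin whose historical scores were also high, that variable's historical pattern "supports" the score and it surfaces as a reason.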
A squeeze of LIME
As I wrote in my first XAI blog, perturbation-based approaches have recently gained attention due to a technique called Local Interpretable Model-agnostic Explanations (LIME). This technique perturbs the input variables with small amounts of noise to see which variables change the score by the largest amount. LIME then fits a sparse linear model on the locally dispersed, noise-induced dataset. The variables with the largest coefficients in that model are reported as the drivers of the score.
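The perturb-score-fit loop can be sketched as follows. This is a hypothetical, minimal illustration, not the official `lime` package: LIME proper uses distance-weighted samples and a sparse (e.g. Lasso) fit over interpretable features, while this sketch uses plain least squares for brevity.

```python
import numpy as np

def lime_style_explanation(score_fn, x, n_samples=1000, sigma=0.1, k=3):
    """Minimal LIME-style sketch: perturb the input x with local Gaussian
    noise, score each perturbed point with the black-box model score_fn,
    fit a local linear surrogate, and report the features with the
    largest coefficients as the drivers of the score."""
    rng = np.random.default_rng(0)
    Z = x + rng.normal(scale=sigma, size=(n_samples, x.size))  # local cloud around x
    y = score_fn(Z)                                            # black-box scores
    # Fit y ~ Z w on centered data via least squares (LIME uses a
    # weighted sparse fit here instead)
    coef, *_ = np.linalg.lstsq(Z - Z.mean(axis=0), y - y.mean(), rcond=None)
    return list(np.argsort(-np.abs(coef))[:k])  # indices of top-k drivers
```

Because the surrogate is fit only on the noise cloud around one transaction, its coefficients reflect purely local sensitivity, which is exactly the property the comparison below probes.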
Working with LIME can be a challenge in that it explains local sensitivity rather than drawing on historical data support for its reasons. At FICO, as we always do, we compared the newer algorithm to our existing methodologies.
For example, we studied a sample of high-scoring fraud cases, which we processed through both Falcon Reason Reporter and LIME. The Falcon neural network was used to generate the scores for the noise-induced perturbation data points required by the LIME methodology. The output of both algorithms was then qualitatively analyzed through expert evaluation. Among the high-scoring cases studied, 11% had the exact same top three reason codes generated by Falcon Reason Reporter, and 53% had at least two of the three same reason codes.
In cases where the two systems differed, the story was interesting. For example, in a high-scoring case where the two algorithms agreed on only one of the three reasons, Falcon Reason Reporter pointed out that the high amount activity and authorization velocity were suspicious. LIME, on the other hand, concluded that the primary driver of the score was the rate of ATM withdrawals in the last two days, despite no evidence of such activity.
Similarly, in another case with just a few transactions in the history, LIME attributed the score to the PIN decline rate, though there were no PIN declines. We saw this behavior recur again and again. These cases indicate that LIME over-emphasizes the local feature sensitivity induced by its local noise perturbation technique. By focusing on global historical score patterns and global support, Falcon Reason Reporter is robust to such local noise.
Averaging out noise
Thus, we conclude that the tried-and-true Reason Reporter approach (looking at global score support in the historical data) is superior at averaging out noise arising in complex local variable phase spaces and gradients of the solution space. In contrast, the newer XAI technique (LIME) tends to pick up local noise, leading to contrived explanations that become more complicated as non-linearity increases.
In sum, Falcon Reason Reporter continues to provide robust, accurate explanations that set the investigation teams using machine learning on the right track.
A final note
Most explanation systems, including Reason Reporter and LIME, provide an assessment of which model input features are driving the scores. We also acknowledge that the ability to understand causality is the natural next step for explanation systems. This requires the explanation system to understand and parse the latent features that actually drive the score. As I wrote in my previous blog, my recent patent application work has explored an architecture called LENNS (Latent Explanations Neural Network Scoring) that exposes more of what's driving the score. More on that later.
If the robustness of the tech FICO originated in 1992 has got you dreaming of the 90s, check this out. And follow me on Twitter @ScottZoldi.