Data Science, Machine Learning, AI - Challenges & Solutions
Jitin K. Singhal, CFA
Finance & Technology Leader | 5x Startup Founder | Driving Sustainable Revenue & EBITDA Growth
Data Science (“DS”), Machine Learning (“ML”), and Artificial Intelligence (“AI”) applications are increasingly being used to analyze and develop deeper insights into everyday business activities. Having a functioning implementation of an ML or AI model is a huge competitive edge in today's highly connected world. For some, it is the key to survival itself! With the scaling power of the cloud, the availability of free libraries to rapidly develop and deploy models, and the stability of the underlying mathematical algorithms over the last 50+ years, businesses can't afford to get it wrong.
However, according to Gartner, "most data science and machine learning projects routinely fail to realize expected business value due to mistakes in project execution." As many as 80% of all projects (estimates vary from 60% to 90% depending on which survey you choose) are never deployed into production. The projects that do make it rarely deliver the promised business value, are expensive to maintain, and require highly specialized resources, both human and technological. In short, the magic of ML or AI being a cost-effective competitive advantage remains elusive.
This article offers some practical solutions to increase the success of ML or AI projects. For folks interested in technical details, a case study analyzing one of the most common use cases in ML implementations – credit card fraud detection – is presented at the end.
Practical Solutions to Increase ML & AI Project Success
Institutions can realize increased benefits from ML projects by implementing one or all of the solutions listed below:
- Recognize that Data Science is a function by itself. ML projects use a blend of functional, business, and technical roles and require sponsorship from the highest levels of the organization – the C-Suite. Just as the positions of Chief Technology Officer and Chief Information Officer were added to the C-Suite in the 80s and 90s, success in using the power of ML & AI requires an executive-level Chief Data Scientist position. This executive can help the organization cut through the hype and sales pitches and guide the projects it undertakes to completion.
- Consider the fundamentals of the problem at hand. Most realistic ML algorithms compress the feature space into a manageable set, thereby introducing error into the model. Incorporating fundamental analysis at the feature space level and at the unobserved-but-known-patterns (known unknowns) level can typically improve accuracy beyond what the compressed model achieves on its own. This concept is best highlighted by the blow-up of the Long Term Capital Management hedge fund in 1998, which incorrectly applied the mathematics (Brownian motion) of physics-based systems to financial derivatives, which obeyed no such laws. In other words, what’s in the error space of the model can sometimes be very costly.
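It is said, "history doesn't repeat itself, it only rhymes." Learning only from known observations limits a model's ability to handle information that follows a different pattern. All models suffer from the "Turkey problem": a turkey that is fed every day grows more confident that tomorrow will look like today, right up until the day before Thanksgiving.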
- Assess the stability of the patterns being modeled at the outset. Model tuning is computationally expensive, even in the cloud. This crucial upfront analysis can help in designing a solution that can deal with shifting patterns. In our credit card fraud detection example (see below), really good ML models can be developed by training them on historical patterns, but fraudsters, being tech-savvy too, tend to deliberately shift fraud patterns quite often to beat the ML models implemented by financial institutions ("FIs").
- Develop a multi-model approach, where multiple independent ML models work together to form a complex dynamic model and solve the problem at hand. This strategy can separate features into natural groups and reduce the complexity of both the implementation and ongoing model maintenance (addressing the shifting-pattern problem described in the previous point). For example, to better detect credit card fraud, the FIs can create several mini ML models, each with the capability to flag a transaction based on its own feature set: a customer pattern model (customer identity, how they spend, typical amounts, frequency, etc.), a risk management model (is the customer new, is the transaction usual for the customer, size of the transaction, origin, etc.), and a general fraud detection model (does the transaction look fraudulent in general, regardless of the customer). Then, an ensemble model can process the results of each of the components to determine whether the transaction at hand is fraudulent or not.
- Use best-in-class infrastructure, preferably in the cloud. Models running on the research team's computers are useful for providing point-in-time insights but not for adding operational excellence. A company whose DS team is able to access and use data in the cloud, especially large data sets in the petabyte or higher range, is far more likely to succeed in deploying ML models into production than others.
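The multi-model idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Transaction` fields, the three scoring rules, the weights, and the threshold are all hypothetical stand-ins, and in a real system each scoring function would be a separately trained and separately maintained model.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    # All fields are hypothetical features chosen for illustration.
    amount: float             # size of this transaction
    typical_amount: float     # what this customer usually spends
    customer_age_days: int    # days since the account was opened
    is_foreign: bool          # transaction originates outside the home country


def customer_pattern_score(tx: Transaction) -> float:
    """Score in [0, 1]: how far the amount deviates from the customer's norm."""
    ratio = tx.amount / max(tx.typical_amount, 1.0)
    return min(ratio / 10.0, 1.0)


def risk_management_score(tx: Transaction) -> float:
    """Score in [0, 1]: new accounts and foreign origin each add risk."""
    score = 0.0
    if tx.customer_age_days < 30:
        score += 0.5
    if tx.is_foreign:
        score += 0.5
    return score


def general_fraud_score(tx: Transaction) -> float:
    """Score in [0, 1]: placeholder for a model trained on all transactions."""
    return 0.8 if tx.amount > 5000 else 0.1


# Assumed ensemble weights and decision threshold (not from the article).
WEIGHTS = {"customer": 0.4, "risk": 0.3, "general": 0.3}
THRESHOLD = 0.5


def ensemble_flag(tx: Transaction) -> tuple[bool, float]:
    """Combine the component scores into one weighted score and a flag."""
    scores = {
        "customer": customer_pattern_score(tx),
        "risk": risk_management_score(tx),
        "general": general_fraud_score(tx),
    }
    combined = sum(WEIGHTS[name] * value for name, value in scores.items())
    return combined >= THRESHOLD, combined


normal = Transaction(amount=50.0, typical_amount=60.0,
                     customer_age_days=1000, is_foreign=False)
suspicious = Transaction(amount=9000.0, typical_amount=60.0,
                         customer_age_days=10, is_foreign=True)
print(ensemble_flag(normal))
print(ensemble_flag(suspicious))
```

The practical benefit is that when fraud patterns shift, only the affected component (for example, the general fraud model) may need retraining, while the others and the ensemble logic stay in place.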
Conclusion
ML and AI implementations hold a lot of promise and can be existential for some companies. However, like any other business function, to add value they require robust operations, technical ability, and proper sponsorship from the C-Suite.
Case Study – Measurement of Success of the Credit Card Fraud Detection ML Model
Let's review one of the most common use cases of machine learning ("ML") algorithms, credit card fraud detection, and measure its success in reducing losses.
The problem is a natural fit for ML because of the following characteristics:
- A large number of transactions occur every second of the day
- A real or fraudulent transaction can occur in any part of the world for any purpose
- Each transaction is identified as real or fraudulent (the target) after it is completed
- Every fraudulent transaction has real costs and customer service headaches for card issuers, merchants, and customers alike
- Fraudulent transactions are a very small portion (less than 0.05%) of the total number of transactions. Correctly identifying them is akin to finding a needle in a haystack
In theory, it would be quite easy to deploy the best performing ML algorithm (as determined by whatever metric is used) given the vast amount of labeled data available at every financial institution ("FI"). The FIs should see a meaningful reduction in fraud-related losses from the very next day!
Theoretical Model Performance
Here is the model performance on test data. This model was trained on about 280,000 transactions.
Yes, the confusion matrix is not the best metric for evaluating the performance of ML algorithms on datasets containing unbalanced samples, but this model was specifically configured to adjust for it. Here are the ROC curves. The model is good to go.
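To see why raw accuracy misleads on unbalanced samples, here is a small sketch that derives accuracy, precision, and recall from confusion-matrix counts. The counts are hypothetical and are not the results of the model discussed above: they assume a test set of 100,000 transactions in which 50 (0.05%) are fraudulent.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict[str, float]:
    """Derive accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        # Precision: share of flagged transactions that were really fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall: share of actual fraud that the model caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical baseline that labels every transaction as legitimate.
always_legit = confusion_metrics(tp=0, fp=0, fn=50, tn=99_950)

# Hypothetical model that catches 40 of the 50 frauds with 60 false alarms.
useful_model = confusion_metrics(tp=40, fp=60, fn=10, tn=99_890)

print("always legit:", always_legit)
print("useful model:", useful_model)
```

In this made-up example the do-nothing baseline scores a higher accuracy (99.95%) than the model that actually catches 80% of the fraud (99.93%), which is why precision, recall, or ROC curves are the more informative views for this kind of problem.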
Assuming that the FI executed the project well and deployed this model into production, the model should produce meaningful savings right away. Over time, as the model is updated by training it on new data, fraud-related losses should drop to near zero.
Analysis of Success
Since credit card ML project success metrics are proprietary, we can instead estimate overall success by analyzing aggregate data. Typically, credit card delinquencies (“DQ”) turn into charge-offs (“CO”) within 90 days. Thus, a "simple" success measurement metric could be CCFraud = CO - DQ. Note: If the FIs didn't have meaningful losses due to fraud, then DQ >= CO by definition, and CCFraud would be zero or negative.
The chart below shows the quarterly DQ rate on credit card loans by the top 100 banks (red line) and the difference between DQ and CO (blue line = DQ - CO, the negative of CCFraud). The blue line is both negative and stable. It also never gets close to zero. This shows that portfolio CO is greater than DQ (probably due to fraudulent transactions and other items). Additionally, the stability of the blue line suggests that the FIs have reached the minimum fraud level they can achieve (probably by applying really expensive technology) and that models are unable to reduce it any further.
Let's look at another chart showing the quarterly DQ rate on credit card loans by banks other than the top 100 (regional and community banks). As shown below, there is certainly more volatility in DQ and CO, but also notice that the difference between DQ and CO (our estimate of fraud) is significantly more volatile than the same difference for the top 100 banks. One can guess (confidently) that the smaller banks are either not using ML technology or not updating their ML models in a timely fashion.
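The fraud proxy and the volatility comparison described above can be sketched as follows. The quarterly rates here are invented purely to show the calculation and are not the data behind the charts; the function name `fraud_proxy` and the variable names are likewise assumptions.

```python
from statistics import mean, pstdev


def fraud_proxy(charge_off_rates: list[float],
                delinquency_rates: list[float]) -> list[float]:
    """Per-quarter CCFraud = CO - DQ (both rates in percent)."""
    if len(charge_off_rates) != len(delinquency_rates):
        raise ValueError("both series must cover the same quarters")
    return [co - dq for co, dq in zip(charge_off_rates, delinquency_rates)]


# Hypothetical quarterly rates (percent), for illustration only.
large_bank_co = [3.6, 3.7, 3.5, 3.6]
large_bank_dq = [2.5, 2.6, 2.4, 2.5]
small_bank_co = [4.0, 6.5, 3.0, 5.5]
small_bank_dq = [3.0, 3.2, 2.9, 3.1]

large_proxy = fraud_proxy(large_bank_co, large_bank_dq)
small_proxy = fraud_proxy(small_bank_co, small_bank_dq)

# The mean gives the level of the proxy; the population standard
# deviation gives a simple measure of how stable it is over time.
print("large banks:", round(mean(large_proxy), 2), round(pstdev(large_proxy), 2))
print("small banks:", round(mean(small_proxy), 2), round(pstdev(small_proxy), 2))
```

A proxy that stays positive with a small standard deviation corresponds to the stable, never-zero line described for the top 100 banks, while a larger standard deviation corresponds to the more volatile pattern described for the smaller banks.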
Conclusion
The ML applications used by the top 100 banks are certainly able to reduce credit card loan charge-offs, but they are not able to eliminate them. Institutions outside the top 100 banks can benefit from adding ML models to their workflow or updating any ML applications they already have. Both types of institutions can reduce losses further by incorporating one or all of the recommended solutions.