Integrating Machine Learning Models within Matured Business Process
Small Business Credit Risk Scoring Architecture

Machine learning is now reaching every business process in the enterprise, helping to create value, enhance customer experience or bring in operational efficiency. Businesses today have the necessary infrastructure, the right tooling and the data to generate insights faster than before.

While machine learning models can have a significant and positive impact on how business processes are run, they can also turn out to be risky if put into live production without being monitored for a reasonable amount of time. A major hurdle arises when organizations have some form of business rules embedded in their critical business processes. These rules may have evolved over time, taking real-world domain knowledge into account, and may be performing exceptionally well. In these scenarios stakeholders will typically push back on doing away with the existing rules ecosystem completely.

The challenge with a rule-based system is that data and business scenarios change so fast today that either the rules are unable to keep up with real-world scenarios or it is very time consuming to create and maintain additional rules.

With that background set, this article is about how we can get the best of both worlds (rules + machine learning) and also, over time, measure the performance of machine learning models against real-world data to see if they can stand on their own.

I am going to take a banking use case, "Small Business Risk Scoring", as the business process to elaborate on the model deployment options. That said, the deployment options below can be applied to any business process.

A typical risk scoring architecture today looks like the one below.

[Figure: typical small business risk scoring architecture]

Risk scoring is performed based on customer behavior/history and data from the credit bureau. These scores, along with the underlying transaction data, are stored in a centralized feature store, which in turn feeds the bank's downstream systems such as Collections, Regulatory, Risk etc. High-risk customers who are 30+ days past due are prioritized for a service agent in real time.
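As a rough illustration of that last step, the sketch below filters a scored portfolio down to high-risk accounts that are 30 or more days past due and orders them for a service agent. The `Account` fields, the 0-1 score scale and the `HIGH_RISK_THRESHOLD` value are assumptions made for the example, not details of any particular bank's system.

```python
from dataclasses import dataclass

# Hypothetical cut-off on a 0-1 risk score; a real bank would calibrate this.
HIGH_RISK_THRESHOLD = 0.7


@dataclass
class Account:
    account_id: str
    risk_score: float      # assumed scale: 0 = lowest risk, 1 = highest risk
    days_past_due: int


def agent_queue(accounts, min_days_past_due=30):
    """Return high-risk accounts at least `min_days_past_due` days late, riskiest first."""
    eligible = [
        a for a in accounts
        if a.days_past_due >= min_days_past_due and a.risk_score >= HIGH_RISK_THRESHOLD
    ]
    return sorted(eligible, key=lambda a: (-a.risk_score, -a.days_past_due))


accounts = [
    Account("A-1", 0.91, 45),
    Account("A-2", 0.40, 60),   # late but low risk: not queued
    Account("A-3", 0.85, 10),   # risky but not yet 30 days past due
    Account("A-4", 0.78, 33),
]
queue_ids = [a.account_id for a in agent_queue(accounts)]
print(queue_ids)
```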

The brain of the score calculation today is a rule-based scoring engine built over the last decade, applying domain context as well as risk patterns seen over time. The business now wants to upgrade the scoring engine to take into account external data sources such as micro- and macro-economic factors, and also data on the industry the small business operates in, to identify external risk factors that could turn an account in good standing delinquent.

That said, introducing new data sources might add complexity to enhancing and maintaining the rule-based system. This is exactly where machine learning helps, since it can learn from the underlying data without being explicitly programmed.

Let us get into a few deployment approaches, assuming we already have a machine learning model built with customer behavior/transaction history, bureau data, micro- and macro-economic data and industry segmentation data.

Approach 1 - Stacked scoring model using meta-learner

[Figure: stacked scoring model using a meta-learner]

In this case we use the outputs of both the rule-based engine and the ML model to predict customer risk. The outputs are stacked together either through rules or, preferably, through a simple logistic regression model (the meta-learner).
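A minimal sketch of such a meta-learner is shown below, assuming both scoring components emit a score on a 0-1 scale. To stay self-contained it fits the logistic regression with plain batch gradient descent from the standard library; in practice one would more likely use an off-the-shelf implementation. The training rows, the function names and the hyperparameters are made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(rows, labels, lr=0.5, epochs=2000):
    """Fit a logistic regression on [rule_score, ml_score] with batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias) - y
            grad_w[0] += error * x[0]
            grad_w[1] += error * x[1]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_risk(weights, bias, rule_score, ml_score):
    """Probability of delinquency from the two stacked scores."""
    return sigmoid(weights[0] * rule_score + weights[1] * ml_score + bias)


# Made-up training rows: (rule engine score, ML model score), both on a 0-1 scale.
stacked_scores = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.2, 0.3), (0.1, 0.2), (0.3, 0.1)]
became_delinquent = [1, 1, 1, 0, 0, 0]

weights, bias = fit_meta_learner(stacked_scores, became_delinquent)
high = predict_risk(weights, bias, rule_score=0.85, ml_score=0.90)
low = predict_risk(weights, bias, rule_score=0.15, ml_score=0.10)
print(f"weights={weights}, bias={bias:.2f}, high={high:.2f}, low={low:.2f}")
```

The learned weights give a rough view of how much the final decision leans on the rules versus the ML model, which is one way to read the "strength and weakness of each system" point below.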

Advantages

  • Increased model accuracy, as the stacked meta-learner learns from both the hand-coded rules and the machine learning model
  • Easy to monitor individual model performance and understand the strengths and weaknesses of each system

Disadvantages

  • Increased execution time and slightly higher deployment complexity
  • Need to train and monitor multiple ML models (the risk scoring ML model and the meta-learner)
  • Rules-based engine is always in the loop (not bad though). Over time, once comfortable with the ML model, the rules engine and meta-learner can be removed

Approach 2 - Champion/Challenger model deployment

[Figure: Champion/Challenger model deployment]

In Champion/Challenger mode, the machine learning model is deployed in a pipeline parallel to the rule-based engine. Both pipelines score incoming transactions in parallel. To start with, the risk scoring decision is made using the rule-based engine while the ML model operates in dark mode, where its scores are used only for offline analysis. Over time, once we have sufficient positive metrics on ML model performance, one can toggle the ML model to champion and the rules to challenger.
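The toggle described above can be sketched roughly as follows. Both scorers run on every transaction, every score is logged for offline analysis, and only the current champion's score is returned for the decision. The two scoring functions are hypothetical stand-ins, and the class and field names are assumptions for the example.

```python
from dataclasses import dataclass, field


def rules_score(txn):
    """Stand-in for the rule-based engine (hypothetical rule)."""
    return 0.9 if txn["days_past_due"] >= 30 else 0.2


def ml_score(txn):
    """Stand-in for the ML model (hypothetical formula)."""
    return min(1.0, 0.01 * txn["days_past_due"] + 0.5 * txn["utilization"])


@dataclass
class ChampionChallenger:
    scorers: dict
    champion: str
    shadow_log: list = field(default_factory=list)

    def score(self, txn):
        # Both pipelines score every incoming transaction.
        scores = {name: fn(txn) for name, fn in self.scorers.items()}
        # All scores are logged so the challenger can be analysed offline (dark mode).
        self.shadow_log.append({"txn_id": txn["id"], **scores})
        # Only the champion's score drives the live decision.
        return scores[self.champion]

    def promote(self, name):
        """Toggle which scorer is the champion."""
        if name not in self.scorers:
            raise KeyError(name)
        self.champion = name


router = ChampionChallenger({"rules": rules_score, "ml": ml_score}, champion="rules")
txn = {"id": "T-1", "days_past_due": 10, "utilization": 0.8}
before = router.score(txn)   # decided by the rules engine
router.promote("ml")
after = router.score(txn)    # decided by the ML model
print(before, after, len(router.shadow_log))
```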

Another option is to look at performance by segment and balance the load across both scoring components based on segment-level performance. Say the rules engine works better for high credit score customers and the ML model for low ones; we can then put in a simple rule that diverts incoming transactions to the respective scoring component by credit score level.
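Such a routing rule could be as small as the sketch below. The cut-off of 700 and both scoring functions are placeholders chosen for the example; the real threshold would come from the segment-level performance analysis.

```python
# Hypothetical cut-off separating "high" from "low" credit score customers.
CREDIT_SCORE_CUTOFF = 700


def rules_score(txn):
    """Stand-in for the rule-based engine (hypothetical rule)."""
    return 0.9 if txn["days_past_due"] >= 30 else 0.2


def ml_score(txn):
    """Stand-in for the ML model (hypothetical formula)."""
    return min(1.0, 0.01 * txn["days_past_due"] + 0.5 * txn["utilization"])


def route(txn, cutoff=CREDIT_SCORE_CUTOFF):
    """Send the transaction to one scoring component based on its credit score segment."""
    if txn["credit_score"] >= cutoff:
        return "rules", rules_score(txn)
    return "ml", ml_score(txn)


high_segment = route({"credit_score": 760, "days_past_due": 0, "utilization": 0.3})
low_segment = route({"credit_score": 610, "days_past_due": 40, "utilization": 0.9})
print(high_segment, low_segment)
```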

Advantages

  • Simple pipeline
  • Easy to toggle load based on performance or balance load between both

Disadvantages

  • Added complexity if incoming transactions need to be balanced intelligently between the champion and challenger models
  • Possibly less accurate than the meta-learner method
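The offline analysis that feeds the toggle decision might, in its simplest form, compare precision and recall of the two pipelines on scores logged while the ML model ran in dark mode. The log rows, the 0.5 threshold and the choice of metrics below are assumptions for illustration; a real comparison would use far more data and whatever metrics the business has agreed on.

```python
def precision_recall(scores, outcomes, threshold=0.5):
    """Precision and recall of 'score >= threshold' against observed delinquency."""
    flagged = [s >= threshold for s in scores]
    tp = sum(f and o for f, o in zip(flagged, outcomes))
    fp = sum(f and not o for f, o in zip(flagged, outcomes))
    fn = sum((not f) and o for f, o in zip(flagged, outcomes))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up log: scores from both pipelines plus the outcome observed later.
shadow_log = [
    {"rules": 0.9, "ml": 0.8, "delinquent": True},
    {"rules": 0.2, "ml": 0.7, "delinquent": True},
    {"rules": 0.9, "ml": 0.3, "delinquent": False},
    {"rules": 0.2, "ml": 0.1, "delinquent": False},
]
outcomes = [row["delinquent"] for row in shadow_log]
report = {
    name: precision_recall([row[name] for row in shadow_log], outcomes)
    for name in ("rules", "ml")
}
print(report)
```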

Approach 3 - Serial pipeline deployment

[Figure: serial pipeline deployment]

In serial deployment the ML model is trained slightly differently from the other two deployment options. In this case the machine learning model is trained with the output of the rules engine as a separate feature, along with features from the other data sources. This is the approach to take when you are looking to complement and augment the current rule-based system with an ML model for a better decision boundary.
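In code, the difference comes down to how the feature row is assembled: the rules engine runs first and its score becomes one more column. The sketch below shows only that assembly step; the feature names and the stand-in rule are hypothetical, and the same function would need to be used both when training and when scoring so the model always sees the rule score.

```python
def rules_engine_score(customer):
    """Stand-in for the existing rules engine or vendor product (hypothetical rule)."""
    return 0.9 if customer["days_past_due"] >= 30 else 0.2


# Fixed column order so training and scoring rows line up (hypothetical feature names).
FEATURE_ORDER = ["utilization", "industry_risk_index", "macro_stress_index", "rules_score"]


def build_features(customer):
    """Append the rules engine output to the other features as one more column."""
    row = dict(customer["features"])
    row["rules_score"] = rules_engine_score(customer)
    return [row[name] for name in FEATURE_ORDER]


customer = {
    "days_past_due": 35,
    "features": {"utilization": 0.6, "industry_risk_index": 0.4, "macro_stress_index": 0.3},
}
feature_vector = build_features(customer)
print(feature_vector)
```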

Advantages

  • Simpler pipeline compared to stacked output

Disadvantages

  • Rules engine is always in the loop

Each of the approaches above has its purpose, and the choice depends on the business process flow, the criticality of the business process, the business/domain context coded into the current rules ecosystem and, finally, the decision accuracy of the current process.

In my experience in banking and finance, I have seen Approach 1 favored in Anti-Money Laundering and underwriting projects, Approach 2 more on the marketing and fraud analytics side, and Approach 3 in places where black-box or vendor products like Actimize, Kofax etc. exist today and a new machine learning model is trained with the score coming from these vendor products along with other new features.

Time to select the one that best suits your business process and that aligns with all stakeholders' expectations.

Happy machine learning deployment!

Vikram Murthy

co-founder at AmyGB.ai

4 years ago

Thanks for setting this cat amongst the pigeons ..very sane look at deployment ! I just wonder though, how one could use this for stuff like text and vision which is what we do ..our rule engines are actually 10 human beings with 0 documents between them ..we typically rely on human QC for the first month or so and then automate categories that match human accuracy ..so I guess your approach 2 is probably the only close one..the rest won't work in areas where there's no codifiable rules ( sure v could do a deep dive with all stakeholders and put in about 50-100 rules, but that's just not scalable )..if u have read this far, I appreciate your patience !

Dr. Monika Singla

Ph.D. | IIT Delhi | Data Science | Big Data | Data Analytics

4 years ago

A v interesting read sir!!

Avinash Kumar

Senior Business Analyst @ Agility | Business Analysis | Product Management | Project Management | PMP | PSPO | CPRE-AP | Agility Leadership Program

5 years ago

Amazing article! Nicely articulated

Bharathi S.

Digital Transformation

5 years ago

Easy read and informative...in stacked model and how the results are inferred?

Nice article . Approach 2 is much better as it will not impact existing setup and will give an opportunity to build a good scalable stack and validate the model .


More articles by Srivatsan Srinivasan
