Integrating Machine Learning Models within Matured Business Process

Srivatsan Srinivasan

Chief Data Scientist | Gen AI | AI Advocate | YouTuber (bit.ly/AIEngineering)

Published: March 21, 2019

Machine learning today is reaching every business process in the enterprise, helping to create value, enhance customer experience, or bring in operational efficiency. Businesses today have the necessary infrastructure, the right tooling, and the data to generate insights faster than before.

While machine learning models can have a significant and positive impact on how business processes are run, they can also be risky if put into live production without being monitored for a reasonable amount of time. A major hurdle arises when organizations have some form of business rules embedded in their critical business processes. These rules may have evolved over time, taking real-world domain knowledge into account, and may be performing exceptionally well. In these scenarios, stakeholders typically push back on completely doing away with the existing rules ecosystem.

The challenge with rule-based systems is that data and business scenarios now change so fast that either the rules cannot catch up with real-world scenarios or it becomes very time-consuming to create and maintain additional rules.

With that background, this article is about how we can make use of the best of both worlds (rules + machine learning) and, over time, measure the performance of machine learning models on real-world data to see whether they can stand on their own.

I am going to take a banking use case, "Small Business Risk Scoring", as the business process to elaborate on the model deployment options. That said, the deployment options below can be applied to any business process.

A typical risk scoring architecture today works as follows.

Risk scoring is performed based on customer behavior/history and data from the credit bureau. These scores, along with the underlying transaction data, are stored in a centralized feature store, which in turn is distributed to the bank's downstream systems such as Collections, Regulatory, Risk, etc. High-risk customers who are 30+ days past due are prioritized to a service agent in real time.

The brain of score calculation today is a rule-based scoring engine built over the last decade, applying domain context as well as risk patterns seen over time. The business now wants to upgrade the scoring engine to take into account external data sources such as micro- and macroeconomic factors, as well as data on the industry the small business operates in, to identify external risk factors that may turn an account in good standing delinquent.

That said, introducing new data sources might add complexity to enhancing and maintaining the rule-based system. This is exactly where machine learning helps: it can learn from the underlying data without being explicitly programmed.

Let us get into a few deployment approaches, assuming we already have a machine learning model built with customer behavior/transaction history, bureau data, micro- and macroeconomic data, and industry segmentation data.

Approach 1 - Stacked scoring model using meta-learner

In this case we use the output of both the rule-based engine and the ML model to predict customer risk. The outputs are stacked together either as rules or, preferably, as a simple logistic regression model (meta-learner).
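To make the stacking idea concrete, here is a minimal sketch in plain Python. It assumes both the rules engine and the ML model emit a risk score between 0 and 1, and it fits a tiny logistic regression on those two scores by gradient descent. The function names, the sample history, and the learning rate are all hypothetical and only for illustration; in practice one would more likely use an established library implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_meta_learner(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on (rule_score, ml_score) pairs
    using batch gradient descent on the log loss."""
    weights = [0.0, 0.0]
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (rule_score, ml_score), y in zip(rows, labels):
            p = sigmoid(weights[0] * rule_score + weights[1] * ml_score + bias)
            err = p - y  # gradient of log loss w.r.t. the linear term
            grad_w[0] += err * rule_score
            grad_w[1] += err * ml_score
            grad_b += err
        weights[0] -= lr * grad_w[0] / n
        weights[1] -= lr * grad_w[1] / n
        bias -= lr * grad_b / n
    return weights, bias


def predict_risk(weights, bias, rule_score, ml_score):
    """Stacked risk probability from the two base scores."""
    return sigmoid(weights[0] * rule_score + weights[1] * ml_score + bias)


# Hypothetical history: (rule_score, ml_score) and outcome (1 = went delinquent).
history = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.6), (0.6, 0.9),
           (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.4, 0.1)]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = fit_meta_learner(history, outcomes)
print(predict_risk(weights, bias, rule_score=0.9, ml_score=0.9))
print(predict_risk(weights, bias, rule_score=0.1, ml_score=0.1))
```

The learned weights also give a rough view of how much the meta-learner leans on each base scorer, which is one way to monitor the relative strength of the rules and the ML model.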

Advantages

Increased model accuracy, as the stacked meta-learner learns from both the hand-coded rules and the machine learning model
Easy to monitor individual model performance and understand the strengths and weaknesses of each system

Disadvantages

Increased execution time and slightly higher deployment complexity
Need to train and monitor multiple ML models (Risk Scoring ML model and Meta-learner)
Rules-based engine is always in the loop (not a bad thing, though). Over time, once you are comfortable with the ML model, the rules engine and meta-learner can be removed

Approach 2 - Champion/Challenger model deployment

In Champion/Challenger mode, the machine learning model is deployed in a pipeline parallel to the rules-based engine. Both pipelines score incoming transactions in parallel. To start with, the risk scoring decision is made using the rule-based engine, while the ML model operates in dark mode, where its scores are used only for offline analysis. Over time, once we have sufficient positive metrics on ML model performance, we can toggle the ML model to champion and the rules to challenger.
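The dark-mode arrangement can be sketched as follows. This is an illustrative outline, not a production design: `rules_engine` and `ml_model` are hypothetical stand-ins for the real scorers, the transaction fields are invented, and the in-memory `shadow_log` stands in for whatever store is used for offline analysis.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Transaction = Dict[str, Any]
Scorer = Callable[[Transaction], float]


def rules_engine(txn: Transaction) -> float:
    """Hypothetical stand-in for the existing rule-based engine."""
    return 0.9 if txn["days_past_due"] >= 30 else 0.2


def ml_model(txn: Transaction) -> float:
    """Hypothetical stand-in for a trained ML model's risk score."""
    return min(1.0, txn["credit_utilization"])


@dataclass
class ChampionChallenger:
    scorers: Dict[str, Scorer]
    champion: str
    shadow_log: List[Dict[str, Any]] = field(default_factory=list)

    def score(self, txn: Transaction) -> float:
        # Every pipeline scores every transaction.
        scores = {name: scorer(txn) for name, scorer in self.scorers.items()}
        # All scores are logged so the challenger can be evaluated offline.
        self.shadow_log.append({"txn_id": txn["txn_id"], **scores})
        # Only the champion's score drives the live decision.
        return scores[self.champion]

    def promote(self, name: str) -> None:
        """Toggle which scorer acts as champion."""
        if name not in self.scorers:
            raise KeyError(f"unknown scorer: {name}")
        self.champion = name


router = ChampionChallenger({"rules": rules_engine, "ml": ml_model}, champion="rules")
txn = {"txn_id": "T-1", "days_past_due": 45, "credit_utilization": 0.75}

decision_before = router.score(txn)  # rules engine decides, ML runs dark
router.promote("ml")                 # flip once offline metrics look good
decision_after = router.score(txn)   # ML model decides, rules become challenger
```

Because both scores are logged on every call, the same log supports the offline comparison before the toggle and continued monitoring of the rules engine after it.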

Another option is to look at individual performance by segment and balance the load between the two scoring components based on segment-level performance. Say the rules engine works better for customers with high credit scores and the ML model for those with low scores; we can then put in a simple rule to divert incoming transactions to the respective scoring component by credit score level.
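A segment-based router of this kind might look like the sketch below. The cutoff of 700, the field names, and the two stand-in scorers are assumptions made for the example; the real split would come from the segment-level performance analysis.

```python
from typing import Any, Callable, Dict, Tuple

Transaction = Dict[str, Any]
Scorer = Callable[[Transaction], float]

CREDIT_SCORE_CUTOFF = 700  # hypothetical boundary between segments


def route_and_score(
    txn: Transaction,
    rules_scorer: Scorer,
    ml_scorer: Scorer,
    cutoff: int = CREDIT_SCORE_CUTOFF,
) -> Tuple[str, float]:
    """Send high-credit-score customers to the rules engine and the rest
    to the ML model; return which component scored and the score."""
    if txn["credit_score"] >= cutoff:
        return "rules", rules_scorer(txn)
    return "ml", ml_scorer(txn)


def rules_engine(txn: Transaction) -> float:
    """Hypothetical stand-in for the rule-based engine."""
    return 0.9 if txn["days_past_due"] >= 30 else 0.2


def ml_model(txn: Transaction) -> float:
    """Hypothetical stand-in for the ML model."""
    return min(1.0, txn["credit_utilization"])


high = {"credit_score": 760, "days_past_due": 0, "credit_utilization": 0.4}
low = {"credit_score": 610, "days_past_due": 10, "credit_utilization": 0.85}

print(route_and_score(high, rules_engine, ml_model))
print(route_and_score(low, rules_engine, ml_model))
```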

Advantages

Simple pipeline
Easy to toggle load based on performance or balance load between both

Disadvantages

Added complexity if incoming transactions need to be balanced intelligently between the champion and challenger models
Possibly less accurate than the meta-learner method

Approach 3 - Serial pipeline deployment

In serial deployment, the ML model is trained slightly differently from the other two deployment options. Here the machine learning model is trained with the output of the rules engine as a separate feature, alongside features from other data sources. This is the approach to choose when you are looking to complement and augment the current rules-based system with an ML model for a better decision boundary.
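As a rough illustration of the serial flow, the sketch below runs a stand-in rules engine first and feeds its score into the model's feature vector. Everything here is assumed for the example: the rule thresholds, the feature names, and especially the coefficients, which merely stand in for a model that would actually be trained on these features.

```python
import math
from typing import Dict, List


def rules_engine_score(customer: Dict[str, float]) -> float:
    """Hypothetical stand-in for the existing rules engine (or vendor score)."""
    score = 0.1
    if customer["days_past_due"] >= 30:
        score += 0.5
    if customer["bureau_score"] < 600:
        score += 0.3
    return min(score, 1.0)


def build_features(customer: Dict[str, float], external: Dict[str, float]) -> List[float]:
    """The rules engine output is just one more feature for the ML model."""
    return [
        rules_engine_score(customer),       # output of the upstream rules engine
        customer["credit_utilization"],     # customer behavior
        external["industry_default_rate"],  # industry data
        external["unemployment_rate"],      # macroeconomic data
    ]


# Placeholder coefficients standing in for a trained model (assumption).
COEFFICIENTS = [3.0, 1.0, 4.0, 2.0]
INTERCEPT = -3.0


def serial_score(customer: Dict[str, float], external: Dict[str, float]) -> float:
    """Serial pipeline: rules engine first, then the ML model on top."""
    features = build_features(customer, external)
    z = INTERCEPT + sum(w * x for w, x in zip(COEFFICIENTS, features))
    return 1.0 / (1.0 + math.exp(-z))


external = {"industry_default_rate": 0.08, "unemployment_rate": 0.05}
risky = {"days_past_due": 45, "bureau_score": 580, "credit_utilization": 0.9}
safe = {"days_past_due": 0, "bureau_score": 720, "credit_utilization": 0.3}

print(serial_score(risky, external), serial_score(safe, external))
```

The same feature-building step has to be used at training time and at scoring time, which is why the rules engine stays in the loop with this approach.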

Advantages

Simpler pipeline compared to stacked output

Disadvantages

Rules Engine always in loop

Each of the approaches above has its purpose, and the choice depends on the business process flow, the criticality of the business process, the business/domain context coded in the current rules ecosystem, and finally the decision accuracy of the current process.

In my experience in banking and finance, I have seen Approach 1 favored in anti-money laundering and underwriting projects, Approach 2 more on the marketing and fraud analytics side, and Approach 3 in places where black-box or vendor products such as Actimize, Kofax, etc. exist today and a new machine learning model is trained with the score coming from these vendor products along with other new features.

Time to select the one that best suits your business process and aligns with all stakeholders' expectations.

Happy machine learning deployment!

Vikram Murthy

co-founder at AmyGB.ai

4 years ago

Thanks for setting this cat amongst the pigeons ..very sane look at deployment ! I just wonder though, how one could use this for stuff like text and vision which is what we do ..our rule engines are actually 10 human beings with 0 documents between them ..we typically rely on human QC for the first month or so and then automate categories that match human accuracy ..so I guess your approach 2 is probably the only close one..the rest won't work in areas where there's no codifiable rules ( sure v could do a deep dive with all stakeholders and put in about 50-100 rules, but that's just not scalable )..if u have read this far, I appreciate your patience !

Dr. Monika Singla

Ph.D. | IIT Delhi | Data Science | Big Data | Data Analytics

4 years ago

A v interesting read sir!!

Avinash Kumar

5 years ago

Amazing article! Nicely articulated

Bharathi S.

Digital Transformation

5 years ago

Easy read and informative...in stacked model and how the results are inferred?

Jatinder Singh

5 years ago

Nice article. Approach 2 is much better, as it will not impact the existing setup and will give an opportunity to build a good scalable stack and validate the model.
