Magic and Data Science

What do machine learning and magic have to do with fraud prevention?

Like every good relationship, it all starts by establishing good expectations.

Machine learning and advanced analytics for the financial industry have been attracting high expectations lately. Add the Magician (popularly known as a data scientist) to the intelligence equation, and one might think an Abracadabra spell is all it takes.

Having worked on several machine learning implementations for AML and fraud departments, the one thing I can say for certain is that magic has absolutely nothing to do with it.

So let’s start from the beginning: why do we need machine learning, specifically in fraud prevention, anti-money laundering, Financial Intelligence Units and capital markets?

By studying past events, with solid knowledge of the business and of the case analysis that was performed on them, we can try to predict similar events in the future. "Future" here can mean anything from one millisecond from now to months and years ahead.

Before the project even starts, the first thing to establish is what the business intends to do with the predicted event. Will it be a cease case or a further-investigation scenario? Will the prediction lead to the development of a new rule?

By understanding the business questions and the actions that the machine learning prediction will trigger, one can better design and outline the logical process that needs to be built.

The second essential part of the process is building the data set. This means understanding the aggregation level required for the prediction, cleansing and exploring the data, and finding the appropriate sources from which to extract the past events that might affect the accuracy of the result. With a supervised approach in particular, the unit in charge should concentrate on labeling the past events (whether each one was or wasn’t the case they are searching for). Labeling is a crucial input to the machine learning algorithm: it is what the entire model is built upon. Feature selection is a heavily manual, very business-oriented cycle, and it deserves to be called the biggest data challenge of the project.
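To make the aggregation and labeling step concrete, here is a minimal sketch in plain Python. It assumes the prediction is made at customer level. The transaction fields, the feature names and the `confirmed_cases` set are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical raw transactions: (customer_id, amount, is_cross_border).
transactions = [
    ("C1", 120.0, False),
    ("C1", 9800.0, True),
    ("C2", 45.0, False),
    ("C2", 60.0, False),
    ("C3", 15000.0, True),
]

# Hypothetical outcome of past investigations: customers confirmed as cases.
confirmed_cases = {"C1", "C3"}


def build_dataset(transactions, confirmed_cases):
    """Aggregate transactions to one row per customer and attach a 0/1 label."""
    grouped = defaultdict(list)
    for customer_id, amount, is_cross_border in transactions:
        grouped[customer_id].append((amount, is_cross_border))

    rows = []
    for customer_id in sorted(grouped):
        items = grouped[customer_id]
        amounts = [amount for amount, _ in items]
        rows.append({
            "customer_id": customer_id,
            "txn_count": len(items),
            "total_amount": sum(amounts),
            "max_amount": max(amounts),
            "cross_border_share": sum(1 for _, cb in items if cb) / len(items),
            # The label comes from the business unit's past case decisions.
            "label": 1 if customer_id in confirmed_cases else 0,
        })
    return rows


dataset = build_dataset(transactions, confirmed_cases)
for row in dataset:
    print(row)
```

In a real project the aggregation level (customer, account, transaction, time window) follows from the business question defined in the first part, and the labels come from documented investigation results, not from a hard-coded set.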


The third part: once the data set is complete, one needs to choose the appropriate algorithm or method to analyze the training set and forecast the result. The most common choice for a binary (1, 0) outcome is logistic regression, but there are many others that address various business requirements. Once the models have been trained and a suitable one has been found for the training set, the third part is ready to be inspected and fully executed.
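As an illustration of the binary approach, the sketch below fits a logistic regression with batch gradient descent in plain Python. It is a from-scratch teaching example, not the tooling used in the projects described here. The two features (a scaled amount and a cross-border share), the toy training rows and the learning-rate and epoch settings are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, learning_rate=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_samples = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of the log-loss w.r.t. the linear score
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict_proba(weights, bias, features):
    """Probability that a row belongs to class 1 (the case we search for)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


# Hypothetical training set: [scaled_amount, cross_border_share] and 0/1 labels.
X_train = [[0.10, 0.0], [0.20, 0.0], [0.15, 0.5],
           [0.80, 1.0], [0.90, 0.5], [0.70, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(X_train, y_train)
print(predict_proba(weights, bias, [0.85, 1.0]))  # high-risk looking row
print(predict_proba(weights, bias, [0.05, 0.0]))  # low-risk looking row
```

The output is a probability between 0 and 1. Where to place the cut-off that turns it into a 1 or a 0 is a business decision that depends on what the prediction is used for.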

Once we have established that the model performs efficiently and effectively on the testing set (it takes many iterations to get there), the fourth part begins: the implementation.
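One possible way to check a model against the testing set is to compare its predictions with the known labels and count hits and misses. The sketch below does this with precision and recall, which are common choices for rare events such as fraud. The metric selection and the sample labels are assumptions, not something prescribed by this article.

```python
def evaluate(y_true, y_pred):
    """Return confusion-matrix counts plus precision and recall for class 1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cases caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alerts
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly ignored
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Hypothetical held-out labels and model predictions.
y_test = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

report = evaluate(y_test, y_pred)
print(report)
```

Precision tells the investigation team how many alerts are worth their time, and recall tells them how many real cases slip through. Each iteration of the model can be compared on the same held-out data.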


Here it’s essential to make sure that the tools used for the ML modeling can export code (Java, SQL, R) that can be implemented back in the systems working on the operational data (mostly the transaction data), so that the ML becomes an integrated part of the operational solution. ML isn’t supposed to serve the analytical layer alone: if the end deliverable of the project doesn’t include using the predictive analytics (PA) back in the operational system (as should be specified in the first part anyway), you might as well quit. Today’s world requires using ML within fraud prevention modeling for smarter probability and efficient machine compliance.
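To show what exporting a model as code can look like, the sketch below renders a fitted logistic regression as a SQL scoring query and runs it against an in-memory SQLite table. The coefficients, the table and column names and the `logistic_to_sql` helper are hypothetical. Commercial modeling tools generate such code themselves, so this is only an illustration of the idea.

```python
import math
import sqlite3


def logistic_to_sql(intercept, coefficients, table):
    """Render a logistic model as a SQL query that scores every row of `table`.

    `coefficients` maps column names to weights; names are assumed trusted.
    """
    terms = [repr(float(intercept))]
    terms += [f"({float(weight)!r} * {column})"
              for column, weight in coefficients.items()]
    linear = " + ".join(terms)
    return (f"SELECT txn_id, 1.0 / (1.0 + EXP(-({linear}))) AS fraud_score "
            f"FROM {table}")


# Hypothetical fitted model.
intercept = -4.0
coefficients = {"amount_scaled": 5.0, "cross_border": 2.0}
scoring_sql = logistic_to_sql(intercept, coefficients, "transactions")
print(scoring_sql)

# Stand-in for the operational database.
conn = sqlite3.connect(":memory:")
conn.create_function("EXP", 1, math.exp)  # make sure EXP is available
conn.execute("CREATE TABLE transactions "
             "(txn_id INTEGER, amount_scaled REAL, cross_border INTEGER)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)",
                 [(1, 0.1, 0), (2, 0.9, 1)])
scores = dict(conn.execute(scoring_sql).fetchall())
print(scores)
```

Because the score is computed by the database itself, the operational system can act on it at transaction time without calling back to the analytical environment.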

So what’s magic got to do with it? Absolutely nothing. Machine learning projects and the work of a data scientist are about as far from magic as it gets. Hard work, continuous accuracy cycles, data exploration and cleansing, and ad-hoc research: those are the vital terms to use.

About the writer:

Ido Biger is the Head of the BI & Big Data practice of Matrix-IFS, a company widely recognized for its compliance solution implementations throughout the financial industry worldwide. He has more than 14 years’ experience providing end-to-end data warehousing and BI solutions for industries such as finance, high-tech, healthcare, telecommunications, security, and retail. An acknowledged figure in Israel’s BI community, Biger is a regular speaker at conventions all over the world and has served as a BI instructor and lecturer at Israel’s top IT colleges, sharing his approach and deliverables regarding operational BI, advanced analytics, Big Data implementations and interesting OEM solutions. Prior to joining Matrix-IFS, Biger was the Head of BI & Data (CDO) of a large telecommunications company in Israel.


From my experience, machine learning may be extremely useful in three areas: self-tuning risk management, exclusion detection, and behavior prediction (some people say the latter two are essentially the same; technically that is true, but logically they are different processes). When analyzing huge volumes of data, it is very important to achieve five goals in sequence for each entity type: 1. Detect exclusions (non-typical data). 2. Recognize segments (keys and ranges). 3. Identify global and particular trends (directions and measures of change). 4. Predict "normal behavior" (the mainstream and its tenable, self-consistent borders). 5. Mark (and rank, if applicable) the entities that deserve human attention (either positive or negative). With appropriate use of ML in these areas, the goals may be achieved with little or no effort. As the market changes, the AML, fraud prevention or KYC platform adjusts itself, like a gyroscope. It's a kind of magic, isn't it?
