EMIR Refit Pairing and Matching: A Machine Learning Approach
Jitender Malik
SVP | Data engineering & Science(AI/ML, Gen AI, Computer Vision) | AI Engineering Lead at NatWest Group
EMIR requires EU counterparties to report their derivatives transactions to trade repositories (TRs). EMIR relies on double-sided reporting, which means that the details of a trade between two EU entities are reported separately by each of the counterparties. According to the regulation, the two counterparties must agree on a unique trade identifier and on the characteristics of the trade itself (the so-called common data) before submitting their reports.
When both counterparties report their legs (trade events) to the TRs, they need to ensure that the two legs of the trade are paired and matched (completeness and accuracy). The challenge for a financial institution is that, when a mismatch occurs, it cannot tell whether it or its counterparty is at fault, and therefore who should take corrective action.
To solve this problem, this paper proposes using a one-class SVM to identify the party at fault between the two sides of the same trade reported to the regulator when a mismatch happens.
As part of the solution, we first look at how OCSVM works. We then create and train a model on trade events that were paired and matched successfully. From these matched events the model learns an optimal hyperplane (decision boundary) that describes the "true" state of paired and matched trades.
New, unmatched trade events are then checked against this boundary by measuring their distance from it. If the point lies inside the boundary, the mismatch can be attributed to the counterparty, since the model has learned that such trades normally match. If the point lies outside the boundary, the issue can be attributed to the bank itself, since the model does not recognise such events as ever having matched successfully.
Method: As stated, the paper uses one-class SVM (OCSVM), an extension of the support vector machine (SVM), a classification algorithm in machine learning.
Introduction to SVM
SVMs are used for binary or multi-class classification by finding the optimal hyperplane (the one with the maximum margin) that separates the classes. If the dataset can be separated linearly without errors, this is called a hard margin; if the classes are not perfectly separable, we opt for a soft margin that tolerates some violations. The hyperplane sits between the groups of data points and indicates which class a point belongs to. SVMs can be used for both classification and regression problems.
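For reference, the textbook soft-margin formulation is shown below. This is the standard form rather than anything specific to this paper; the symbols (weights w, bias b, slack variables ξ_i, penalty C, labels y_i in {-1, +1}) are the usual ones and are not defined in the text above. Setting every ξ_i to zero recovers the hard-margin case, and the margin width is 2/‖w‖.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i \\
\text{subject to}\quad & y_i\bigl(\langle w, x_i\rangle + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n
\end{aligned}
```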
Additionally, SVMs can efficiently perform non-linear classification using the kernel trick, which represents the data only through a set of pairwise similarity comparisons between the original observations, instead of explicitly computing their transformed coordinates in the higher-dimensional feature space.
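To make the "pairwise similarity" idea concrete, here is a minimal sketch using the Gaussian (RBF) kernel. The choice of kernel, the value of gamma, and the sample points are assumptions made for illustration; the article does not say which kernel was used.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def gram_matrix(points, gamma=0.5):
    """Matrix of pairwise similarities between all observations."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]

# Illustrative points only; real inputs would be scaled trade features.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
K = gram_matrix(points)
```

A kernel SVM only ever needs entries of K, never the high-dimensional coordinates themselves. Each point has similarity 1 with itself, and similarity decays towards 0 as points move apart.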
One-class SVM (OCSVM)
One-class support vector machine (OCSVM) is an outlier, anomaly, or novelty detection algorithm. Standard SVMs separate data into several classes by constructing a hyperplane, which then decides which class any subsequent data point belongs to. OCSVM applies the same idea to a single class. Its key working principles are as follows. First, it defines a boundary around the majority class (the normal instances) in the feature space. This boundary is constructed to enclose the normal data points, creating a region of normality. Second, the algorithm strives to maximise the margin around the normal instances, which gives a more robust separation between normal and anomalous data points. This margin is crucial for accurately identifying outliers during testing. Lastly, OCSVM has a built-in hyperparameter called "nu", which is an upper bound on the fraction of margin errors (training points treated as outliers) and a lower bound on the fraction of support vectors. Tuning this parameter changes the model's sensitivity to outliers.
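The role of nu is easiest to see in the standard ν-formulation of the one-class SVM, due to Schölkopf et al. It is given here as background; the article does not state which formulation was implemented. The feature map φ, offset ρ, slack variables ξ_i and sample count n are the usual textbook symbols.

```latex
\begin{aligned}
\min_{w,\,\xi,\,\rho}\quad & \tfrac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_i - \rho \\
\text{subject to}\quad & \langle w, \phi(x_i)\rangle \ge \rho - \xi_i,\qquad \xi_i \ge 0,\quad i = 1,\dots,n \\[4pt]
f(x) &= \operatorname{sgn}\bigl(\langle w, \phi(x)\rangle - \rho\bigr)
\end{aligned}
```

A point with f(x) = +1 falls on the "normal" side of the hyperplane, and a point with f(x) = -1 is treated as an outlier. A smaller ν makes slack more expensive, so fewer training points are allowed to fall outside.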
In the image above, the trade events inside the blue region (enclosed by the decision boundary) are the matched trades, while those outside the region can be treated as unmatched events when analysing the party at fault. The boundary separates out the anomalies, which in our case are the mismatched or unmatched events.
Implementation
Before we start training the one-class SVM model, it is essential to normalise the trade data so that features are on similar scales. This prevents features with large values from dominating the model's learning process. The data is then transformed using a kernel function that maps it into a higher-dimensional space. This transformation allows the algorithm to find a hyperplane that separates the normal data points from the origin, effectively capturing the distribution of normal data. Just as a standard SVM maximises the margin between classes, OCSVM looks for the hyperplane that maximises the margin between the normal data points and the origin in the transformed space. The hyperplane is learned during training, and its shape depends on the model's hyperparameters, including the kernel parameters and the regularisation parameter.
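The normalisation step can be sketched as follows, assuming z-score scaling (the article does not name a specific method). The two columns, price and notional, are hypothetical examples of trade features. The scaling statistics are computed once on the matched training events and then reused unchanged for any new event.

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Compute per-column mean and standard deviation from training rows."""
    columns = list(zip(*rows))
    means = [fmean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # avoid dividing by zero on constant columns
    return means, stds

def transform(rows, means, stds):
    """Apply z-score scaling with previously fitted statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical matched events: [price, notional]
matched = [[100.0, 1_000_000.0], [102.0, 1_200_000.0], [98.0, 800_000.0]]
means, stds = fit_scaler(matched)
scaled = transform(matched, means, stds)
```

Without this step the notional column, which is several orders of magnitude larger than price, would dominate any distance-based kernel.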
After training, OCSVM provides a decision function that assigns an anomaly score to each new trade data point. The anomaly score is based on the signed distance of the data point from the learned hyperplane. Data points with higher anomaly scores are more likely to be anomalies in trade reporting, which can help counterparties detect mismatches at an early stage and correct them before reporting to the authorities.
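A minimal end-to-end sketch with scikit-learn's OneClassSVM is shown below. The synthetic data, the two feature columns (price and notional), and the settings nu=0.05 and gamma="scale" are all assumptions made for illustration and are not the authors' configuration. Note that scikit-learn's decision_function is positive inside the learned boundary and negative outside, so the sign is flipped here to obtain a score where higher means more anomalous.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for historically matched trade events.
# Columns are hypothetical numeric features: [price, notional].
matched = rng.normal(loc=[100.0, 1_000_000.0], scale=[2.0, 50_000.0], size=(500, 2))

# Scale the features, then fit the one-class model on matched events only.
model = make_pipeline(
    StandardScaler(),
    OneClassSVM(kernel="rbf", gamma="scale", nu=0.05),
)
model.fit(matched)

new_events = np.array([
    [100.5, 1_010_000.0],  # resembles the matched history
    [140.0, 3_000_000.0],  # far from anything seen in training
])

labels = model.predict(new_events)            # +1 inside the boundary, -1 outside
signed = model.decision_function(new_events)  # > 0 inside, < 0 outside
anomaly_scores = -signed                      # flipped: higher = more anomalous
```

The first event should be labelled +1 and the second -1, with the second receiving the higher anomaly score.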
OCSVM sets a threshold on the anomaly scores. Trade data points with scores above the threshold are classified as anomalies, while those below the threshold are considered normal. The threshold can be adjusted according to the desired trade-off between false positives (normal points classified as anomalies) and false negatives (anomalies classified as normal).
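The effect of moving the threshold can be illustrated with a small sketch. The scores and ground-truth flags below are invented for the example; in practice the scores would come from the trained model and the labels from investigated cases.

```python
def classify(scores, threshold):
    """Flag an event as an anomaly when its score is above the threshold."""
    return [s > threshold for s in scores]

def error_counts(scores, is_anomaly, threshold):
    """Return (false positives, false negatives) at a given threshold."""
    flagged = classify(scores, threshold)
    false_pos = sum(f and not a for f, a in zip(flagged, is_anomaly))
    false_neg = sum(a and not f for f, a in zip(flagged, is_anomaly))
    return false_pos, false_neg

# Hypothetical anomaly scores with known ground truth, for illustration only.
scores = [-0.8, -0.3, 0.1, 0.4, 0.9]
is_anomaly = [False, False, False, True, True]

low_threshold = error_counts(scores, is_anomaly, threshold=0.0)
high_threshold = error_counts(scores, is_anomaly, threshold=0.5)
```

With the lower threshold every true anomaly is caught at the cost of one false alarm; with the higher threshold the false alarm disappears but one anomaly is missed.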
Classification of mismatched trade events using OCSVM
First, we collect the matched trade events, which include fields such as TradeID, TradeTime, Buyer/Seller, and price. After the data is collected, we need to perform some pre-processing before we start training the model. This stage includes data cleaning, feature engineering, normalisation, scaling, etc. We can then train the model on the pre-processed data. Once the model is trained, we can use it to detect the party at fault based on the hyperplane, as shown below.
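The feature-engineering step might look like the sketch below, which turns one raw trade event into a numeric vector. The field names "Side" and "Price", the ISO timestamp format, and the chosen encodings are hypothetical; the article only lists TradeID, TradeTime, Buyer/Seller and price as example fields. TradeID is an identifier rather than a characteristic of the trade, so it is left out of the feature vector.

```python
from datetime import datetime

def to_features(event):
    """Convert a raw trade event (dict of strings) into numeric features."""
    ts = datetime.fromisoformat(event["TradeTime"])
    hour_of_day = ts.hour + ts.minute / 60          # time of day as a fraction of hours
    is_buyer = 1.0 if event["Side"] == "Buyer" else 0.0
    return [hour_of_day, is_buyer, float(event["Price"])]

# Hypothetical event for illustration only.
event = {
    "TradeID": "T-0001",
    "TradeTime": "2024-05-02T14:30:00",
    "Side": "Buyer",
    "Price": "101.25",
}
features = to_features(event)
```

Vectors produced this way would then be normalised and passed to the model for training.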
Based on the findings, we can identify which party is at fault when a mismatch happens:
Counterparty at fault: the points that lie within the hyperplane (all red)
Party at fault: the points that lie outside the hyperplane (all blue)
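The attribution rule above can be written as a small function. This is a sketch under the assumption that each unmatched event already has a signed decision value from the trained model, positive inside the boundary and negative outside; the trade IDs and values are invented.

```python
def party_at_fault(decision_value):
    """Attribute a mismatch from the signed distance to the learned boundary.

    Inside the boundary (>= 0): the event looks like historically matched
    trades, so the mismatch is attributed to the counterparty.
    Outside the boundary (< 0): the event is unlike anything that matched
    before, so the mismatch is attributed to our own institution.
    """
    return "counterparty" if decision_value >= 0 else "own institution"

# Hypothetical decision values for two unmatched trade events.
unmatched = {"T-0001": 0.42, "T-0002": -1.3}
attribution = {trade_id: party_at_fault(v) for trade_id, v in unmatched.items()}
```

Events close to zero are borderline, so in practice they may deserve manual review rather than automatic attribution.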
Conclusions:
This paper described the use of OCSVM to address the pairing and matching problem arising under EMIR Refit. With this algorithm, European financial institutions will be able to understand their trade information better and correct errors ahead of time rather than going through a lengthy operational process.
Follow-up work will include further refinement of this procedure and the incorporation of newer innovations into its steps.
Software Development Manager @ Clearwater Analytics | AWS Expertise | Computer Vision
(8 months ago) Great going Jitender & team. It is an absolute delight to witness this team's journey - you folks have come a long way employing intelligent ML algos to T&TR domain.
Interesting. But I don't really understand why a problem that can be solved deterministically, benefits from a statistical solution. Maybe I'm missing something. But to me, the first step would be a bi-party handshake to agree an identifier, and then it should be simple to confirm both pairing and matching.
AI/ML Enthusiast | Python and C programmer
(8 months ago) Very excited to see the impacts such solutions will have in banking sector
Associate Vice President at NatWest Markets(RBS)
(8 months ago) Working on this solution has been an enriching experience. Leveraging One-Class SVM (OCSVM) has enabled us to effectively identify mismatches in double-sided reporting for EMIR compliance, significantly reducing errors and improving efficiency. The precision of OCSVM in detecting anomalies before they escalate has been a game-changer. Excited to see the positive impact this will have on the industry! #OCSVM #MachineLearning #EMIRCompliance #Innovation #FinTech
Senior Software Engineer
(8 months ago) Thank you for including me. I'm excited to work with you more on these innovative projects!