Intelligent Traffic Routing
As of December 2023, UPI payments have crossed 10 billion in India, showcasing growing acceptance for digital payments. For any organisation, solving at scale can be an extremely daunting task. With over 11 million concurrent users on the Dream11 app, ensuring seamless and efficient payment experiences for them comes with its own set of challenges.
In this blog, we'll take you on a journey through optimising payment success rates in routing transactions, the challenges we faced, the innovative solutions we developed, and the mathematical underpinnings behind our approach. Join us as we delve into the world of non-stationary bandits and how they've helped us maximize the success rate of payment routing.
Let’s understand more about payment routing.??
Our users make payments using various payment options across the country’s key payment service providers. In such a scenario, efficiently routing transactions is a complex challenge.
As shown in the figure above, these transactions are facilitated by multiple payment gateways with unique characteristics and in-depth transaction flow. These gateways may experience sudden downtime with limited visibility due to underlying issues. To ensure payment transaction success, making informed decisions about the ideal payment gateway for each transaction is crucial, considering many factors that impact overall gateway performance.
Challenges faced while routing payments
How did we solve it?
The saviour here is the routing service; it addresses the above challenges through a combination of reinforcement learning and meticulous engineering in designing a distributed real-time machine learning inference system.
Mathematical overview?
Now, let's get into the mathematical underpinnings of our algorithms. To address the first challenge of assigning the relevant payment gateway for each transaction, we've employed a range of algorithms, each with its unique characteristics and advantages.
Epsilon greedy strategy: At every trial, it randomly chooses an action with probability ε and greedily chooses the highest value action with probability 1 - ε. By default, ε-greedy is unguided and chooses actions uniformly at random.
Upper-Confidence-Bound: UCB considers the uncertainty of an arm and selects arms that have the highest potential. Uncertainty is modelled via confidence bounds, while the potential is represented by the upper confidence bound. Since bounds guide sampling, it has lower regret.
Thompsom sampling: Thompson sampling models uncertainty by building a probability distribution (beta) from historical rewards and then samples from the distribution when choosing actions. If the rewards are delayed, UCB selects arms deterministically. It chooses the same action until feedback is incorporated. In contrast, because TS chooses actions stochastically by sampling from the posterior distribution, It has lower regret.
Boltzmann Gumbel: Boltzmann Gumbel assigns scores to each action, which are transformed into probabilities using a temperature parameter and the softmax function to balance exploration and exploitation.
Given the non-stationarity of the success rate time series, we also tried out discounting and windowing alternatives for these algorithms. The intuition behind it was to give more weight to recent rewards.?
Experiments and Results
Dataset The dataset comprises the following attributes for each transaction
The routing dataset used in this analysis comes from over 100M transactions conducted on the Dream11 app for a week in December 2022. We used the first day’s transactions (tuning dataset) for model validation to decide the optimal parameter for each competing bandit algorithm. Later, we ran these algorithms with tuned parameters on a one-week transactions data (evaluation dataset).
领英推荐
Offline Simulator
To address the challenge of which algorithm to take online in production, we built a simulator for backtesting the learned policies. We assume the underlying payment gateways to follow a binomial distribution that models the number of successes in a fixed number of independent trials, each with the same probability of success. In the case of a continuous-time process, we can think of trials as the time intervals over which we observe the process and the success as the times at which the transactions are successful. Also, the probability of success in each time interval is assumed to be constant.
Hyperparameter Tuning:
With the simulator handy, the analysis of discounting and window-based strategies on top of the set of multi-arm bandit algorithms described earlier was a cakewalk. Below are the evaluation results on the tuning dataset. The charts below show how discounting and windowing old beliefs reduces overall regret.
While benchmarking the algorithms on the evaluation dataset, we observe that UCB (sliding window length of 200 transactions and with a discount factor of 0.6) has the lowest cumulative regret. The discounted Boltzmann-Gumbel algorithm achieves comparable performance to the discounted UCB algorithm. During the testing, three algorithms – UCB algorithm with w = 200, α = 0.6; Thompson sampling w = 200, α = 0.6 and ?? greedy algorithm with ?? = 0.2, w = 100 yielded the lowest regret.
System Architecture?
Bandit System Requirements: We design our production system to incur low technical debt using the decision service architecture from the research paper by Microsoft.
System Architecture Innovation at Dream11
With the challenges described earlier in the blog, the proposed system is expected to deliver a highly scalable, reliable, secure, maintainable, and cost-effective payment processing solution by adhering to these critical design requirements.
We evaluated three approaches described below: A classic HTTP service, a Clipper-based approach and an Agent-based approach.?
Tradeoffs in system design?
Now, let's benchmark the approaches across the system design requirements derived from the challenges we face while designing such payment routing systems.
We deployed the agent-based system, eliminating the need for a separate database, saving costs significantly. It further reduced the system’s latencies by avoiding the overhead linked to the external database connections. Using an asynchronous agent within this system has allowed us to forgo a job scheduler, as this agent can continuously update the rewards asynchronously within the same service. The most compelling advantage offered by this method is its ability to scale to a remarkably high level of concurrent requests, a crucial requirement for our system. Hence, despite the inherent tradeoffs, the advantages of the third method have far outweighed its limitations, leading us to choose it for our current system implementation.?
Conclusion: Online Experiments
The study aims to assess the performance of optimal algorithms by conducting online experiments involving around 240 million transactions over 60 days. These transactions were randomly assigned to one of four routing algorithms: the existing rule-based algorithm and three bandit-based algorithms chosen based on a specified method. The standard algorithm handled about 10% of the transactions, while each bandit algorithm processed roughly 30% daily.
As shown below, the results highlight the differences in daily success rates between the baseline and three bandit-based algorithms from May 1, 2023, to May 25, 2023. Notably, the sliding-window UCB bandit algorithm outperformed the others, achieving a statistically significant cumulative uplift of 0.92% over one month.
What's Next?
We proposed a Routing Service architecture using a novel Ray-based implementation for optimally scaling bandit-based payment routing to over 10,000 transactions per second. Read through the research paper published in the AI-MLSYSTEMS 2023 conference. This system has far-reaching applications, especially in the fast-paced world of fintech payments, With the massive scale of National Payments Corporation Of India (NPCI) UPI (Unified Payments Interface) transactions in India, our system can revolutionize payment gateways by optimizing success rates in real-time (10K TPS). Imagine the impact on the entire payments ecosystem, from enhancing user experiences to boosting financial efficiency.
Our future endeavours will focus on predicting time series change points, which promises to minimize regret, elevate our system's performance and enhance overall success rates in the face of fluctuations.
While this experiment has been a great success, it's worth noting that we've left some stones unturned. The cost of transaction fees and the nuanced context of user transactions are territories yet to be explored. We're excited to embrace this challenge in our ongoing pursuit of delivering world-class user experiences to our 200 million+ users.
Learn more - tech.dream11.com/itr
Rockstars who made this possible
Associate Professor | Co-Founder of Rewardwise.co | Solving challenging AI problems as a consultant
10 个月Great blog ?????? Aayush Chaudhary ? .
Senior Data Analyst at Inspire Institute of Sport | [Machine Learning, Python, SQL]
10 个月Thanks for the detailed article ?????? Aayush Chaudhary ?. Do keep posting. Cheers.