Augmenting Mathematical Optimization with Reinforcement Learning
Tamal Chowdhury, Ph.D.
CTO | Computer Scientist & Mathematician | Artificial Intelligence R&D ? Computational Modeling ? Product Engineering
Mathematical optimization drives complex decision-making across a diverse range of problems – e.g., energy management, inventory planning, network design, pricing & revenue management, production planning & scheduling, supplier selection, and transportation planning. This field has significantly evolved over decades, from its formative years around World War II to the modern age. While conventional optimization approaches have been very successful in solving real-world challenges, we are increasingly witnessing problem statements that they are unable to address efficiently. This necessitates the application of newer techniques and technologies to the world of optimization.
The idea of using machine learning to augment the capabilities and scope of mathematical optimization is not new. However, recent advancements in this field, especially in Deep Learning, have made its adoption much more viable today than ever before. In particular, Reinforcement Learning offers a great option, especially for combinatorial optimization problems. Researchers are working on building neural-based solvers that model the process of finding the optimal solution as a learning task. The objective is to train better heuristics using a large number of problem instances from a certain distribution, and subsequently leverage this learning to address new problem instances for the same or similar distributions.
The Limitations of Conventional Optimization Methods
In most real-world applications, three approaches are generally used for solving optimization problems:
Modern optimization problems are becoming increasingly complex, and these conventional approaches sometimes prove to be inefficient. For instance, exact methods have limitations when it comes to dealing with large instances; approximation algorithms are not easy to construct, and may suffer weak empirical performances; and heuristics do not provide theoretical guarantees, and often necessitate significant customization based on each specific problem.
In general, conventional methods struggle to address the needs of large-scale problems, especially those involving high-dimensional solution spaces. Despite key innovations (e.g., decomposition or partitioning methods), the issue persists. Moreover, real-world problems often have multiple local minima (or maxima), and it is not uncommon to see conventional methods stuck in local optima instead of reaching the global optimum. Finally, real-world problems today often involve dynamic environments, highly uncertain problem spaces, or non-linear constraints – modeling these characteristics using traditional methods is difficult.
Applying Reinforcement Learning to Mathematical Optimization: Why?
While machine learning offers a great option for addressing some of the limitations of conventional optimization approaches, Reinforcement Learning has immense potential for complex problems, especially in combinatorial optimization. Here are three major reasons:
Research has shown that reinforcement learning may be a good fit for certain types of optimization problems, such as:
Applying Reinforcement Learning to Mathematical Optimization: How?
The optimization problem needs to be modeled as a sequential decision-making process, generally as a Markov Decision Process (MDP).
Let’s take the example of the Traveling Salesman Problem (TSP), a special case of the Quadratic Assignment Problem (QAP), and an NP-hard problem. This problem can be formulated in the following manner:
领英推荐
Broadly speaking, the agent’s goal is to find a policy function that maps states into actions. The environment is defined by a particular instance of the problem (e.g., the Max-Cut problem), while the states are encoded with a neural network model. The agent then uses reinforcement learning algorithms to make decisions that move the environment to the next state (e.g., removing a vertex from a solution set). See the diagram below.
Prominent Techniques
Known Limitations & the Direction of Research
Reinforcement learning methods also have limitations, which impact their ability to solve all types of optimization problems. For instance, Policy Gradient methods can be sample-inefficient in high-dimensional problem spaces?and unstable during training if the reward signals are sparse or noisy. Similarly, Q-learning methods are sometimes inefficient for large state-action spaces, and MCTS is often computationally expensive, thus reducing its applicability to large-scale combinatorial problems.
Scalability to very large problem instances is perhaps the most important issue today. While the techniques mentioned earlier can handle most medium-to-large problems, addressing anything larger in scale remains a fundamental challenge. Generalization (i.e., the ability to transfer learned policies from one problem instance to another) is another major challenge, thereby limiting practical applicability for certain use cases. Finally, balancing exploration and exploitation in certain cases may also be challenging.
Most of today’s research is focused on addressing the above gaps. A major area of research is Graph-based Reinforcement Learning for combinatorial optimization is another important area. For instance, here is a recent paper highlighting the potential of graph structure optimization (e.g., in molecular optimization) and graph process optimization (e.g., in network games.) Another important research area is Meta-Learning, which focuses on enabling reinforcement learning agents to generalize across a distribution of problem instances. This significantly reduces the need to retrain from scratch in order to solve new problem instances, thus improving overall generalization capabilities. A third area of research is Learning to Optimize, where reinforcement learning is leveraged to guide conventional optimization algorithms to perform better (e.g., guiding Branch-and-Bound decisions.)
Closing Comments
Applying reinforcement learning to mathematical programming creates a powerful paradigm for solving complex, large-scale optimization problems. This can help develop highly scalable, more robust, next-generation optimization solutions. While this hybridization is still in its early stages, both the pace and the direction of progress are encouraging. Most research in this area today has a significant ‘applied’ component – i.e., focused on solving real-world challenges, such as approximating complex functions, learning better heuristics, and improving real-time decision-making in dynamic environments. These efforts are likley to help solve problems considered intractable till now.
References:
PS: 10 – 15% of this paper was written with the help of generative AI.