LLMPC: Large Language Model Predictive Control
This article is a summary of a full paper available at https://arxiv.org/abs/2501.02486
The original research code and examples are available at https://github.com/gmaher/llmpc
Large Language Models (LLMs) tend to perform better when given structured prompts; in particular, prompts that ask the LLM to reason and plan before acting are effective. However, fundamental questions remain: Why do these methods work? What are their limitations? How can we improve them further? This post examines LLM prompting through the lens of Model Predictive Control (MPC), a framework in which controllers generate and execute action plans. We show that LLMs act as approximate cost function minimizers when planning, and that their performance can be enhanced by incorporating explicit planning objectives.
The MPC Framework
In MPC, an agent navigates a state space by choosing actions that minimize an objective function over a planning horizon. The objective typically combines task-specific costs (like distance to a goal state) with regularization costs (like action complexity). The action plan is obtained by solving this minimization problem, executing the first few actions, and then replanning from the new state. From the MPC perspective, asking an LLM to generate a plan is analogous to using the LLM to approximately solve this objective minimization problem.
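Concretely, writing the state at step t as x_t, the action as u_t, and the (known) dynamics as f, the planning problem over a horizon of H steps takes roughly the following form. This is a schematic paraphrase of the setup described above, not the paper's exact formulation:

```latex
\min_{u_1,\dots,u_H} \; \sum_{t=1}^{H} \Big( c_{\text{task}}(x_t, u_t) + c_{\text{reg}}(u_t) \Big)
\quad \text{subject to} \quad x_{t+1} = f(x_t, u_t)
```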
LLMs as Planners
In the MPC viewpoint, an LLM takes a prompt (encoding the current state) and outputs a sequence of tokens that map to actions. Different prompting methods (ReAct, Tree-of-Thoughts, etc.) thus mostly vary in how they structure this mapping. The key insight is that, regardless of prompting structure, all planning prompts are limited by the fact that the LLM can only approximately solve the planning optimization problem.
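As a rough illustration of this mapping (not the paper's implementation), an LLM planner boils down to a function that encodes the state into a prompt, queries the model, and parses the reply back into actions. `call_llm` and `parse_actions` below are hypothetical helpers:

```python
def llm_plan(state, horizon, call_llm, parse_actions):
    """Ask an LLM for an action plan given the current state.

    `call_llm` and `parse_actions` are placeholders for whatever model
    client and output parser a particular system uses.
    """
    prompt = (
        f"Current state: {state}\n"
        f"Propose the next {horizon} actions, one per line."
    )
    completion = call_llm(prompt)      # raw text produced by the model
    return parse_actions(completion)   # map the text back to a list of actions
```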
Improving Performance with LLMPC
Since LLMs are approximate optimizers, we can enhance their performance by making better use of explicit objective functions. Our LLMPC method (a minimal code sketch of the loop follows below):
1. Uses the LLM to sample multiple possible control sequences
2. Evaluates each sequence using actual cost and state update functions
3. Selects and executes the best-performing sequence
4. Replans after a few steps

We demonstrated this approach on two problems:
1. Spring-Mass Control: LLMPC successfully controlled a spring-mass system to reach target states, though with higher objective values than exact MPC solutions (as expected for an approximate method).
2. Code Generation: We compared LLMPC against one-shot generation for creating a Flappy Bird game. LLMPC produced more complete code with additional features like sprites and game-over screens, showing how longer-horizon planning enables handling more complex tasks.
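To make the loop concrete, here is a minimal sketch of the four steps above, assuming hypothetical helpers: `sample_plans` (the LLM proposes candidate action sequences), `step` (the known state update), and `cost` (the per-step objective). This is illustrative only, not the exact API of the linked repository:

```python
def llmpc_control(state, num_steps, k_samples, horizon, execute_steps,
                  sample_plans, step, cost):
    """Run the LLMPC loop: sample plans, score them, execute the best prefix, replan."""
    for _ in range(0, num_steps, execute_steps):
        # 1. Use the LLM to sample several candidate action sequences.
        candidates = sample_plans(state, k_samples, horizon)

        # 2. Evaluate each candidate by rolling out the known dynamics and cost.
        def rollout_cost(actions):
            s, total = state, 0.0
            for a in actions:
                total += cost(s, a)
                s = step(s, a)
            return total

        # 3. Select the lowest-cost sequence.
        best = min(candidates, key=rollout_cost)

        # 4. Execute only its first few actions, then replan from the new state.
        for a in best[:execute_steps]:
            state = step(state, a)
    return state
```

For the spring-mass example, `step` and `cost` would be the system's dynamics and a cost penalizing distance to the target state; the same loop applies to other tasks once a plan sampler and an evaluator are available.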
Key Takeaways
- LLM prompting methods can be understood through the MPC framework
- LLMs act as approximate optimizers of planning objectives
- Performance can be improved by incorporating explicit cost functions
- LLMPC provides a systematic way to enhance LLM planning abilities
This framework helps explain why techniques like Monte Carlo Tree Search with external evaluators improve LLM performance, and suggests further ways to enhance LLM-based planning systems.