DeepSeek-Prover-V1.5 represents a significant leap forward from its predecessor, DeepSeek-Prover-V1. This new iteration is designed for theorem proving within Lean 4, a formal proof verification system. By optimizing both training and inference processes, DeepSeek-Prover-V1.5 achieves substantial improvements in solving complex mathematical problems.
The Approach: A Comprehensive Pipeline for Formal Theorem Proving
DeepSeek-Prover-V1.5 employs a multi-faceted approach to tackle the complex challenge of formal theorem proving. The model integrates cutting-edge techniques across several stages of development, from pre-training on specialized datasets to reinforcement learning and advanced search algorithms. Here’s a breakdown of how each component works and contributes to the overall success of the model.
1. Pre-Training on Mathematical and Formal Languages
The journey begins with extensive pre-training. DeepSeek-Prover-V1.5 starts from DeepSeekMath-Base, a model specialized for mathematical reasoning, and is further pre-trained on a corpus of mathematical text and code covering formal languages such as Lean, Isabelle, and Metamath.
- Mathematical Focus: The pre-training involves learning from various mathematical texts, equations, and code snippets, enabling the model to grasp the syntax and semantics crucial for formal theorem proving.
- Formal Languages: Since theorem proving in environments like Lean requires precise adherence to formal systems, the pre-training phase ensures that the model understands and generates code that aligns with these rigorous specifications.
2. Supervised Fine-Tuning with Enhanced Data
Once pre-training is complete, DeepSeek-Prover-V1.5 undergoes supervised fine-tuning on an enriched dataset. This dataset is not just about proving theorems but is meticulously designed to bridge the gap between natural language reasoning and formal mathematical logic.
- Data Augmentation: The fine-tuning dataset includes formal theorem proofs and augmented data that introduces natural language explanations alongside the Lean 4 code. This approach helps the model to better align its formal reasoning with human-like, step-by-step logical deductions.
- Thought-Augmented Proof Generation: To enhance the model’s ability to reason through complex proofs, the fine-tuning process incorporates “thought-augmented” data. Here, the model learns to generate natural language explanations before generating the corresponding formal proof code. This step ensures that structured and logical thought processes guide the model’s proof generation.
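To make the thought-augmented setup concrete, here is a minimal sketch of a prompt builder that asks the model to interleave natural-language reasoning (as comments) with Lean 4 tactic code. The template wording and the function name `build_cot_prompt` are illustrative assumptions, not the paper's verbatim prompt format.

```python
# Sketch of a "thought-augmented" prompt: the model is asked to write
# natural-language reasoning as comments before each tactic it emits.
# The exact template below is an assumption, not the paper's actual prompt.

def build_cot_prompt(theorem_statement: str) -> str:
    """Assemble a prompt that elicits thoughts (comments) before tactics."""
    return (
        "Complete the following Lean 4 proof. "
        "Before each tactic, explain the step in a comment.\n"
        "```lean4\n"
        f"{theorem_statement} := by\n"
    )

# Usage: the model continues generation from the open `by` block.
prompt = build_cot_prompt("theorem add_comm_ex (a b : Nat) : a + b = b + a")
print(prompt)
```

The key design point is that the comment-before-tactic structure forces the informal reasoning to precede, and therefore guide, the formal code.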
3. Reinforcement Learning from Proof Assistant Feedback (RLPAF)
A critical innovation in DeepSeek-Prover-V1.5 is its reinforcement learning stage, which leverages verification feedback from the Lean 4 proof assistant.
- Feedback Loop: During this phase, the model generates proof candidates for a given theorem and then submits these proofs to the Lean 4 prover for verification. Based on whether the proof is correct, the model receives a binary reward (1 for correct, 0 for incorrect), which it uses to refine future proof attempts.
- GRPO Algorithm: The Group Relative Policy Optimization (GRPO) algorithm drives the reinforcement learning process. Unlike traditional methods that require a separate critic model, GRPO directly optimizes the model by comparing the relative success of different proof candidates sampled for the same theorem. This group-based approach ensures that the model continually improves by learning from both successful and unsuccessful attempts.
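The core of GRPO's critic-free design can be sketched in a few lines: each candidate's advantage is its reward normalized against the other candidates in the same group. The sketch below assumes the binary Lean-verification rewards described above (1 = proof accepted, 0 = rejected); the `eps` smoothing constant is an implementation detail I've added to avoid division by zero.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# No critic model is needed: the group's own reward statistics act
# as the baseline.
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each candidate's reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group of 4 proof candidates for one theorem: one verified, three failed.
# The verified proof gets a positive advantage; the failures, negative.
advs = grpo_advantages([1.0, 0.0, 0.0, 0.0])
```

These advantages then weight the policy-gradient update, so tokens from verified proofs are reinforced relative to the group rather than relative to a learned value estimate.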
4. Hybrid Proof Generation: Truncate-and-Resume Mechanism
One of the standout features of DeepSeek-Prover-V1.5 is its hybrid approach to proof generation, combining the strengths of both step-by-step and whole-proof generation methods.
- Whole-Proof Generation: Initially, the model attempts to generate an entire proof in one go. This method is computationally efficient but risks compounding errors, as the model may receive delayed feedback on intermediate steps.
- Truncate-and-Resume: To mitigate the risk of errors, DeepSeek-Prover-V1.5 employs a truncate-and-resume mechanism. If the Lean prover detects an error in the generated proof, the proof is truncated at the point of failure. The model then resumes proof generation from this truncated point, using the valid part of the proof as a prompt.
- Monte-Carlo Tree Search (MCTS): This mechanism is integrated into a broader Monte-Carlo Tree Search framework, where the search process is guided by intrinsic rewards from exploration and extrinsic feedback from the Lean prover. This integration allows the model to explore multiple potential proof paths efficiently, increasing the likelihood of finding a valid proof.
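The truncate-and-resume loop described above can be sketched as follows. Here `generate` stands in for the language model and `verify` for a Lean check that returns the character offset of the first error (or `None` if the proof is accepted); both interfaces are assumptions for illustration, not the paper's actual APIs.

```python
# Sketch of the truncate-and-resume loop. `generate` and `verify` are
# hypothetical stand-ins for the model and the Lean 4 verifier.

def prove(theorem: str, generate, verify, max_attempts: int = 8):
    prefix = ""  # verified portion of the proof, kept across attempts
    for _ in range(max_attempts):
        # Whole-proof attempt: continue generation from the valid prefix.
        candidate = prefix + generate(theorem, prefix)
        error_at = verify(theorem, candidate)
        if error_at is None:
            return candidate  # Lean accepted the full proof
        # Truncate at the first failing tactic and resume from there.
        prefix = candidate[:error_at]
    return None
```

Each attempt is still a cheap whole-proof generation, but the verified prefix is never thrown away, which is what makes the hybrid approach retain the efficiency of single-pass generation while recovering from errors like a step-wise prover.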
5. RMaxTS: Intrinsic Rewards for Enhanced Exploration
DeepSeek-Prover-V1.5 introduces RMaxTS, a variant of Monte-Carlo Tree Search that addresses the sparse rewards of theorem proving by using intrinsic rewards to drive exploration.
- Exploration vs. Exploitation: In traditional MCTS, the balance between exploring new paths and exploiting known successful strategies is crucial. RMaxTS tilts this balance toward exploration by assigning intrinsic rewards when the model discovers new, diverse proof paths. Rather than merely refining strategies it already knows, the model is pushed to explore uncharted areas of the proof space.
- State Abstraction: During the tree search, the model abstracts the proof process into a sequence of state transitions, each representing a step in the proof. By focusing on states that have yet to be fully explored, RMaxTS maximizes the diversity of the search process, which is key to solving more complex theorems.
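The RMax-style intrinsic reward can be sketched as below, under the simplifying assumption that a tactic state can be hashed to a string. An expansion earns an intrinsic reward of 1 the first time it reaches an unseen state and 0 afterwards; combined with a standard UCB score, this steers node selection toward unexplored regions. The UCB formula shown is the textbook UCB1 rule, used here for illustration rather than the paper's exact selection criterion.

```python
# Sketch of RMax-style intrinsic rewards layered on a tree search.
import math

seen_states: set[str] = set()

def intrinsic_reward(state: str) -> float:
    """Reward 1 only the first time a tactic state is discovered."""
    if state in seen_states:
        return 0.0
    seen_states.add(state)
    return 1.0

def ucb_score(total_reward: float, visits: int, parent_visits: int,
              c: float = 1.4) -> float:
    """UCB1 selection score; intrinsic rewards feed into total_reward."""
    if visits == 0:
        return float("inf")  # always try an unvisited child first
    return total_reward / visits + c * math.sqrt(
        math.log(parent_visits) / visits
    )
```

Because novel states pay out only once, subtrees that keep reaching the same states see their average reward decay, and the search naturally drifts toward parts of the proof space it has not yet covered.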
Performance That Sets New Benchmarks
DeepSeek-Prover-V1.5’s performance has been rigorously tested against established benchmarks, the miniF2F and ProofNet datasets, which cover high-school competition problems and undergraduate-level mathematics, respectively. The results are impressive:
- miniF2F: DeepSeek-Prover-V1.5 achieved a pass rate of 63.5%, setting a new state of the art in the field. This is a significant improvement over the previous best results and highlights the model’s ability to handle complex proof generation tasks.
- ProofNet: The model also excelled on the ProofNet benchmark, with a pass rate of 25.3%. This performance further demonstrates DeepSeek-Prover-V1.5’s capability to tackle formal theorem proving at an advanced level.
Now, that is a wow paper! Super impressive! Read it here: https://arxiv.org/pdf/2408.08152