DeepSeek-Prover-V1.5 represents a significant leap forward from its predecessor, DeepSeek-Prover-V1. This new iteration is designed for theorem proving within Lean 4, a formal proof verification system. By optimizing both training and inference processes, DeepSeek-Prover-V1.5 achieves substantial improvements in solving complex mathematical problems.
The Approach: A Comprehensive Pipeline for Formal Theorem Proving
DeepSeek-Prover-V1.5 employs a multi-faceted approach to tackle the complex challenge of formal theorem proving. The model integrates cutting-edge techniques across several stages of development, from pre-training on specialized datasets to reinforcement learning and advanced search algorithms. Here’s a breakdown of how each component works and contributes to the overall success of the model.
1. Pre-Training on Mathematical and Formal Languages
The journey begins with extensive pre-training. DeepSeek-Prover-V1.5 starts from DeepSeekMath-Base, a model specialized for mathematical reasoning, and is further pre-trained on a corpus of mathematical text and code covering formal languages such as Lean, Isabelle, and Metamath.
- Mathematical Focus: The pre-training involves learning from various mathematical texts, equations, and code snippets, enabling the model to grasp the syntax and semantics crucial for formal theorem proving.
- Formal Languages: Since theorem proving in environments like Lean requires precise adherence to formal systems, the pre-training phase ensures that the model understands and generates code that aligns with these rigorous specifications.
2. Supervised Fine-Tuning with Enhanced Data
Once pre-training is complete, DeepSeek-Prover-V1.5 undergoes supervised fine-tuning on an enriched dataset. This dataset is not just about proving theorems but is meticulously designed to bridge the gap between natural language reasoning and formal mathematical logic.
- Data Augmentation: The fine-tuning dataset includes formal theorem proofs and augmented data that introduces natural language explanations alongside the Lean 4 code. This approach helps the model to better align its formal reasoning with human-like, step-by-step logical deductions.
- Thought-Augmented Proof Generation: To enhance the model’s ability to reason through complex proofs, the fine-tuning process incorporates “thought-augmented” data. Here, the model learns to generate natural language explanations before generating the corresponding formal proof code. This step ensures that structured and logical thought processes guide the model’s proof generation.
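To make the thought-augmented setup concrete, here is a minimal sketch of a prompt builder that asks the model to interleave natural-language reasoning (as comments) with Lean 4 tactic code. The template wording and the function name `build_cot_prompt` are illustrative assumptions, not the paper's verbatim prompt format.

```python
# Sketch of a "thought-augmented" prompt: the model is asked to write
# natural-language reasoning as comments before each tactic it emits.
# The exact template below is an assumption, not the paper's actual prompt.

def build_cot_prompt(theorem_statement: str) -> str:
    """Assemble a prompt that elicits thoughts (comments) before tactics."""
    return (
        "Complete the following Lean 4 proof. "
        "Before each tactic, explain the step in a comment.\n"
        "```lean4\n"
        f"{theorem_statement} := by\n"
    )

# Usage: the model continues generation from the open `by` block.
prompt = build_cot_prompt("theorem add_comm_ex (a b : Nat) : a + b = b + a")
print(prompt)
```

The key design point is that the comment-before-tactic structure forces the informal reasoning to precede, and therefore guide, the formal code.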
3. Reinforcement Learning from Proof Assistant Feedback (RLPAF)
A critical innovation in DeepSeek-Prover-V1.5 is its reinforcement learning stage, which leverages verification feedback from the Lean 4 proof assistant.
- Feedback Loop: During this phase, the model generates proof candidates for a given theorem and then submits these proofs to the Lean 4 prover for verification. Based on whether the proof is correct, the model receives a binary reward (1 for correct, 0 for incorrect), which it uses to refine future proof attempts.
- GRPO Algorithm: The Group Relative Policy Optimization (GRPO) algorithm drives the reinforcement learning process. Unlike traditional methods that require a separate critic model, GRPO directly optimizes the model by comparing the relative success of different proof candidates sampled for the same theorem. This group-based approach ensures that the model continually improves by learning from both successful and unsuccessful attempts.
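The core of GRPO's critic-free design can be sketched in a few lines: each candidate's advantage is its reward normalized against the other candidates in the same group. The sketch below assumes the binary Lean-verification rewards described above (1 = proof accepted, 0 = rejected); the `eps` smoothing constant is an implementation detail I've added to avoid division by zero.

```python
# Minimal sketch of GRPO's group-relative advantage computation.
# No critic model is needed: the group's own reward statistics act
# as the baseline.
from statistics import mean, pstdev

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each candidate's reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# A group of 4 proof candidates for one theorem: one verified, three failed.
# The verified proof gets a positive advantage; the failures, negative.
advs = grpo_advantages([1.0, 0.0, 0.0, 0.0])
```

These advantages then weight the policy-gradient update, so tokens from verified proofs are reinforced relative to the group rather than relative to a learned value estimate.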
4. Hybrid Proof Generation: Truncate-and-Resume Mechanism
One of the standout features of DeepSeek-Prover-V1.5 is its hybrid approach to proof generation, combining the strengths of both step-by-step and whole-proof generation methods.
- Whole-Proof Generation: Initially, the model attempts to generate an entire proof in one go. This method is computationally efficient but risks compounding errors, as the model may receive delayed feedback on intermediate steps.
- Truncate-and-Resume: To mitigate the risk of errors, DeepSeek-Prover-V1.5 employs a truncate-and-resume mechanism. If the Lean prover detects an error in the generated proof, the proof is truncated at the point of failure. The model then resumes proof generation from this truncated point, using the valid part of the proof as a prompt.
- Monte-Carlo Tree Search (MCTS): This mechanism is integrated into a broader Monte-Carlo Tree Search framework, where the search process is guided by intrinsic rewards from exploration and extrinsic feedback from the Lean prover. This integration allows the model to explore multiple potential proof paths efficiently, increasing the likelihood of finding a valid proof.
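The truncate-and-resume loop described above can be sketched as follows. Here `generate` stands in for the language model and `verify` for a Lean check that returns the character offset of the first error (or `None` if the proof is accepted); both interfaces are assumptions for illustration, not the paper's actual APIs.

```python
# Sketch of the truncate-and-resume loop. `generate` and `verify` are
# hypothetical stand-ins for the model and the Lean 4 verifier.

def prove(theorem: str, generate, verify, max_attempts: int = 8):
    prefix = ""  # verified portion of the proof, kept across attempts
    for _ in range(max_attempts):
        # Whole-proof attempt: continue generation from the valid prefix.
        candidate = prefix + generate(theorem, prefix)
        error_at = verify(theorem, candidate)
        if error_at is None:
            return candidate  # Lean accepted the full proof
        # Truncate at the first failing tactic and resume from there.
        prefix = candidate[:error_at]
    return None
```

Each attempt is still a cheap whole-proof generation, but the verified prefix is never thrown away, which is what makes the hybrid approach retain the efficiency of single-pass generation while recovering from errors like a step-wise prover.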
5. RMaxTS: Intrinsic Rewards for Enhanced Exploration
DeepSeek-Prover-V1.5 introduces RMaxTS, a variant of Monte-Carlo Tree Search that addresses the sparse rewards of theorem proving by using intrinsic rewards to drive exploration.
- Exploration vs. Exploitation: In traditional MCTS, the balance between exploring new paths and exploiting known successful strategies is crucial. RMaxTS tilts this balance toward exploration by assigning intrinsic rewards when the model discovers new, diverse proof paths. Rather than merely refining strategies it already knows, the model is pushed to explore uncharted areas of the proof space.
- State Abstraction: During the tree search, the model abstracts the proof process into a sequence of state transitions, each representing a step in the proof. By focusing on states that have yet to be fully explored, RMaxTS maximizes the diversity of the search process, which is key to solving more complex theorems.
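The RMax-style intrinsic reward can be sketched as below, under the simplifying assumption that a tactic state can be hashed to a string. An expansion earns an intrinsic reward of 1 the first time it reaches an unseen state and 0 afterwards; combined with a standard UCB score, this steers node selection toward unexplored regions. The UCB formula shown is the textbook UCB1 rule, used here for illustration rather than the paper's exact selection criterion.

```python
# Sketch of RMax-style intrinsic rewards layered on a tree search.
import math

seen_states: set[str] = set()

def intrinsic_reward(state: str) -> float:
    """Reward 1 only the first time a tactic state is discovered."""
    if state in seen_states:
        return 0.0
    seen_states.add(state)
    return 1.0

def ucb_score(total_reward: float, visits: int, parent_visits: int,
              c: float = 1.4) -> float:
    """UCB1 selection score; intrinsic rewards feed into total_reward."""
    if visits == 0:
        return float("inf")  # always try an unvisited child first
    return total_reward / visits + c * math.sqrt(
        math.log(parent_visits) / visits
    )
```

Because novel states pay out only once, subtrees that keep reaching the same states see their average reward decay, and the search naturally drifts toward parts of the proof space it has not yet covered.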
Performance That Sets New Benchmarks
DeepSeek-Prover-V1.5’s performance has been rigorously tested against established benchmarks, the miniF2F and ProofNet datasets, which cover high-school competition problems and undergraduate-level mathematics, respectively. The results are impressive:
- miniF2F: DeepSeek-Prover-V1.5 achieved a pass rate of 63.5%, setting a new state of the art in the field. This is a significant improvement over the previous best results and highlights the model’s ability to handle complex proof generation tasks.
- ProofNet: The model also excelled on the ProofNet benchmark, with a pass rate of 25.3%. This performance further demonstrates DeepSeek-Prover-V1.5’s capability to tackle formal theorem proving at an advanced level.
Now, that is a wow paper! Super impressive! Read it here: https://arxiv.org/pdf/2408.08152