Why DeepSeek "may" matter
DeepSeek is a Chinese startup that may have developed advanced AI models using novel techniques that differ from traditional methods like Chain of Thought (CoT) and self-reinforcement, achieving impressive results with fewer resources.
Here's an in-depth analysis of how DeepSeek works and why it stands apart:
1. Core Technology and Approach:
Mixture of Experts: DeepSeek uses a "mixture of experts" architecture, in which a routing layer sends each input to a small subset of specialized expert sub-networks rather than activating one large monolithic model. Because only a fraction of the parameters are active at a time, the chips are used more efficiently and less time is lost moving data around (a minimal sketch of this routing idea appears at the end of this list).
Efficient Data Analysis: Unlike dense models that run every parameter for every input, DeepSeek's approach distributes the work across multiple specialized experts. This allows for more efficient use of computing power and reduces the number of chips required for training.
Reduced Computing Costs: DeepSeek's approach allows them to train systems with significantly less computing power. DeepSeek engineers reported using approximately 2,000 Nvidia chips and about $6 million in compute costs, while other leading companies use 16,000 chips or more and spend roughly ten times as much.
Reinforcement Learning (RL): DeepSeek uses reinforcement learning (RL) to improve the reasoning capabilities of its models.
No Supervised Fine-Tuning (SFT) as a Preliminary Step: One of the primary ways DeepSeek differs from other models is that it applies RL directly to the base model without using SFT as a preliminary step. This approach encourages the model to explore chain of thought processes for solving complex problems.
Self-Evolution: DeepSeek's models, particularly DeepSeek-R1-Zero, undergo a self-evolution process during RL training, where they autonomously improve their reasoning abilities. The models develop advanced problem-solving strategies without explicit programming.
Cold Start Data: DeepSeek-R1 incorporates a small amount of human-friendly, high-quality data as a "cold start" to accelerate performance and improve readability. It fine-tunes the model with long chain of thought examples.
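To make the mixture-of-experts idea above concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The class name, layer sizes, and number of experts are assumptions for illustration only; this is not DeepSeek's actual architecture or code.

```python
# Minimal sketch of mixture-of-experts routing (illustrative only; not
# DeepSeek's implementation). A gating network scores each token, and only
# the top-k experts are run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)          # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)             # expert probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)       # pick top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    w = weights[mask][:, k:k + 1]            # routing weight per token
                    out[mask] += w * self.experts[e](x[mask])
        return out

# Only top_k of num_experts experts run per token, so compute per token is a
# fraction of a dense layer with the same total parameter count.
tokens = torch.randn(16, 512)
print(TinyMoELayer()(tokens).shape)   # torch.Size([16, 512])
```

The efficiency claim in the article follows directly from this structure: total parameters can grow with the number of experts, while per-token compute stays roughly constant.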
2. How DeepSeek Differs from Chain-of-Thought (CoT):
CoT as a Result, Not a Starting Point: While DeepSeek models, especially DeepSeek-R1, use chain-of-thought reasoning, it is not the foundational approach: DeepSeek-R1-Zero develops CoT capabilities through reinforcement learning. Rather than relying on a large collection of supervised CoT data, as some models do, it learns CoT naturally through its training process.
Focus on Self-Generated Reasoning: DeepSeek models develop reasoning processes autonomously through the RL process, generating their own chain-of-thought rather than relying on predefined examples.
Emphasis on Efficiency: The same efficiency measures that let DeepSeek train with fewer chips also shape how the model generates its chain of thought, making the process more efficient and cost-effective.
Readability and Human Preferences: DeepSeek-R1 incorporates a small amount of readable, human-friendly data to improve the readability of its outputs and to keep the model aligned with human preferences.
3. How DeepSeek Differs from Self-Reinforcement:
Pure Reinforcement Learning Approach: DeepSeek-R1-Zero uses a pure reinforcement learning approach, which means it does not rely on supervised data or self-play to enhance its reasoning capabilities.
Focus on Self-Evolution Through RL: Unlike models that might use self-play or other forms of self-generated data, DeepSeek models, especially DeepSeek-R1-Zero, develop through pure reinforcement learning, allowing them to self-evolve and discover reasoning patterns.
Emergent Behaviors: Through reinforcement learning, DeepSeek models naturally develop behaviors like self-verification, reflection, and long chain-of-thought generation.
Rejection Sampling and Supervised Fine Tuning: DeepSeek-R1 uses rejection sampling to curate supervised fine-tuning data from the RL checkpoint. This includes data from other domains to improve the model’s capabilities, not just reasoning tasks.
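As a rough illustration of that curation step, the sketch below draws several candidate responses per prompt and keeps only those that pass a correctness check, producing a filtered SFT dataset. The function names and the toy generator/verifier are placeholders, not DeepSeek's actual pipeline.

```python
import random

# Toy stand-ins for the RL checkpoint and the verifier; a real system would
# sample from the model and use a rule-based or reference-based checker here.
def sample_response(prompt):
    return random.choice(["<think>...</think> 42", "<think>...</think> 41"])

def is_correct(response, reference):
    return response.strip().endswith(reference)

def curate_sft_data(prompts, references, samples_per_prompt=8):
    """Rejection sampling: draw several candidates per prompt and keep only
    those that pass the check, yielding a filtered SFT dataset."""
    kept = []
    for prompt, reference in zip(prompts, references):
        for _ in range(samples_per_prompt):
            response = sample_response(prompt)
            if is_correct(response, reference):
                kept.append({"prompt": prompt, "response": response})
                break  # keep the first accepted sample for this prompt
    return kept

print(curate_sft_data(["What is 6*7?"], ["42"]))
```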
4. Key Innovations and Advantages:
Cost-Effectiveness: DeepSeek's ability to achieve state-of-the-art performance with significantly fewer resources challenges the prevailing notion that building leading AI systems requires massive investments in specialized chips and infrastructure.
Open-Source Approach: DeepSeek has open-sourced its AI system, which allows others to build and distribute their own products using its technologies. This approach fosters collaboration and speeds up the development of AI technology by allowing a wider group of researchers to build upon it.
Emergent Reasoning Abilities: DeepSeek models can discover and develop novel approaches to problem-solving through the reinforcement learning process.
Strong Performance: DeepSeek's models have demonstrated performance comparable to, and sometimes exceeding, that of leading AI systems from companies like OpenAI and Google on a range of reasoning and general knowledge tasks. For example, DeepSeek-R1 achieves a score of 79.8% on AIME 2024, slightly surpassing OpenAI-o1-1217, and a score of 97.3% on MATH-500, performing on par with OpenAI-o1-1217.
5. DeepSeek's Training Process:
DeepSeek-R1-Zero is trained through reinforcement learning on the base model without the use of SFT. DeepSeek-R1-Zero uses a rule-based reward system that evaluates the model's responses based on accuracy and on the use of specific formatting tags around the model's reasoning process. The model is guided to first produce a reasoning process and then provide a final answer (a toy sketch of such a reward follows below).
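The toy reward below illustrates the flavor of such a rule-based check: it rewards a well-formed response that wraps its reasoning in tags and then grades the final answer against a reference. The specific tag names and score weights here are assumptions for illustration, not DeepSeek's exact rules.

```python
import re

# Toy rule-based reward in the spirit described above (tag names and scoring
# weights are assumptions, not DeepSeek's exact reward).
def rule_based_reward(response: str, reference_answer: str) -> float:
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think>,
    # followed by a final answer inside <answer>...</answer>.
    match = re.search(r"<think>(.+?)</think>\s*<answer>(.+?)</answer>",
                      response, re.DOTALL)
    if match:
        reward += 0.5                                  # well-formed output
        answer = match.group(2).strip()
        if answer == reference_answer.strip():
            reward += 1.0                              # accuracy reward
    return reward

resp = "<think>7 * 6 = 42</think> <answer>42</answer>"
print(rule_based_reward(resp, "42"))   # 1.5
```

Because rewards like this can be computed automatically for math and coding problems, the RL stage can run at scale without human-labeled reasoning traces.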
DeepSeek-R1 uses a four-stage pipeline:
1. Cold Start: The model is fine-tuned with a small amount of high-quality, human-friendly data.
2. Reasoning-oriented RL: The model undergoes the same reinforcement learning process as DeepSeek-R1-Zero.
3. Rejection Sampling and SFT: The RL checkpoint is used to collect SFT data from different domains, including reasoning and non-reasoning tasks.
4. RL for all Scenarios: A second RL stage is used to improve the model's helpfulness and harmlessness, while further refining reasoning skills.
Distillation: DeepSeek also employs distillation to transfer knowledge from larger models to smaller ones. The smaller models are fine-tuned on reasoning data generated by DeepSeek-R1, demonstrating that reasoning patterns discovered by larger models can be transferred to improve the reasoning capabilities of smaller ones. The distilled models also achieve strong performance on benchmarks.
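The sketch below illustrates distillation in this sense as ordinary supervised fine-tuning on teacher-generated reasoning traces. The tiny character-level "student" and the hard-coded traces are toy stand-ins for a real language model and real DeepSeek-R1 outputs, used only to show the shape of the procedure.

```python
# Illustrative sketch of distillation as plain SFT on teacher-generated
# reasoning traces (toy student and data; not DeepSeek's actual code).
import torch
import torch.nn as nn

# 1) The large model (e.g., DeepSeek-R1) would generate reasoning traces;
#    here they are faked with fixed strings.
teacher_traces = [
    {"prompt": "2+2?", "response": "<think>2+2=4</think> 4"},
    {"prompt": "3*3?", "response": "<think>3*3=9</think> 9"},
]

# 2) The small student is then fine-tuned on those traces with an ordinary
#    next-token (cross-entropy) objective; a toy character-level model
#    stands in for a real language model.
vocab = sorted({ch for ex in teacher_traces for ch in ex["prompt"] + ex["response"]})
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(text):
    return torch.tensor([stoi[ch] for ch in text])

student = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(3):                      # a few toy SFT steps
    for ex in teacher_traces:
        tokens = encode(ex["prompt"] + ex["response"])
        logits = student(tokens[:-1])      # predict each next character
        loss = loss_fn(logits, tokens[1:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print("toy distillation loss:", loss.item())
```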
6. Challenges and Limitations:
Language Mixing: DeepSeek models can sometimes mix languages in responses.
Prompt Sensitivity: DeepSeek models can be sensitive to prompts, with few-shot prompting sometimes degrading performance.
Software Engineering Tasks: DeepSeek models have not demonstrated a large improvement over other models on software engineering tasks.
DeepSeek represents a significant advancement in AI by developing models that are more efficient and cost-effective while still maintaining high performance. The company's combination of a mixture-of-experts architecture, reinforcement learning applied directly to a base model, and distillation distinguishes it from conventional pipelines that rely heavily on supervised learning, and it lets the models reach results comparable to Chain-of-Thought and self-reinforcement approaches without explicitly relying on them. DeepSeek's work has not only challenged the status quo in AI development, but has also raised questions about the geopolitical landscape of AI research.