Why DeepSeek "may" matter
DeepSeek is a Chinese startup that may have developed advanced AI models using novel techniques that differ from traditional methods like Chain of Thought (CoT) and self-reinforcement, achieving impressive results with fewer resources.
Here's an in-depth analysis of how DeepSeek works and why it stands apart:
1. Core Technology and Approach:
Mixture of Experts: DeepSeek uses a "mixture of experts" architecture, in which a routing layer sends each input to a small subset of specialized expert sub-networks rather than activating one large monolithic model. Because only a fraction of the parameters are active at a time, the chips are used more efficiently and less time is lost moving data around (a minimal sketch of this routing idea appears at the end of this list).
Efficient Data Analysis: Unlike dense models that run every parameter for every input, DeepSeek's approach distributes the work across multiple specialized experts. This allows for more efficient use of computing power and reduces the number of chips required for training.
Reduced Computing Costs: DeepSeek's approach allows them to train systems with significantly less computing power. DeepSeek engineers reported using approximately 2,000 Nvidia chips and about $6 million in compute costs, while other leading companies use 16,000 chips or more and spend roughly ten times as much.
Reinforcement Learning (RL): DeepSeek uses reinforcement learning (RL) to improve the reasoning capabilities of its models.
No Supervised Fine-Tuning (SFT) as a Preliminary Step: One of the primary ways DeepSeek differs from other models is that it applies RL directly to the base model without using SFT as a preliminary step. This approach encourages the model to explore chain of thought processes for solving complex problems.
Self-Evolution: DeepSeek's models, particularly DeepSeek-R1-Zero, undergo a self-evolution process during RL training, where they autonomously improve their reasoning abilities. The models develop advanced problem-solving strategies without explicit programming.
Cold Start Data: DeepSeek-R1 incorporates a small amount of human-friendly, high-quality data as a "cold start" to accelerate performance and improve readability. It fine-tunes the model with long chain of thought examples.
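To make the mixture-of-experts idea above concrete, here is a minimal, illustrative sketch of top-k expert routing in PyTorch. The class name, layer sizes, and number of experts are assumptions for illustration only; this is not DeepSeek's actual architecture or code.

```python
# Minimal sketch of mixture-of-experts routing (illustrative only; not
# DeepSeek's implementation). A gating network scores each token, and only
# the top-k experts are run for that token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoELayer(nn.Module):
    def __init__(self, d_model=512, d_hidden=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)          # router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                                    # x: (tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)             # expert probabilities
        weights, idx = scores.topk(self.top_k, dim=-1)       # pick top-k experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(len(self.experts)):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    w = weights[mask][:, k:k + 1]            # routing weight per token
                    out[mask] += w * self.experts[e](x[mask])
        return out

# Only top_k of num_experts experts run per token, so compute per token is a
# fraction of a dense layer with the same total parameter count.
tokens = torch.randn(16, 512)
print(TinyMoELayer()(tokens).shape)   # torch.Size([16, 512])
```

The efficiency claim in the article follows directly from this structure: total parameters can grow with the number of experts, while per-token compute stays roughly constant.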
2. How DeepSeek Differs from Chain-of-Thought (CoT):
CoT as a Result, Not a Starting Point: While DeepSeek models, especially DeepSeek-R1, use chain-of-thought reasoning, it is not the foundational approach: DeepSeek-R1-Zero develops CoT capabilities through reinforcement learning. Rather than relying on a large collection of supervised CoT data, as some models do, it learns CoT naturally through its training process.
Focus on Self-Generated Reasoning: DeepSeek models develop reasoning processes autonomously through the RL process, generating their own chain-of-thought rather than relying on predefined examples.
Emphasis on Efficiency: The same efficiency measures that let DeepSeek train with fewer chips also shape how the model generates its chain of thought, making the process more efficient and cost-effective.
Readability and Human Preferences: DeepSeek-R1 incorporates a small amount of readable, human-friendly data to improve the readability of its outputs and to keep the model aligned with human preferences.
3. How DeepSeek Differs from Self-Reinforcement:
Pure Reinforcement Learning Approach: DeepSeek-R1-Zero uses a pure reinforcement learning approach, which means it does not rely on supervised data or self-play to enhance its reasoning capabilities.
Focus on Self-Evolution Through RL: Unlike models that might use self-play or other forms of self-generated data, DeepSeek models, especially DeepSeek-R1-Zero, develop through pure reinforcement learning, allowing them to self-evolve and discover reasoning patterns.
Emergent Behaviors: Through reinforcement learning, DeepSeek models naturally develop behaviors like self-verification, reflection, and long chain-of-thought generation.
Rejection Sampling and Supervised Fine Tuning: DeepSeek-R1 uses rejection sampling to curate supervised fine-tuning data from the RL checkpoint. This includes data from other domains to improve the model’s capabilities, not just reasoning tasks.
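As a rough illustration of that curation step, the sketch below draws several candidate responses per prompt and keeps only those that pass a correctness check, producing a filtered SFT dataset. The function names and the toy generator/verifier are placeholders, not DeepSeek's actual pipeline.

```python
import random

# Toy stand-ins for the RL checkpoint and the verifier; a real system would
# sample from the model and use a rule-based or reference-based checker here.
def sample_response(prompt):
    return random.choice(["<think>...</think> 42", "<think>...</think> 41"])

def is_correct(response, reference):
    return response.strip().endswith(reference)

def curate_sft_data(prompts, references, samples_per_prompt=8):
    """Rejection sampling: draw several candidates per prompt and keep only
    those that pass the check, yielding a filtered SFT dataset."""
    kept = []
    for prompt, reference in zip(prompts, references):
        for _ in range(samples_per_prompt):
            response = sample_response(prompt)
            if is_correct(response, reference):
                kept.append({"prompt": prompt, "response": response})
                break  # keep the first accepted sample for this prompt
    return kept

print(curate_sft_data(["What is 6*7?"], ["42"]))
```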
4. Key Innovations and Advantages:
Cost-Effectiveness: DeepSeek's ability to achieve state-of-the-art performance with significantly fewer resources challenges the prevailing notion that building leading AI systems requires massive investments in specialized chips and infrastructure.
Open-Source Approach: DeepSeek has open-sourced its AI system, which allows others to build and distribute their own products using its technologies. This approach fosters collaboration and speeds up the development of AI technology by allowing a wider group of researchers to build upon it.
Emergent Reasoning Abilities: DeepSeek models can discover and develop novel approaches to problem-solving through the reinforcement learning process.
Strong Performance: DeepSeek's models have demonstrated performance comparable to, and sometimes exceeding, that of leading AI systems from companies like OpenAI and Google on a range of reasoning and general knowledge tasks. For example, DeepSeek-R1 achieves a score of 79.8% on AIME 2024, slightly surpassing OpenAI-o1-1217, and a score of 97.3% on MATH-500, performing on par with OpenAI-o1-1217.
5. DeepSeek's Training Process:
DeepSeek-R1-Zero is trained through reinforcement learning on the base model without the use of SFT. DeepSeek-R1-Zero uses a rule-based reward system that evaluates the model's responses based on accuracy and on the use of specific formatting tags around the model's reasoning process. The model is guided to first produce a reasoning process and then provide a final answer (a toy sketch of such a reward follows below).
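The toy reward below illustrates the flavor of such a rule-based check: it rewards a well-formed response that wraps its reasoning in tags and then grades the final answer against a reference. The specific tag names and score weights here are assumptions for illustration, not DeepSeek's exact rules.

```python
import re

# Toy rule-based reward in the spirit described above (tag names and scoring
# weights are assumptions, not DeepSeek's exact reward).
def rule_based_reward(response: str, reference_answer: str) -> float:
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think>,
    # followed by a final answer inside <answer>...</answer>.
    match = re.search(r"<think>(.+?)</think>\s*<answer>(.+?)</answer>",
                      response, re.DOTALL)
    if match:
        reward += 0.5                                  # well-formed output
        answer = match.group(2).strip()
        if answer == reference_answer.strip():
            reward += 1.0                              # accuracy reward
    return reward

resp = "<think>7 * 6 = 42</think> <answer>42</answer>"
print(rule_based_reward(resp, "42"))   # 1.5
```

Because rewards like this can be computed automatically for math and coding problems, the RL stage can run at scale without human-labeled reasoning traces.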
DeepSeek-R1 uses a four-stage pipeline:
1. Cold Start: The model is fine-tuned with a small amount of high-quality, human-friendly data.
2. Reasoning-oriented RL: The model undergoes the same reinforcement learning process as DeepSeek-R1-Zero.
3. Rejection Sampling and SFT: The RL checkpoint is used to collect SFT data from different domains, including reasoning and non-reasoning tasks.
4. RL for all Scenarios: A second RL stage is used to improve the model's helpfulness and harmlessness, while further refining reasoning skills.
Distillation: DeepSeek also employs distillation to transfer knowledge from larger models to smaller ones. The smaller models are fine-tuned on reasoning data generated by DeepSeek-R1, demonstrating that reasoning patterns discovered by larger models can be transferred to improve the reasoning capabilities of smaller ones. The distilled models also achieve strong performance on benchmarks.
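The sketch below illustrates distillation in this sense as ordinary supervised fine-tuning on teacher-generated reasoning traces. The tiny character-level "student" and the hard-coded traces are toy stand-ins for a real language model and real DeepSeek-R1 outputs, used only to show the shape of the procedure.

```python
# Illustrative sketch of distillation as plain SFT on teacher-generated
# reasoning traces (toy student and data; not DeepSeek's actual code).
import torch
import torch.nn as nn

# 1) The large model (e.g., DeepSeek-R1) would generate reasoning traces;
#    here they are faked with fixed strings.
teacher_traces = [
    {"prompt": "2+2?", "response": "<think>2+2=4</think> 4"},
    {"prompt": "3*3?", "response": "<think>3*3=9</think> 9"},
]

# 2) The small student is then fine-tuned on those traces with an ordinary
#    next-token (cross-entropy) objective; a toy character-level model
#    stands in for a real language model.
vocab = sorted({ch for ex in teacher_traces for ch in ex["prompt"] + ex["response"]})
stoi = {ch: i for i, ch in enumerate(vocab)}

def encode(text):
    return torch.tensor([stoi[ch] for ch in text])

student = nn.Sequential(nn.Embedding(len(vocab), 32), nn.Linear(32, len(vocab)))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(3):                      # a few toy SFT steps
    for ex in teacher_traces:
        tokens = encode(ex["prompt"] + ex["response"])
        logits = student(tokens[:-1])      # predict each next character
        loss = loss_fn(logits, tokens[1:])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

print("toy distillation loss:", loss.item())
```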
6. Challenges and Limitations:
Language Mixing: DeepSeek models can sometimes mix languages in responses.
Prompt Sensitivity: DeepSeek models can be sensitive to prompts, with few-shot prompting sometimes degrading performance.
Software Engineering Tasks: DeepSeek models have not demonstrated a large improvement over other models on software engineering tasks.
DeepSeek represents a significant advancement in AI by developing models that are more efficient and cost-effective while still maintaining high performance. The company's combination of a mixture-of-experts architecture, reinforcement learning applied directly to a base model, and distillation distinguishes it from conventional pipelines that rely heavily on supervised learning, and it lets the models reach results comparable to Chain-of-Thought and self-reinforcement approaches without explicitly relying on them. DeepSeek's work has not only challenged the status quo in AI development, but has also raised questions about the geopolitical landscape of AI research.