First, the tech innovations. How did DeepSeek get so close to the leading models at roughly 1/30 of the cost?
Four significant innovations and a bunch of smaller ones. First, the biggest ones:
- They distilled it from an existing model, most likely Meta’s Llama 3, though it’s possible they used or got access to OpenAI’s 4o or Anthropic’s Claude. Distillation is simply the process of using an existing model to train a new one. OpenAI does this with models like GPT-4 Turbo: it’s distilled from GPT-4, which lets them offer pretty-good output at dramatically lower cost because the training data is provided by the earlier model. Distilling from OpenAI or Anthropic definitely violates their terms of service, but it’s difficult and maybe impossible to prove. Not needing to generate a starting training set is a huge advantage in speed and cost. However, it also means you can’t easily leapfrog the leading models, since your starting point is always their previous release. (A rough sketch of how distillation works follows this list.)
- Instead of one large dense model, DeepSeek divided its model into what’s called a Mixture of Experts. LLMs have traditionally loaded and updated the entire model during training and inference. DeepSeek used a learned routing network to determine which parts of the model each token actually needed, and only ran and trained those parts (see the routing sketch after this list). From the thread by @wordgrammer: “They need 95% fewer GPUs than Meta because for each token, they only trained 5% of their parameters.”
- The inference step (where the model makes predictions on new data, like your chats) is dramatically cheaper. This is what makes the cost of running DeepSeek, locally or in the cloud, far lower than the leading models. The breakthrough here, which was announced a while ago, is compression of the key-value cache the model draws on during inference (see the cache sketch after this list). This is a neat trick! But it likely would have been discovered by others in the near term, and in any event the technique is now available to every AI company because it was published openly.
- Reasoning. DeepSeek didn’t just create a model that competes with the best LLMs from OpenAI and Anthropic; they also built a reasoning model on par with OpenAI’s o1. Reasoning models leverage LLMs and add chain of thought and other strategies we generally associate with intelligence. They can correct their own mistakes and draw conclusions that purely predictive models generally can’t. Reasoning models like o1 are created by applying a reinforcement-learning layer on top of the LLM, and OpenAI used human feedback to guide the model’s decision making during development. DeepSeek’s innovation was to… just not do that. From the Stratechery article: “DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers, DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions.” What emerged is a model that developed reasoning and chains-of-thought on its own. (A sketch of those two rewards follows this list.)
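To make the distillation idea concrete, here’s a minimal sketch of the classic logit-matching form, where a student model is trained to match a frozen teacher’s output distribution. (When you only have API access to the teacher, distillation usually means fine-tuning on its sampled outputs instead; the vocabulary size and temperature here are placeholders, not anyone’s actual setup.)

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: the student learns to match the teacher's
    output distribution instead of one-hot ground-truth labels."""
    # Softening both distributions with a temperature exposes the teacher's
    # relative confidence across the whole vocabulary, not just the top token.
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # KL divergence from student to teacher; scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 4 token positions over a 50k-token vocabulary.
teacher_logits = torch.randn(4, 50_000)                      # frozen teacher
student_logits = torch.randn(4, 50_000, requires_grad=True)  # student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```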
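For the Mixture of Experts point, here’s a minimal sketch of top-k expert routing, the mechanism that lets only a small slice of the parameters run (and receive gradients) for any given token. The dimensions, expert count, and top-k value are illustrative, not DeepSeek’s actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Sparse Mixture of Experts: a small router picks the top-k experts
    for each token, so most parameters sit idle on any given token."""
    def __init__(self, dim=512, num_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)  # learned gating network
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                           nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )

    def forward(self, x):                                 # x: (tokens, dim)
        gate_logits = self.router(x)
        weights, indices = gate_logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts execute and receive gradients -- this is
        # why each token trains only a fraction of the total parameters.
        for slot in range(self.top_k):
            for e in indices[:, slot].unique():
                mask = indices[:, slot] == e
                out[mask] += weights[mask, slot:slot + 1] * self.experts[e](x[mask])
        return out

layer = MoELayer()
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 512])
```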
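On the inference-cost point: the published technique is DeepSeek’s multi-head latent attention, which caches a small low-rank latent per token and reconstructs keys and values from it on the fly. This is a heavily simplified sketch of just the compression idea, with made-up dimensions and none of the surrounding attention machinery.

```python
import torch
import torch.nn as nn

class CompressedKVCache(nn.Module):
    """Low-rank KV-cache compression: store a small latent per token and
    decompress it into keys/values when attention needs them."""
    def __init__(self, dim=4096, latent_dim=512):
        super().__init__()
        self.down = nn.Linear(dim, latent_dim, bias=False)  # compress
        self.up_k = nn.Linear(latent_dim, dim, bias=False)  # rebuild keys
        self.up_v = nn.Linear(latent_dim, dim, bias=False)  # rebuild values
        self.cache = []  # holds latents only, never full K/V tensors

    def append(self, hidden):  # hidden: (batch, dim), one decoded token
        # Caching one 512-dim latent instead of separate 4096-dim K and V
        # tensors is a 16x memory saving per token in this toy configuration.
        self.cache.append(self.down(hidden))

    def keys_values(self):
        latents = torch.stack(self.cache, dim=1)  # (batch, seq, latent_dim)
        return self.up_k(latents), self.up_v(latents)

cache = CompressedKVCache()
for _ in range(10):                  # simulate decoding 10 tokens
    cache.append(torch.randn(1, 4096))
k, v = cache.keys_values()
print(k.shape, v.shape)              # (1, 10, 4096) each
```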
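And for the reasoning bullet, here’s a minimal sketch of the two rule-based rewards the Stratechery excerpt describes: one for the correct answer, one for the thinking format. The tag names, regexes, and scores are illustrative; DeepSeek’s actual grading and the reinforcement-learning update around it are more involved.

```python
import re

def format_reward(completion: str) -> float:
    """Reward for showing a thinking process before the answer."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, completion, re.DOTALL) else 0.0

def accuracy_reward(completion: str, expected: str) -> float:
    """Reward for a verifiably correct final answer."""
    match = re.search(r"<answer>(.+?)</answer>", completion, re.DOTALL)
    return 1.0 if match and match.group(1).strip() == expected else 0.0

# The model tries several answers per question; each gets graded and the
# combined scores drive the policy update.
samples = [
    "<think>2 + 2 = 4</think><answer>4</answer>",  # right answer, right format
    "The answer is 4",                             # untagged: both rewards miss
    "<think>guessing</think><answer>5</answer>",   # right format, wrong answer
]
for s in samples:
    print(format_reward(s) + accuracy_reward(s, "4"))  # 2.0, 0.0, 1.0
```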
So what are the implications here? First off, it’s a very clear reminder that trying to compete on regulation instead of innovation isn’t the right move. But perhaps more importantly, DeepSeek’s architecture supports a dramatically lower-cost model for AI, both in dollars and in energy consumption. Also important: everything released by DeepSeek (except the underlying model’s training data) is open source and permissively licensed. So this benefits everyone in the industry, even the incumbents.
There are still plenty of viable businesses to be built on top of the tech, and once Product Managers get smarter about incorporating this stuff than just slapping a Big Fat AI Button all over their products, we’ll start to see some real gains. Models, however, are quickly becoming a commodity, and chipmakers (okay, really just NVIDIA) don’t have as big a moat as we thought a few weeks ago.
Find me on Discord: https://discord.gg/learnmutiny