A Summary of Summaries of DeepSeek and its implications

First, the tech innovations. How did DeepSeek come so close to catching up with the leading models at roughly 1/30 of the cost?

Four significant innovations and a bunch of smaller ones. First the biggest ones:

  1. They distilled it from an existing model, most likely Meta’s Llama 3, though it’s possible they used or got access to OpenAI’s 4o or Anthropic’s Claude. Distillation is just the process of using an existing model to train another one. OpenAI does this with models like GPT-4 Turbo - it’s distilled from GPT-4 and lets them offer pretty-good output at dramatically lower cost because the training data is provided by the earlier model. Distilling from OpenAI or Anthropic definitely violates their terms of service, but it’s difficult and maybe impossible to prove. Not needing to generate a starting training set is a huge advantage in speed and cost. However, it also means you can’t easily leapfrog the leading models, since your starting point is always their previous release.
  2. Instead of one large model, DeepSeek divided its model into what’s called a Mixture of Experts. LLMs have traditionally loaded the entire model during training and inference. DeepSeek used a guided predictive algorithm to determine which parts of the model were used in different queries and only trained those parts. From the thread by @wordgrammer: They need 95% fewer GPUs than Meta because for each token, they only trained 5% of their parameters.
  3. The inference step (where the model makes predictions on new data like your chats) is dramatically cheaper. This is what makes the cost of running DeepSeek, locally or in the cloud, far cheaper than leading models. The breakthrough here, which was announced a little while ago, is compression of the cache the model draws on when making inferences. This is a neat trick! But likely would have been discovered by others in the near term, and in any event the technique is now available to all AI companies because it was published publicly.
  4. Reasoning. DeepSeek didn’t just create a model that competes with the best LLMs from OpenAI and Anthropic; they built a reasoning model on par with OpenAI’s o1. Reasoning models leverage LLMs and add chain of thought and other strategies we generally associate with intelligence. They can correct their own mistakes and draw conclusions that predictive models generally can’t. Reasoning models like o1 are created by applying a reinforcement learning layer on top of the LLM. OpenAI used human feedback to guide the model’s decision making during development. DeepSeek’s innovation was to… just not do that. From the Stratechery article: DeepSeek gave the model a set of math, code, and logic questions, and set two reward functions: one for the right answer, and one for the right format that utilized a thinking process. Moreover, the technique was a simple one: instead of trying to evaluate step-by-step (process supervision), or doing a search of all possible answers, DeepSeek encouraged the model to try several different answers at a time and then graded them according to the two reward functions. What emerged is a model that developed reasoning and chains-of-thought on its own.
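
The distillation idea in item 1 can be sketched in a few lines: instead of training on raw ground-truth labels, the student model is trained to match the teacher model's output distribution over tokens. This is a minimal pure-Python illustration with toy logits, not anyone's actual training code; the 4-token vocabulary and the temperature value are arbitrary choices for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft labels to the student's predictions.

    Minimizing this pushes the student's distribution toward the teacher's,
    which is what lets the teacher's outputs stand in for a training set.
    """
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))

# Toy next-token logits over a 4-token vocabulary.
teacher = [4.0, 1.0, 0.5, 0.1]   # teacher is confident in token 0
student = [1.0, 1.0, 1.0, 1.0]   # untrained student is uniform

loss_before = distillation_loss(teacher, student)
loss_after = distillation_loss(teacher, teacher)  # perfect match -> loss ~ 0
print(loss_before, loss_after)
```

A real run would take gradient steps on the student's parameters to drive this loss down; the point of the toy is just that the loss is zero exactly when the student reproduces the teacher.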
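
The Mixture of Experts routing in item 2 can be sketched the same way: a gating network scores every expert for each token, but only the top few actually run (and, during training, only those receive gradient updates). The expert functions and router weights below are made-up stand-ins for the large feed-forward blocks and the learned gate in a real model.

```python
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Eight tiny "experts" -- in a real model each would be a large FFN block.
EXPERTS = [lambda x, k=k: x * (k + 1) for k in range(8)]

def route(token_features, router_weights, top_k=2):
    """Score every expert for this token, but run only the top_k."""
    scores = [sum(w * f for w, f in zip(row, token_features))
              for row in router_weights]
    gates = softmax(scores)
    chosen = sorted(range(len(gates)), key=lambda i: gates[i], reverse=True)[:top_k]
    # Only the chosen experts compute; the other 6/8 of the expert
    # parameters are never touched for this token.
    output = sum(gates[i] * EXPERTS[i](sum(token_features)) for i in chosen)
    return output, chosen

router = [[0.1 * (i + j) for j in range(4)] for i in range(8)]
out, active = route([1.0, 0.5, -0.5, 2.0], router, top_k=2)
print(active)  # only 2 of the 8 experts ran for this token
```

That per-token sparsity is the source of the "only trained 5% of their parameters" claim: total parameter count stays large, but the compute (and GPU count) scales with the active slice.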
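
The cache compression in item 3 is, at heart, a low-rank projection: instead of caching a full key/value vector for every past token, the model caches a much smaller latent and reconstructs the full vector on demand. The dimensions and projection matrices here are purely illustrative (in a real model both projections are learned), not the ones DeepSeek uses.

```python
# Hypothetical sizes: full vector dim 8, compressed latent dim 2.
D_FULL, D_LATENT = 8, 2

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]

# Toy down- and up-projection matrices (learned in a real model).
W_down = [[1.0 if j == i * 4 else 0.0 for j in range(D_FULL)]
          for i in range(D_LATENT)]
W_up = [[1.0 if i == j * 4 else 0.0 for j in range(D_LATENT)]
        for i in range(D_FULL)]

hidden = [float(i) for i in range(D_FULL)]
latent = matvec(W_down, hidden)    # cache this: 2 floats instead of 8
restored = matvec(W_up, latent)    # reconstruct a key/value when needed

print(len(latent), len(hidden))    # 2 vs 8: a 4x smaller cache per token
```

The cache grows with every token of context, so shrinking the per-token entry is what makes inference so much cheaper to serve.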
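
The two-reward-function setup from the Stratechery quote in item 4 can be sketched as plain grading code: sample several completions per question, then score each one for a correct answer and for a visible thinking process. The tag format and reward values here are hypothetical; the real pipeline folds these scores into a reinforcement-learning update rather than just printing them.

```python
import re

def answer_reward(completion, expected):
    """Reward 1 if the final answer matches the expected value."""
    m = re.search(r"<answer>(.*?)</answer>", completion)
    return 1.0 if m and m.group(1).strip() == expected else 0.0

def format_reward(completion):
    """Reward 1 if the model showed its work inside <think> tags."""
    return 1.0 if re.search(r"<think>.+?</think>", completion, re.S) else 0.0

def grade(completions, expected):
    """Score a batch of sampled answers; training reinforces high scorers."""
    return [answer_reward(c, expected) + format_reward(c) for c in completions]

samples = [
    "<think>17 + 25 = 42</think><answer>42</answer>",  # right answer, shows work
    "<answer>42</answer>",                             # right answer, no reasoning
    "<think>17 + 25 = 41</think><answer>41</answer>",  # shows work, wrong answer
]
print(grade(samples, "42"))  # [2.0, 1.0, 1.0]
```

Because the completion that both reasons and answers correctly outscores the others, repeatedly reinforcing the winners is enough pressure for chain-of-thought to emerge without any human grading the steps.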

So what are the implications here? First off, it’s a very clear reminder that trying to compete on regulation instead of innovation isn’t the right move. But perhaps more importantly, DeepSeek’s architecture supports a dramatically lower-cost model for AI, both in dollars and in energy consumption. Also important: everything released by DeepSeek (except the underlying model's training data) is open source and permissively licensed. So this benefits everyone in the industry, even the incumbents.

There are still plenty of viable businesses to be built on top of the tech, and once Product Managers find smarter ways to incorporate this stuff than just adding a Big Fat AI Button all over their products, we’ll start to see some real gains. Models, however, are quickly becoming a commodity, and chipmakers (okay, really just NVIDIA) don’t have as big a moat as we thought a few weeks ago.

Sources:

Threadreader unroll of Twitter user @wordgrammer's review of the DeepSeek paper

Stratechery writeup on DeepSeek
