What does the DeepSeek Sputnik moment mean?

  1. DeepSeek’s AI model lowers the cost of AI training and usage by >90%, unlocking the “Jevons Paradox”, which we think will make AI more pervasive.
  2. It uses reinforcement learning (“RL”) to enhance the efficacy of chain-of-thought reasoning, demonstrating that there is still algorithmic progress to be had in AI.
  3. To us, this resembles prior infrastructure capex bubbles. AI commoditisation will accrue value to companies yet to be built, as it did in the past, e.g. Standard Oil (built on the railways) and Facebook (built on telco infrastructure).

What is DeepSeek?

DeepSeek is the latest model to come out of a Chinese AI research lab founded in 2023 by Liang Wenfeng and funded by a quant hedge fund he co-founded in 2015. According to sources, before sanctions he had acquired 10,000 NVIDIA A100s. The breakthrough with their latest model comes from using reinforcement learning, without the need for human-supervised fine-tuning, to achieve OpenAI-beating “chain-of-thought” reasoning (CoT models). CoT models differ from conventional large language models in that they first break a request down into a chain of “thoughts”, which allows the model to reflect on its answer before providing it, avoiding flawed reasoning and hallucinations.

What are the advantages?

Advantages come in three forms. Firstly, it was trained in under 3 million GPU hours, equating to just over $5m in training cost; for context, analysts estimate Meta’s last major AI model cost $60-70m to train. Secondly, we have seen people running the full DeepSeek model on commodity Mac hardware in a usable manner, confirming its inference efficiency (i.e. the cost of using, not training, the model). This efficiency translates into hosted versions of the model costing just 5% of the equivalent OpenAI price. And lastly, it is released under the MIT License, a permissive software license that allows near-unlimited freedom, including modification for proprietary commercial use.
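A quick back-of-the-envelope check of the training-cost claim above. The GPU-hour count (~2.8m) and the ~$2/hour rental rate are our illustrative assumptions, not figures from DeepSeek; they are chosen only to show how “under 3 million GPU hours” maps to “just over $5m”:

```python
# Sketch: implied training cost under assumed figures (both are assumptions).
gpu_hours = 2.8e6        # "under 3 million GPU hours"
rate_per_hour = 2.00     # assumed cloud rental rate, USD per GPU-hour

training_cost = gpu_hours * rate_per_hour
print(f"Implied training cost: ${training_cost / 1e6:.1f}m")  # ≈ $5.6m

# The hosted-inference gap: "just 5% of the equivalent OpenAI price".
openai_price = 1.0                  # normalised to 1
deepseek_price = 0.05 * openai_price
print(f"DeepSeek hosted price: {deepseek_price / openai_price:.0%} of OpenAI's")
```

On these assumptions the implied cost is roughly $5.6m, consistent with the “just over $5m” figure; a different rental rate would shift the result proportionally.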

What is the impact?

The impact of these advantages will be twofold. Medium to longer term, we think large-language-model (LLM) infrastructure will go the way of telco infrastructure: it will become a “commodity technology”. The financial impact on those deploying AI capex today will depend on regulatory interference, which had a major impact on telcos. If we think of AI as another “tech infra layer”, just as the internet, mobile and the cloud were, then theoretically the beneficiaries will be those who leverage that infrastructure. While we think of Amazon, Google and Microsoft as cloud infrastructure, that infrastructure emerged out of the need to support their existing business models: ecommerce, advertising and information-worker software respectively. LLM infrastructure is different in that, like the railroads and telco infrastructure, it is being built ahead of true product-market fit.
