What does the DeepSeek Sputnik moment mean?
DeepSeek is the latest model to come out of a Chinese AI research lab of the same name, founded in 2023 by Liang Wenfeng and funded by a quantitative hedge fund he co-founded in 2015. According to sources, he had acquired 10,000 NVIDIA A100s before sanctions took effect. The breakthrough with their latest model comes from using reinforcement learning, without the need for human-supervised fine-tuning, to achieve OpenAI-beating “chain-of-thought” (CoT) reasoning. CoT models differ from conventional large language models in that they first break a request down into a chain of “thoughts”, which allows the model to reflect on its answer before providing it, helping it avoid flawed reasoning and hallucinations.
Advantages come in three forms. Firstly, it was trained on under 3 million GPU hours, which equates to a training cost of just over $5m. For context, analysts estimate that Meta’s last major AI model cost $60-70m to train. Secondly, we have seen people running the full DeepSeek model on commodity Mac hardware in a usable manner, confirming its inference efficiency (i.e. efficiency in using the model, not in training it). This efficiency translates to hosted versions of the model costing just 5% of the equivalent OpenAI price. And lastly, it is released under the MIT License, a permissive software license that allows near-unlimited freedoms, including modifying the model for proprietary commercial use.
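As a rough sanity check on the training-cost figures quoted above, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the inputs are the rounded numbers from the text ("under 3 million" GPU hours, "just over $5m", and the $60-70m analyst estimate for Meta), and the function and variable names are made up for this example.

```python
# Rounded figures quoted in the article; treat all of them as approximate.
GPU_HOURS = 3_000_000                            # "under 3 million GPU hours"
TRAINING_COST_USD = 5_000_000                    # "just over $5m"
META_COST_RANGE_USD = (60_000_000, 70_000_000)   # analyst estimate cited above


def implied_hourly_rate(cost_usd: float, gpu_hours: float) -> float:
    """Dollars per GPU hour implied by a total cost and an hour count."""
    return cost_usd / gpu_hours


def cost_multiple(other_cost_usd: float, baseline_cost_usd: float) -> float:
    """How many times larger one training bill is than another."""
    return other_cost_usd / baseline_cost_usd


rate = implied_hourly_rate(TRAINING_COST_USD, GPU_HOURS)
low, high = (cost_multiple(c, TRAINING_COST_USD) for c in META_COST_RANGE_USD)

print(f"Implied rate: about ${rate:.2f} per GPU hour")
print(f"Meta estimate is roughly {low:.0f}x to {high:.0f}x the DeepSeek figure")
```

Because the hour count is an upper bound and the cost a lower bound, the implied rental rate is, if anything, somewhat higher than the figure this prints. Either way, the gap to the Meta estimate is an order of magnitude.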
Impact. The impact of those advantages will be twofold. We think that, in the medium to longer term, large-language-model (LLM) infrastructure will go the way of telco infrastructure: it will become a “commodity technology”. The financial impact on those deploying AI capex today will depend on regulatory interference, which had a major impact on telcos. If we think of AI as another “tech infra layer”, just as the internet, mobile and the cloud were, then in theory the beneficiaries will be those who leverage that infrastructure. While we think of Amazon, Google, and Microsoft as cloud-infrastructure providers, their cloud businesses emerged out of the need to support their existing business models: ecommerce, advertising and information-worker software respectively. LLM infrastructure is different in that, like the railroads and telco infrastructure, it is being built ahead of true product-market fit.