Is DeepSeek Really a Watershed Moment?
Randy Schwartz
What is DeepSeek?
The Internet is abuzz with a new open-source AI model called DeepSeek. DeepSeek released R1 last week, built on top of its December V3 release, which uses reinforcement learning to improve performance on math, logic, and reasoning questions. The model performs on par with leading frontier models from OpenAI and Anthropic, yet was reportedly developed in just 2 months for only $5.6M. It's also said to be roughly 10x cheaper to run and to use 95% less computing power. The prominent VC Marc Andreessen referred to it as AI's "Sputnik moment," and the story spooked investors such that on Monday the stock market saw a $1 trillion wipeout (concentrated in the high-tech sector).
The fact that Monday's stock market swing may have actually been caused by the Bank of Japan raising its benchmark interest rate to its highest level in 17 years... is beside the point. We build narratives every day, and perception is everything. This is a largely narrative-driven story, fueled by the fact that DeepSeek was built in China and the US is in an AI arms race with China. Whether this is a bellwether for China's success is a pointed question, given the US's practice of restricting semiconductor exports specifically to curb Chinese AI innovation. So, on one hand, we've denied them our most essential hardware for building AI technology (Nvidia's H100 GPUs). And on the other hand, they've still matched our best frontier models using older, lesser chips (Nvidia H800s). This could be the most interesting part of the story from a business perspective: that imposed compute poverty is what forced the Chinese team to challenge the orthodoxy of AI training. Meaning, DeepSeek took a very different approach to training its models because it had to.
How did they do it?
It'll take AI researchers months to unpack the 2 models and the implications of their achievements (Meta is said to have already set up 4 war rooms to analyze them in order to improve its own Llama models). But in terms of how DeepSeek approached building a better mousetrap, its topline design seems to include the following factors (a minimal sketch of the first of these follows the list):
(Skip ahead if you're not into the technicals.)
Mixture-of-Experts (MoE) Architecture
Mixed-Precision Training Framework
Multi-Token Prediction System
Chain-of-Thought (CoT) Reasoning Models
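To make the first of these concrete, here is a minimal, illustrative sketch of Mixture-of-Experts routing in PyTorch. To be clear, this is not DeepSeek's implementation: the class name, layer sizes, expert count, and top-k value are placeholders chosen for illustration. The idea is simply that a router sends each token to only a few "expert" sub-networks, so most of the model's parameters stay idle on any given forward pass, which is how MoE models cut training and inference compute.

```python
# Minimal illustrative sketch of Mixture-of-Experts (MoE) routing.
# All names and dimensions are placeholders, not DeepSeek's actual design.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each "expert" is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):                                    # x: (batch, seq, d_model)
        scores = F.softmax(self.router(x), dim=-1)           # routing probabilities
        topk_scores, topk_idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        # Only the top-k experts run for each token; the rest stay idle.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[..., slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += topk_scores[..., slot][mask].unsqueeze(-1) * expert(x[mask])
        return out

layer = MoELayer()
tokens = torch.randn(2, 16, 512)   # dummy batch of token embeddings
print(layer(tokens).shape)         # torch.Size([2, 16, 512])
```

The other items on the list attack cost from different angles: mixed-precision training does more of the arithmetic in lower-precision number formats to save memory and compute, and multi-token prediction extracts more training signal from each pass by predicting several upcoming tokens at once.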
Fact or Fiction
We know roughly how DeepSeek works because the company has published a research paper on its design and performance and released the model weights as open source. But the facts do need to be checked. With DeepSeek, China is exporting not just a technology but a narrative, and China is the first-world-power equivalent of the "unreliable narrator" of Edgar Allan Poe's fiction. For example, was DeepSeek really created by 200 dedicated engineers in just 2 months, and for only $5.6M? We may never know. We're also told the $5.6M investment was bankrolled by a hedge fund named High-Flyer. It doesn't really make sense that a hedge fund would attempt to build a foundational AI model. If this project were meant to serve the business, wouldn't they instead fine-tune established models for hedge-fund use cases?
There are 2 sides to AI: training the model and inference, where the model is put to use. A model is a function of the data used to train it, because that's where it builds the weights it later draws on at inference. Herein lies another problem with this product. DeepSeek says its models are built on open-source foundations like PyTorch and Llama, which are well known and would lead experts to accept the models' training at face value. But OpenAI says it has evidence that DeepSeek also used synthetic data from OpenAI's models, i.e. GPT-4's outputs, to train its own. In machine learning this process is known as "distillation" (sketched below), but in lawful society it's called IP theft. So DeepSeek seems to be in breach of OpenAI's terms of service, which certainly dampens the pure innovation of the moment. And that's to say nothing of the consequences of training an AI model on the outputs of another AI model without access to the underlying reasoning that shaped those outputs. Anyone who's seen the movie Multiplicity would have reservations about making a copy of a copy.
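Since "distillation" does a lot of work in this argument, here is a minimal sketch of what the technique looks like in code, again in PyTorch. The toy teacher and student networks, temperature, and training loop below are placeholders for illustration, not a claim about how DeepSeek (or OpenAI) actually operates; the alleged scenario, training on another model's generated outputs rather than its internal probabilities, is a looser variant of the same idea.

```python
# Minimal illustrative sketch of knowledge distillation: a small "student" model
# learns to mimic a larger "teacher" by matching its output distribution.
# Model sizes, temperature, and data here are arbitrary placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 1000)).eval()
student = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1000))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)
T = 2.0  # temperature softens the teacher's distribution

for step in range(100):
    x = torch.randn(32, 128)                 # stand-in for a batch of inputs
    with torch.no_grad():
        teacher_logits = teacher(x)          # the "original" being copied
    student_logits = student(x)
    # Classic distillation loss: KL divergence between softened distributions.
    loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Either way, the student inherits whatever the teacher got right and whatever it got wrong, which is exactly the "copy of a copy" concern.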
Is DeepSeek commercially viable?
What seems clear is that DeepSeek is not a commercially viable product for companies to build on, for many reasons:
1. Unfavorable Terms of Service
DeepSeek's terms of service state that the company retains rights over the content that users submit (which DeepSeek can then modify, publish, or license for itself), and it also claims ownership of its model's Gen-AI outputs. That means corporate users wouldn't be able to own or brand whatever content they create through DeepSeek's tools.
2. Privacy and Security
Then there are significant security concerns: DeepSeek stores your prompts and uses them as training data to reinforce its models. OpenAI, by contrast, doesn't train on user prompts submitted through the API by default, and it doesn't see your data at all if you license its models within a private cloud environment. These protections aren't available with DeepSeek's products, and this is where many enterprise evaluations will die on the vine.
3. Lack of Legal Indemnification
The more an AI was trained on a given author's work, the more its output will resemble that work. Companies can't leave themselves open to claims of copyright infringement from all angles, which is why so many are reluctant to use Gen AI to create their final commercial outputs. DeepSeek is unlikely to ever reveal its training set, so corporations building on it could face looming lawsuits for a generation to come.
4. IP Infringement
If DeepSeek is in violation of OpenAI's terms of service, there's a good chance OpenAI will pursue it in the courts (where DeepSeek will stonewall), or the US government will more expediently shut down domestic access to DeepSeek's technologies. So why would a corporation integrate DeepSeek models into its AI workbench, or build anything atop these models, if access could be cut off at any minute?
5. Chinese Nationalist Interests
DeepSeek was created through the lens of the CCP's interests, which means it presumably lacks an objective and comprehensive training set. All ingested content and processed outputs are managed in a way that follows Chinese laws and socialist core values, with the aim of "protecting national security and social stability" (that's how the government phrases it). This makes sense for the Communist Party, but it won't work for any corporation out there.
6. Trojan Horse
Moving from self-interest to more nefarious intent, there are suggestions that AI-generated code coming out of DeepSeek includes suspicious references to Chinese libraries containing potential exploits. This poses security risks of the highest order.
Endgame
We don't know what High-Flyer's intentions were with this model, or how the CCP is going to annex this technology (if it hasn't already). But given its overt conflicts and complications, we might consider that DeepSeek was never meant to be a commercial enterprise at all. It's so clearly unfit for commercial use that one could see DeepSeek either as a subversive platform for China to harvest data from other countries, or as a benign research exercise in which a private group experimented with cutting-edge technology and accelerated its learnings through open-source models.
Based on the nationalistic tension, this development will surely give US politicians all the validation and urgency they need for the proposed $500B Stargate project. But from another point of view, DeepSeek is proof that we can't win the AI race simply by stifling competition. We've tried that by putting successive waves of restrictions on our chip exports, and it clearly didn't work. To that point, one could say the DeepSeek story isn't about an AI arms race between 2 superpowers at all, but rather a living treatise on the virtues of open-source versus closed-source software models, and on humanity's path to innovation in general. Openness and transparency foster collaboration and accelerate innovation. By democratizing access to emerging research and bleeding-edge technology, we can pull the future forward.
Just 2 years ago it was commonly thought we wouldn't reach Artificial General Intelligence (AGI) until 2034, optimistically speaking. Now many experts think we could get there by the end of the year. Viewed through that lens, DeepSeek's new model looks like a sign of the times, not of an AI arms race but of open-source innovation. The true learning here might be that we'll have to meet this moment in our rapidly accelerating evolution head-on, not only with investment but through broad collaboration.