A five-minute primer on DeepSeek and Qwen, two Chinese AI models that are sending OpenAI and Anthropic into a tailspin
Okay, so I'm very sure that 99.9% of the people reading this article, data scientist or not, have heard of DeepSeek; heck, some of you may even have lost money this week because of it. At the same time, there's a ton of confusion about DeepSeek, and more information is coming out daily about this breakthrough AI company. And yes, it is absolutely a breakthrough, but not necessarily in all the ways the media is explaining it.
At the same time, there's another AI model that maybe only 50% of you have heard of, but let me tell you, you'll be hearing a lot more about it over the next week. What I'm talking about is Qwen, a foundational LLM from Alibaba.
While there are plenty of technical white papers you can dive into, I thought it would be useful to put together a quick primer for everyone who doesn't have the time for that but still wants to understand what's going on.
First things first: this is a real, likely world-changing moment in time. There's certainly a lot of media hype, with plenty of people getting things completely wrong, but don't let that make you think for a second that major innovation isn't taking place, because it is. So let's dive in. My plan is to give you about two minutes of reading per model so you come away with a solid understanding of what makes DeepSeek and Qwen so interesting, and why OpenAI is likely to release o3, its newest reasoning model, today or tomorrow because of them.
DeepSeek R1
Okay, so DeepSeek has not only dominated the global news cycle for the last week or so, it also caused an absolutely massive panic in the US stock market, with Nvidia losing nearly $600B in market value in the blink of an eye.
So why did DeepSeek cause so much panic? The company claimed that it spent less than $6M training its new R1 reasoning model, which benchmarked on par with OpenAI's o1, and that it did so without Nvidia's most advanced chips. Compare this to the billions of dollars that companies like OpenAI and Anthropic have spent building their models, and people asked a very important question: "are companies really going to need these $30,000 chips from Nvidia to build powerful LLMs?"
Now, this specific point is hotly contested. Yesterday, Brad Gerstner, the founder of Altimeter Capital, went on record with CNBC saying that, apples to apples, OpenAI spent maybe $15M to train o1, and that was 10-12 months ago; training costs have come down by 50%+ since then, so training for under $6M actually tracks and isn't shocking at all.
You can see the whole tweet and watch an excerpt from Brad's interview with CNBC here.
But on top of the low training cost, DeepSeek decided to make its model open source, meaning anyone can download the model weights, modify them, and build on them. OpenAI, despite having the word "open" in its company name, only produces closed-source models, which means no, you can't download their models, make changes, resell them, etc.
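To make "open source" concrete, here's what that looks like in practice: you pull the weights down and run the model yourself. This is a minimal sketch using the Hugging Face transformers library; I'm assuming the smaller distilled checkpoint deepseek-ai/DeepSeek-R1-Distill-Qwen-7B, since the full R1 model is far too large to run on a typical machine.

```python
# A minimal sketch of downloading and running an open-weights DeepSeek model.
# Assumes the `transformers` and `accelerate` libraries are installed, and
# uses the distilled checkpoint deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
# (the full R1 is hundreds of GB; the distill illustrates the same point).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Why is the sky blue? Think step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Try doing that with o1: you can't, and that difference is a big part of why this release landed so hard.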
When you dig a little deeper, though, what makes DeepSeek so interesting, costs aside, is its novel approach to training a model. At the core of this is its approach to data annotation.
The TL;DR here is that DeepSeek put a lot of focus on high-quality data annotation, which fundamentally changed the efficiency of the model. Alexandr Wang, the CEO of Scale AI, has a great tweet where he breaks this down even further, but it's the first part of the tweet that I think really gets to the core of what makes DeepSeek so different, and so special.
The three key words here are: human-generated data. And yes, DeepSeek is setting records for the amount of post-training data used for an open-source model. I'm going to leave it there for now since this is a quick primer, but if you want to do a deeper dive, I would start with the tweet from Alexandr that I mentioned above; you can check it out at this link.
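To make "human-generated post-training data" a little more concrete, here's a hypothetical example of what a single supervised fine-tuning (SFT) record might look like: a prompt paired with a carefully written human answer. The field names here are purely illustrative, not DeepSeek's actual schema.

```python
# A hypothetical supervised fine-tuning (SFT) record: one prompt paired with
# a high-quality, human-written answer. Models are post-trained on large
# collections of records like this. All field names are illustrative only.
import json

sft_record = {
    "prompt": "A train leaves at 3pm traveling 60 mph. "
              "How far has it gone by 5:30pm?",
    "response": "From 3:00pm to 5:30pm is 2.5 hours. "
                "Distance = speed x time = 60 mph x 2.5 h = 150 miles.",
    "annotator_id": "human-0042",  # hypothetical: who wrote the answer
    "quality_score": 5,            # hypothetical: reviewer rating, 1 to 5
}

print(json.dumps(sft_record, indent=2))
```

The point Alexandr Wang is making is that the quality of records like these, written and reviewed by people rather than scraped or synthesized, is a huge lever on how capable the final model turns out to be.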
Qwen2.5-Max
Now let's talk about Qwen, a model many people reading this, at least as of today, might not have heard of. I'm assuming you've heard of Alibaba, one of the biggest companies in China, which did over $130B in revenue in 2024. If not, Perplexity it, and then come back here.
While you might think the AI model war pits the US against China, it's also pitting Chinese companies against one another, and today Alibaba fired back at DeepSeek with a new flagship model of its own: Qwen2.5-Max, which it claims is even better than DeepSeek V3, the base model behind R1.
And of course, a picture is worth a thousand words, so here's the benchmark data comparing the two models:
As you can see, Qwen2.5-Max looks like the new leader of the pack, with better scores across the board than comparable models. The timing of the release also shows just how much pressure foundational model companies are under to ship. Qwen2.5-Max was announced on the first day of the Chinese Lunar New Year, a day when virtually everyone in China is off work and not a time you'd typically launch a product.
But the race is on, and Alibaba clearly wants to show the world that it can out-innovate DeepSeek. As for what makes Qwen special, here are the details, straight from Alibaba:
It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence. However, the research and industry community has limited experience in effectively scaling extremely large models, whether they are dense or Mixture-of-Expert (MoE) models. Many critical details regarding this scaling process were only disclosed with the recent release of DeepSeek V3. Concurrently, we are developing Qwen2.5-Max, a large-scale MoE model that has been pretrained on over 20 trillion tokens and further post-trained with curated Supervised Fine-Tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF) methodologies. (Source - https://qwenlm.github.io/blog/qwen2.5-max/)
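If "Mixture-of-Experts" is new to you, the idea is that instead of one giant dense network, the model holds many smaller "expert" sub-networks, and a router activates only a few of them per token, so a huge parameter count doesn't mean huge per-token compute. Here's a toy sketch of that routing idea in PyTorch; it's a minimal illustration of the general technique, not Qwen2.5-Max's actual architecture.

```python
# A toy Mixture-of-Experts (MoE) layer: a router picks the top-k experts for
# each token, and only those experts run. This is a minimal illustration of
# the general technique, not Qwen2.5-Max's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )
        # The router scores every expert for every token.
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = self.router(x)                         # (num_tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # choose top-k experts
        weights = F.softmax(weights, dim=-1)            # normalize their weights
        out = torch.zeros_like(x)
        # Only the chosen experts run for each token; this sparsity is why an
        # MoE model can have a huge parameter count but modest per-token cost.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = ToyMoE()
tokens = torch.randn(10, 64)
print(moe(tokens).shape)  # torch.Size([10, 64])
```

Per the quote above, Qwen2.5-Max layers SFT and RLHF on top of an MoE backbone pretrained on 20 trillion tokens, which is the same general post-training recipe the DeepSeek discussion above was about.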
Since Qwen2.5-Max is so new (it was literally announced just two days ago), there isn't as much analysis of it as there is of DeepSeek, but there will be soon, and I'll be diving in and likely putting together another article with more analysis.
That being said, I think it is incredibly likely that OpenAI releases o3, the next evolution of its reasoning model, and I wouldn't be shocked if they announce it today. And yes, you can expect o3 to beat DeepSeek R1; this is a race, so everyone is going to one-up each other with every release.
I hope this article was helpful. It's my first time writing an article on LinkedIn, so I don't know much about how to promote it, but if you do, please help amplify it, as I think this is something everyone should know more about. Thanks for reading!