Qwen and DeepSeek Lead AI Wave as OpenAI and Anthropic Falter

Fine-Tuned by Genloop - #5

Dear Readers,

Welcome to Edition 5 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. This edition is packed with model releases that are reshaping the AI landscape. We dive into Qwen's impressive QwQ-32B, a reasoning model 20x smaller than DeepSeek R1 that achieves comparable results; DeepSeek's infrastructure innovations and upcoming R2 model; Microsoft's edge-friendly Phi-4 series; Anthropic's contentious Claude 3.7; and OpenAI's underwhelming GPT-4.5 launch.

Notably absent from the recent flurry of releases? Meta's Llama models, which remain the most widely used foundation for enterprise domain intelligence. While Meta is making incremental improvements with reinforcement learning (the SWE-RL paper, covered in the Research Corner), we're all waiting for its next big move!

We're thrilled to announce the launch of our LLM Research Hub, where you can discover curated research papers ranked by popularity, relevance, and quality from multiple sources. We've also kicked off a weekly LLM Research Reading Group to collectively explore groundbreaking research – fill out the form if you're interested! (more about it in the Genloop Updates section below)

Let's dive in!


AI Industry Highlights


Qwen Releases QwQ-32B Reasoning Model to Rival Industry Leaders

Qwen has launched QwQ-32B, a medium-sized reasoning model that achieves competitive performance against larger models like DeepSeek-R1 671B and outperforms OpenAI's o1-mini on several benchmarks despite having significantly fewer parameters.

Key points:

  • Efficient Architecture: Dense 32.5B parameter model (31.0B non-embedding) with 131k token context length, 20x smaller than DeepSeek R1!
  • Dual Training: Combines both Supervised Fine-Tuning (SFT) and Reinforcement Learning
  • Open Access: Available via Hugging Face and QwenChat with comprehensive documentation

This is just the medium model, their max model is still in development. Open-source is really pulling ahead in this race!
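The "20x smaller" claim is easy to sanity-check with back-of-envelope sizing. The sketch below assumes fp16/bf16 weights (2 bytes per parameter) and uses the publicly stated total parameter counts; serving overhead such as KV cache and activations is ignored, and note that DeepSeek-R1 is a mixture-of-experts model, so only a fraction of its parameters are active per token.

```python
# Back-of-envelope weight-memory comparison, assuming 2 bytes/param (fp16).
QWQ_PARAMS = 32.5e9   # QwQ-32B total parameters
R1_PARAMS = 671e9     # DeepSeek-R1 total parameters
BYTES_PER_PARAM = 2

def weight_gib(params: float) -> float:
    """Approximate weight memory in GiB at 2 bytes per parameter."""
    return params * BYTES_PER_PARAM / 2**30

print(f"QwQ-32B weights: ~{weight_gib(QWQ_PARAMS):.0f} GiB")
print(f"DeepSeek-R1 weights: ~{weight_gib(R1_PARAMS):.0f} GiB")
print(f"Size ratio: ~{R1_PARAMS / QWQ_PARAMS:.1f}x")
```

At these counts, QwQ-32B fits on a single high-memory GPU while R1 needs a multi-node setup, which is what makes the benchmark parity notable.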


Satya Nadella Grounds AI Hype with Economic Reality Check

In a surprising development, Microsoft CEO Satya Nadella has taken a pragmatic stance on AI's potential impact, focusing on economic benchmarks rather than superintelligence hype. This is quite contrary to OpenAI’s narrative. He suggests that the true measure of AI's success should be its contribution to GDP growth.

Key points:

  • Economic Benchmark: For AI to match the Industrial Revolution's impact, we'd need to see GDP growth jump from the current 2% to 7-10% in developed economies. That should translate to an additional $10 trillion in economic value annually
  • No Single Winner: Nadella also explained why he believes that no single company will dominate AI through one superior model, as open-source competition will prevent a "winner-take-all" scenario
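The economic benchmark is simple arithmetic: additional annual output equals the GDP base times the growth-rate uplift. The base below is an illustrative assumption (global GDP, roughly USD 105 trillion); Nadella's remarks did not pin down a base, so treat the outputs as order-of-magnitude figures in the same ballpark as the $10 trillion cited above.

```python
# Rough arithmetic behind the benchmark. BASE_GDP_T is an assumption.
BASE_GDP_T = 105.0        # assumed global GDP, USD trillions
CURRENT_GROWTH = 0.02     # ~2% today

def extra_output(target_growth: float, base: float = BASE_GDP_T) -> float:
    """Additional annual output (USD trillions) if growth rises from 2%."""
    return base * (target_growth - CURRENT_GROWTH)

for g in (0.07, 0.10):
    print(f"At {g:.0%} growth: ~${extra_output(g):.1f}T more output per year")
```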

Nadella directly challenges OpenAI's "universe value capture" investment thesis.


Claude 3.7 Sonnet Receives Mixed Reviews from Developers

Anthropic has released Claude 3.7 Sonnet, their most advanced model to date and the first "hybrid reasoning model" with extended thinking capabilities. While it is impressive on paper, the reception among developers has been surprisingly mixed, particularly regarding its coding abilities.

Key points:

  • Extended Thinking Mode: Claude 3.7 can show its reasoning process before answering, with user control over thinking depth
  • Claude Code Tool: A new command-line tool allowing AI to search codebases, edit files, run tests, and push to GitHub
  • User Complaints: Power users report issues with over-designing (400-line scripts ballooning to 1,100 lines), selective hearing of instructions, personality downgrade, and poor handling of complex projects
  • Adaptation Required: Successful users report better results by front-loading detailed instructions rather than conversational back-and-forth, being extremely specific about constraints, and starting fresh projects instead of continuing existing work

While this trend suggests advanced models may be trading collaborative flexibility for autonomous efficiency, having a leading closed-source model disappoint users raises concerns about proprietary solutions. This serves as a reminder to those who believe general-purpose LLMs alone can solve Enterprise AI—domain intelligence remains essential for extracting real value from Generative AI.

Source:


GPT-4.5 Launch Underwhelms as Economics of AI Scaling Questioned

OpenAI released GPT-4.5, positioning it for "emotional intelligence" rather than reasoning power. The reception has been decidedly mixed, leaning toward the negative.

Key points:

  • Marginal Improvements: Former OpenAI researcher Andrej Karpathy noted it required 10X more compute for "diffuse" improvements
  • Prohibitive Pricing: At $75 per million input tokens and $150 per million output tokens, GPT-4.5 costs 10-25X more than competitors
  • Economic Reality Check: With an estimated $500M training cost and plans to spend significantly more in 2025, the economics are increasingly questionable
  • Scaling Concerns: AI researcher Gary Marcus called it evidence that "scaling data and compute is not a physical law."
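To make the pricing concrete, here is a per-request cost sketch at the listed GPT-4.5 rates. The token counts and the GPT-4o comparison prices are assumptions for illustration, not figures from the article.

```python
# Illustrative per-request cost. GPT-4o prices are an assumed comparison.
PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Cost in USD of one request at the given token counts."""
    pin, pout = PRICES[model]
    return (in_tokens * pin + out_tokens * pout) / 1_000_000

# A typical request: 2k tokens in, 1k tokens out.
for m in PRICES:
    print(f"{m}: ${request_cost(m, 2_000, 1_000):.4f} per request")
```

At these assumed rates a modest 2k-in/1k-out request costs roughly 20x more on GPT-4.5, which is why the economics dominate the discussion even among users who like the model.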

GPT-4.5 encapsulates the AI industry's current dilemma: delivering technological achievements that can't justify their astronomical costs. As one observer summarized: "Half the TL saying it's bad and too expensive. Half the TL saying it's good and too expensive." With Sam Altman declaring they're "out of GPUs," this release indicates the limits of scaling laws and price drops.



DeepSeek Open-Sources AI Infrastructure, Races Toward R2 Model

DeepSeek has open-sourced some of its impressive infrastructure optimizations, like the 3FS file system, which delivers 6.6 TiB/s throughput in a 180-node storage cluster. Reports also suggest the company is rapidly approaching the release of its R2 model.

Microsoft Unveils Phi-4 Small Language Models for Multimodal AI

Microsoft released Phi-4-multimodal (5.6B size) and Phi-4-mini (3.8B size), compact language models designed for edge deployments. The 5.6B parameter model can handle speech, vision, and text simultaneously, while the 3.8B parameter mini-model focuses on text-based tasks like reasoning, math, and coding. Read the release blog here.


Genloop Updates: Introducing the LLM Research Hub

We're excited to announce the launch of our LLM Research Hub – a powerful tool designed to keep you at the cutting edge of language model research without the overwhelming information overload!

Keeping up with the latest LLM research has become increasingly challenging, with new papers published daily across multiple platforms. To solve this problem, we built an internal agentic workflow that automatically:

  • Gathers research papers from multiple authoritative sources
  • Ranks them based on relevance, quality, and potential impact
  • Curates the most significant findings for easy consumption
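The rank step above can be sketched as a weighted scoring pass over gathered papers. The field names and weights below are illustrative assumptions; Genloop's actual agentic workflow is not public.

```python
# A minimal sketch of the "rank" step: each paper carries scores in [0, 1]
# for relevance, quality, and impact; ranking sorts by a weighted sum.
WEIGHTS = {"relevance": 0.5, "quality": 0.3, "impact": 0.2}  # assumed weights

def score(paper: dict) -> float:
    """Weighted composite score for one paper."""
    return sum(paper[k] * w for k, w in WEIGHTS.items())

papers = [
    {"title": "SWE-RL", "relevance": 0.9, "quality": 0.8, "impact": 0.9},
    {"title": "NSA", "relevance": 0.8, "quality": 0.9, "impact": 0.8},
]

ranked = sorted(papers, key=score, reverse=True)
for p in ranked:
    print(f"{p['title']}: {score(p):.2f}")
```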

What began as an internal tool has now been opened to everyone!



We've also started a weekly paper reading group where we collectively dive deep into the most impactful research. If you'd like to join these sessions and stay at the forefront of LLM advancements, sign up using this form.


Featured Blog Posts

We've got two fascinating reads that showcase how the AI landscape is evolving:

AI is rewriting our world - literally

Research reveals that AI is transforming written communication at an unprecedented pace. By late 2024:

  • 18% of financial complaints show AI assistance
  • 24% of corporate press releases involve AI writing
  • 15% of small company job postings demonstrate AI influence

After ChatGPT's release, a brief adoption lag was followed by explosive integration across communication domains. Detection remains challenging, with sophisticated AI outputs often escaping current screening methods.

The implications are clear: AI is quietly but fundamentally reshaping how we write and communicate professionally.

Read more

GPT-4.5 is a nothing burger!

Despite significant anticipation, OpenAI's GPT-4.5 release has been described as underwhelming by many in the AI community. As Gary Marcus and others have pointed out, this release reveals several important trends in the current AI landscape.

Key points:

  • Diminishing Returns from Scaling: Traditional model scaling approaches are showing reduced effectiveness in improving performance
  • Competitive Landscape Shift: OpenAI is rapidly losing its dominant edge
  • Price-Performance Concerns: GPT-4.5 is 15x pricier than GPT-4o and 25x more expensive than Claude 3.7 Sonnet while offering similar or worse performance
  • Quality Inconsistency: Not every OpenAI release is delivering breakthrough capabilities

Read Gary Marcus's analysis


Research Corner

Our team has been diving deep into groundbreaking research papers, and two particularly caught our attention:

SWE-RL: Applying RL-Based Reasoning to Software Engineering

Meta AI's SWE-RL paper applies a GRPO-like approach (similar to DeepSeek R1's methodology) to fine-tune Llama 3 using open-source software evolution data. The result is a model that develops autonomous reasoning processes similar to those of experienced developers.

Key highlights:

  • Rule-Based Rewards: The model trains using reinforcement learning with rewards based on how well AI-generated fixes match verified software patches
  • Competitive Performance: Llama3-SWE-RL-70B solves 41.0% of issues on SWE-bench, matching much larger proprietary models like GPT-4o and outperforming other <100B models
  • Generalized Reasoning Benefits: Although focused on software tasks, the model unexpectedly improved at general reasoning, outperforming supervised fine-tuned baselines on five unrelated tasks
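The rule-based reward can be illustrated with a patch-similarity function: score a generated diff by how closely it matches the verified ("oracle") patch, and penalize malformed output. The use of `difflib` here is an illustrative stand-in for SWE-RL's actual similarity metric, and the -1 penalty for empty output mirrors the paper's treatment of format errors only in spirit.

```python
# Sketch of a rule-based patch reward: similarity to the oracle patch,
# with a penalty for empty/unusable output. difflib is a stand-in metric.
import difflib

def patch_reward(generated: str, oracle: str) -> float:
    """Return -1.0 for an empty patch, else textual similarity in [0, 1]."""
    if not generated.strip():
        return -1.0
    return difflib.SequenceMatcher(None, generated, oracle).ratio()

oracle = "-    return a - b\n+    return a + b\n"
good = "-    return a - b\n+    return a + b\n"
bad = "-    return a - b\n+    return a * b\n"

print(patch_reward(good, oracle))          # identical patch: full reward
print(f"{patch_reward(bad, oracle):.2f}")  # close-but-wrong: partial credit
print(patch_reward("", oracle))            # empty patch: penalized
```

A continuous reward like this gives the policy gradient signal even for near-miss patches, which is part of why RL on software evolution data works where exact-match rewards would be too sparse.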

Read our TuesdayPaperThoughts analysis


NSA: Efficient Long-Context Modeling from DeepSeek AI

DeepSeek AI's NSA (Natively Sparse Attention) paper introduces a novel sparse attention mechanism designed for efficient processing of long sequences. This approach addresses one of the most significant challenges in scaling next-generation LLMs by reducing the computational burden of attention mechanisms.

Key highlights:

  • Hierarchical Sparse Strategy: NSA dynamically compresses and selects tokens, combining coarse-grained compression with fine-grained retention to maintain global context while reducing computation
  • Impressive Performance: The model matches or exceeds Full Attention baselines while achieving 11.6x, 9x, and 6x speedups in decoding, forward, and backward propagation, respectively
  • Hardware-Optimized Design: Unlike other sparse methods, NSA is designed for modern accelerators like A100 GPUs and can be trained end-to-end, reducing pretraining costs
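The coarse-to-fine idea can be illustrated in a few lines of numpy: pool keys into blocks (compression), let the query score the pooled blocks, then run full attention only over tokens from the top-k blocks (selection). This is a conceptual toy, not DeepSeek's kernel-level NSA, and the block size and top-k values are arbitrary.

```python
# Toy hierarchical sparse attention: coarse block selection, fine attention.
import numpy as np

rng = np.random.default_rng(0)
d, block, n_blocks, top_k = 16, 8, 8, 2
keys = rng.normal(size=(n_blocks * block, d))
values = rng.normal(size=(n_blocks * block, d))
query = rng.normal(size=d)

# Coarse stage: mean-pool each block of keys, score blocks against the query.
pooled = keys.reshape(n_blocks, block, d).mean(axis=1)
chosen = np.argsort(pooled @ query)[-top_k:]  # keep the top-k blocks

# Fine stage: softmax attention, but only over tokens in the chosen blocks.
idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen])
scores = keys[idx] @ query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
output = weights @ values[idx]

print(f"attended {len(idx)}/{len(keys)} tokens; output shape {output.shape}")
```

Here attention touches only 16 of 64 tokens; the compute saving grows with sequence length, which is the source of NSA's reported speedups.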

With models now routinely handling 64k+ tokens, approaches like NSA become critical for making long-context reasoning both practical and efficient.

Read our TuesdayPaperThoughts analysis


Looking Forward

Open source is advancing rapidly, democratizing access to AI intelligence. While recent weeks have seen more breakthroughs coming from China than the US, this dynamic may shift soon—there are high expectations for Meta. DeepSeek's success has certainly given them a wake-up call.

This surge in competition has been transformative for open-source LLM development. We now have multiple specialized models—both reasoning and non-reasoning—that can outperform general-purpose LLMs. In fact, our own experiments with customized reasoning models are showing remarkable results, with performance improvements of 200% compared to leading general-purpose models like GPT-4o. We'll be sharing these experimental findings in the coming weeks.

If you'd like to join our exclusive LLM Research Reading group, please sign up here. All papers that we'll be discussing will be available on our research hub.


About Genloop

Genloop delivers customized LLMs that provide unmatched cost, control, simplicity, and performance for production enterprise applications. Please visit genloop.ai, catch us on LinkedIn, or email [email protected] for more details.


Stay curious,

The Genloop Team
