Qwen and DeepSeek Lead AI Wave as OpenAI and Anthropic Falter

Fine-Tuned by Genloop - #5

Dear Readers,

Welcome to Edition 5 of Fine-Tuned by Genloop – your go-to guide for the latest in LLM customization. This edition is packed with model releases that are reshaping the AI landscape. We dive into Qwen's impressive QwQ-32B, a reasoning model 20x smaller than DeepSeek R1 that achieves comparable results; DeepSeek's infrastructure innovations and upcoming R2 model; Microsoft's edge-friendly Phi-4 series; Anthropic's contentious Claude 3.7; and OpenAI's underwhelming GPT-4.5 launch.

Notably absent from the recent flurry of releases? Meta's Llama models, which remain the most widely used foundation for enterprise domain intelligence. While Meta is making incremental improvements with reinforcement learning (the SWE-RL paper, covered in the Research Corner), we're all waiting for its next big move!

We're thrilled to announce the launch of our LLM Research Hub, where you can discover curated research papers ranked by popularity, relevance, and quality from multiple sources. We've also kicked off a weekly LLM Research Reading Group to collectively explore groundbreaking research – fill out the form if you're interested! (more about it in the Genloop Updates section below)

Let's dive in!


AI Industry Highlights


Qwen Releases QwQ-32B Reasoning Model to Rival Industry Leaders

Qwen has launched QwQ-32B, a medium-sized reasoning model that achieves competitive performance against larger models like DeepSeek-R1 671B and outperforms OpenAI's o1-mini on several benchmarks despite having significantly fewer parameters.

Key points:

  • Efficient Architecture: Dense 32.5B parameter model (31.0B non-embedding) with 131k token context length, 20x smaller than DeepSeek R1!
  • Dual Training: Combines both Supervised Fine-Tuning (SFT) and Reinforcement Learning
  • Open Access: Available via Hugging Face and QwenChat with comprehensive documentation

This is just the medium model, their max model is still in development. Open-source is really pulling ahead in this race!
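The "20x smaller" claim is easy to sanity-check with back-of-envelope sizing. The sketch below assumes fp16/bf16 weights (2 bytes per parameter) and uses the publicly stated total parameter counts; serving overhead such as KV cache and activations is ignored, and note that DeepSeek-R1 is a mixture-of-experts model, so only a fraction of its parameters are active per token.

```python
# Back-of-envelope weight-memory comparison, assuming 2 bytes/param (fp16).
QWQ_PARAMS = 32.5e9   # QwQ-32B total parameters
R1_PARAMS = 671e9     # DeepSeek-R1 total parameters
BYTES_PER_PARAM = 2

def weight_gib(params: float) -> float:
    """Approximate weight memory in GiB at 2 bytes per parameter."""
    return params * BYTES_PER_PARAM / 2**30

print(f"QwQ-32B weights: ~{weight_gib(QWQ_PARAMS):.0f} GiB")
print(f"DeepSeek-R1 weights: ~{weight_gib(R1_PARAMS):.0f} GiB")
print(f"Size ratio: ~{R1_PARAMS / QWQ_PARAMS:.1f}x")
```

At these counts, QwQ-32B fits on a single high-memory GPU while R1 needs a multi-node setup, which is what makes the benchmark parity notable.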


Satya Nadella Grounds AI Hype with Economic Reality Check

In a surprising development, Microsoft CEO Satya Nadella has taken a pragmatic stance on AI's potential impact, focusing on economic benchmarks rather than superintelligence hype. This is quite contrary to OpenAI’s narrative. He suggests that the true measure of AI's success should be its contribution to GDP growth.

Key points:

  • Economic Benchmark: For AI to match the Industrial Revolution's impact, we'd need to see GDP growth jump from the current 2% to 7-10% in developed economies. That should translate to an additional $10 trillion in economic value annually
  • No Single Winner: Nadella also explained why he believes that no single company will dominate AI through one superior model, as open-source competition will prevent a "winner-take-all" scenario
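The economic benchmark is simple arithmetic: additional annual output equals the GDP base times the growth-rate uplift. The base below is an illustrative assumption (global GDP, roughly USD 105 trillion); Nadella's remarks did not pin down a base, so treat the outputs as order-of-magnitude figures in the same ballpark as the $10 trillion cited above.

```python
# Rough arithmetic behind the benchmark. BASE_GDP_T is an assumption.
BASE_GDP_T = 105.0        # assumed global GDP, USD trillions
CURRENT_GROWTH = 0.02     # ~2% today

def extra_output(target_growth: float, base: float = BASE_GDP_T) -> float:
    """Additional annual output (USD trillions) if growth rises from 2%."""
    return base * (target_growth - CURRENT_GROWTH)

for g in (0.07, 0.10):
    print(f"At {g:.0%} growth: ~${extra_output(g):.1f}T more output per year")
```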

Nadella directly challenges OpenAI's "universe value capture" investment thesis.


Claude 3.7 Sonnet Receives Mixed Reviews from Developers

Anthropic has released Claude 3.7 Sonnet, their most advanced model to date and the first "hybrid reasoning model" with extended thinking capabilities. While it is impressive on paper, the reception among developers has been surprisingly mixed, particularly regarding its coding abilities.

Key points:

  • Extended Thinking Mode: Claude 3.7 can show its reasoning process before answering, with user control over thinking depth
  • Claude Code Tool: A new command-line tool allowing AI to search codebases, edit files, run tests, and push to GitHub
  • User Complaints: Power users report issues with over-designing (400-line scripts ballooning to 1,100 lines), selective hearing of instructions, personality downgrade, and poor handling of complex projects
  • Adaptation Required: Successful users report better results by front-loading detailed instructions rather than conversational back-and-forth, being extremely specific about constraints, and starting fresh projects instead of continuing existing work

While this trend suggests advanced models may be trading collaborative flexibility for autonomous efficiency, having a leading closed-source model disappoint users raises concerns about proprietary solutions. This serves as a reminder to those who believe general-purpose LLMs alone can solve Enterprise AI—domain intelligence remains essential for extracting real value from Generative AI.

Source:


GPT-4.5 Launch Underwhelms as Economics of AI Scaling Questioned

OpenAI released GPT-4.5, positioning it for "emotional intelligence" rather than reasoning power. The reception has been decidedly mixed, leaning toward the negative.

Key points:

  • Marginal Improvements: Former OpenAI researcher Andrej Karpathy noted it required 10X more compute for "diffuse" improvements
  • Prohibitive Pricing: At $75 per million input tokens and $150 per million output tokens, GPT-4.5 costs 10-25X more than competitors
  • Economic Reality Check: With an estimated $500M training cost and plans to spend significantly more in 2025, the economics are increasingly questionable
  • Scaling Concerns: AI researcher Gary Marcus called it evidence that "scaling data and compute is not a physical law."
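To make the pricing concrete, here is a per-request cost sketch at the listed GPT-4.5 rates. The token counts and the GPT-4o comparison prices are assumptions for illustration, not figures from the article.

```python
# Illustrative per-request cost. GPT-4o prices are an assumed comparison.
PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-4.5": (75.00, 150.00),
    "gpt-4o": (2.50, 10.00),
}

def request_cost(model: str, in_tokens: int, out_tokens: int) -> float:
    """Cost in USD of one request at the given token counts."""
    pin, pout = PRICES[model]
    return (in_tokens * pin + out_tokens * pout) / 1_000_000

# A typical request: 2k tokens in, 1k tokens out.
for m in PRICES:
    print(f"{m}: ${request_cost(m, 2_000, 1_000):.4f} per request")
```

At these assumed rates a modest 2k-in/1k-out request costs roughly 20x more on GPT-4.5, which is why the economics dominate the discussion even among users who like the model.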

GPT-4.5 encapsulates the AI industry's current dilemma: delivering technological achievements that can't justify their astronomical costs. As one observer summarized: "Half the TL saying it's bad and too expensive. Half the TL saying it's good and too expensive." With Sam Altman declaring they're "out of GPUs," this release indicates the limits of scaling laws and price drops.



DeepSeek Open-Sources AI Infrastructure, Races Toward R2 Model

DeepSeek has open-sourced some of its impressive infrastructure optimizations, like the 3FS file system, which delivers 6.6 TiB/s throughput in a 180-node storage cluster. Reports also suggest the company is rapidly approaching the release of its R2 model.

Microsoft Unveils Phi-4 Small Language Models for Multimodal AI

Microsoft released Phi-4-multimodal (5.6B size) and Phi-4-mini (3.8B size), compact language models designed for edge deployments. The 5.6B parameter model can handle speech, vision, and text simultaneously, while the 3.8B parameter mini-model focuses on text-based tasks like reasoning, math, and coding. Read the release blog here.


Genloop Updates: Introducing the LLM Research Hub

We're excited to announce the launch of our LLM Research Hub – a powerful tool designed to keep you at the cutting edge of language model research without the overwhelming information overload!

Keeping up with the latest LLM research has become increasingly challenging, with new papers published daily across multiple platforms. To solve this problem, we built an internal agentic workflow that automatically:

  • Gathers research papers from multiple authoritative sources
  • Ranks them based on relevance, quality, and potential impact
  • Curates the most significant findings for easy consumption
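The rank step above can be sketched as a weighted scoring pass over gathered papers. The field names and weights below are illustrative assumptions; Genloop's actual agentic workflow is not public.

```python
# A minimal sketch of the "rank" step: each paper carries scores in [0, 1]
# for relevance, quality, and impact; ranking sorts by a weighted sum.
WEIGHTS = {"relevance": 0.5, "quality": 0.3, "impact": 0.2}  # assumed weights

def score(paper: dict) -> float:
    """Weighted composite score for one paper."""
    return sum(paper[k] * w for k, w in WEIGHTS.items())

papers = [
    {"title": "SWE-RL", "relevance": 0.9, "quality": 0.8, "impact": 0.9},
    {"title": "NSA", "relevance": 0.8, "quality": 0.9, "impact": 0.8},
]

ranked = sorted(papers, key=score, reverse=True)
for p in ranked:
    print(f"{p['title']}: {score(p):.2f}")
```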

What began as an internal tool has now been opened to everyone!



We've also started a weekly paper reading group where we collectively dive deep into the most impactful research. If you'd like to join these sessions and stay at the forefront of LLM advancements, sign up using this form.


Featured Blog Posts

We've got two fascinating reads that showcase how the AI landscape is evolving:

AI is rewriting our world - literally

Research reveals that AI is transforming written communication at an unprecedented pace. By late 2024:

  • 18% of financial complaints show AI assistance
  • 24% of corporate press releases involve AI writing
  • 15% of small company job postings demonstrate AI influence

After ChatGPT's release, a brief adoption lag was followed by explosive integration across communication domains. Detection remains challenging, with sophisticated AI outputs often escaping current screening methods.

The implications are clear: AI is quietly but fundamentally reshaping how we write and communicate professionally.

Read more

GPT-4.5 is a nothing burger!

Despite significant anticipation, OpenAI's GPT-4.5 release has been described as underwhelming by many in the AI community. As Gary Marcus and others have pointed out, this release reveals several important trends in the current AI landscape.

Key points:

  • Diminishing Returns from Scaling: Traditional model scaling approaches are showing reduced effectiveness in improving performance
  • Competitive Landscape Shift: OpenAI is rapidly losing its dominant edge
  • Price-Performance Concerns: GPT-4.5 is 15x pricier than GPT-4o and 25x more expensive than Claude 3.7 Sonnet while offering similar or worse performance
  • Quality Inconsistency: Not every OpenAI release is delivering breakthrough capabilities

Read Gary Marcus's analysis


Research Corner

Our team has been diving deep into groundbreaking research papers, and two particularly caught our attention:

SWE-RL: Applying RL-Based Reasoning to Software Engineering

Meta AI's SWE-RL paper applies a GRPO-like approach (similar to DeepSeek R1's methodology) to fine-tune Llama 3 using open-source software evolution data. The result is a model that develops autonomous reasoning processes similar to those of experienced developers.

Key highlights:

  • Rule-Based Rewards: The model trains using reinforcement learning with rewards based on how well AI-generated fixes match verified software patches
  • Competitive Performance: Llama3-SWE-RL-70B solves 41.0% of issues on SWE-bench, matching much larger proprietary models like GPT-4o and outperforming other <100B models
  • Generalized Reasoning Benefits: Although focused on software tasks, the model unexpectedly improved at general reasoning, outperforming supervised fine-tuned baselines on five unrelated tasks
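The rule-based reward can be illustrated with a patch-similarity function: score a generated diff by how closely it matches the verified ("oracle") patch, and penalize malformed output. The use of `difflib` here is an illustrative stand-in for SWE-RL's actual similarity metric, and the -1 penalty for empty output mirrors the paper's treatment of format errors only in spirit.

```python
# Sketch of a rule-based patch reward: similarity to the oracle patch,
# with a penalty for empty/unusable output. difflib is a stand-in metric.
import difflib

def patch_reward(generated: str, oracle: str) -> float:
    """Return -1.0 for an empty patch, else textual similarity in [0, 1]."""
    if not generated.strip():
        return -1.0
    return difflib.SequenceMatcher(None, generated, oracle).ratio()

oracle = "-    return a - b\n+    return a + b\n"
good = "-    return a - b\n+    return a + b\n"
bad = "-    return a - b\n+    return a * b\n"

print(patch_reward(good, oracle))          # identical patch: full reward
print(f"{patch_reward(bad, oracle):.2f}")  # close-but-wrong: partial credit
print(patch_reward("", oracle))            # empty patch: penalized
```

A continuous reward like this gives the policy gradient signal even for near-miss patches, which is part of why RL on software evolution data works where exact-match rewards would be too sparse.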

Read our TuesdayPaperThoughts analysis


NSA: Efficient Long-Context Modeling from DeepSeek AI

DeepSeek AI's NSA (Natively Sparse Attention) paper introduces a novel sparse attention mechanism designed for efficient processing of long sequences. This approach addresses one of the most significant challenges in scaling next-generation LLMs by reducing the computational burden of attention mechanisms.

Key highlights:

  • Hierarchical Sparse Strategy: NSA dynamically compresses and selects tokens, combining coarse-grained compression with fine-grained retention to maintain global context while reducing computation
  • Impressive Performance: The model matches or exceeds Full Attention baselines while achieving 11.6x, 9x, and 6x speedups in decoding, forward, and backward propagation, respectively
  • Hardware-Optimized Design: Unlike other sparse methods, NSA is designed for modern accelerators like A100 GPUs and can be trained end-to-end, reducing pretraining costs
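The coarse-to-fine idea can be illustrated in a few lines of numpy: pool keys into blocks (compression), let the query score the pooled blocks, then run full attention only over tokens from the top-k blocks (selection). This is a conceptual toy, not DeepSeek's kernel-level NSA, and the block size and top-k values are arbitrary.

```python
# Toy hierarchical sparse attention: coarse block selection, fine attention.
import numpy as np

rng = np.random.default_rng(0)
d, block, n_blocks, top_k = 16, 8, 8, 2
keys = rng.normal(size=(n_blocks * block, d))
values = rng.normal(size=(n_blocks * block, d))
query = rng.normal(size=d)

# Coarse stage: mean-pool each block of keys, score blocks against the query.
pooled = keys.reshape(n_blocks, block, d).mean(axis=1)
chosen = np.argsort(pooled @ query)[-top_k:]  # keep the top-k blocks

# Fine stage: softmax attention, but only over tokens in the chosen blocks.
idx = np.concatenate([np.arange(b * block, (b + 1) * block) for b in chosen])
scores = keys[idx] @ query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()
output = weights @ values[idx]

print(f"attended {len(idx)}/{len(keys)} tokens; output shape {output.shape}")
```

Here attention touches only 16 of 64 tokens; the compute saving grows with sequence length, which is the source of NSA's reported speedups.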

With models now routinely handling 64k+ tokens, approaches like NSA become critical for making long-context reasoning both practical and efficient.

Read our TuesdayPaperThoughts analysis


Looking Forward

Open source is advancing rapidly, democratizing access to AI intelligence. While recent weeks have seen more breakthroughs coming from China than the US, this dynamic may shift soon—there are high expectations for Meta. DeepSeek's success has certainly given them a wake-up call.

This surge in competition has been transformative for open-source LLM development. We now have multiple specialized models—both reasoning and non-reasoning—that can outperform general-purpose LLMs. In fact, our own experiments with customized reasoning models are showing remarkable results, with performance improvements of 200% compared to leading general-purpose models like GPT-4o. We'll be sharing these experimental findings in the coming weeks.

If you'd like to join our exclusive LLM Research Reading group, please sign up here. All papers that we'll be discussing will be available on our research hub.


About Genloop

Genloop delivers customized LLMs that provide unmatched cost, control, simplicity, and performance for production enterprise applications. Please visit genloop.ai, catch us on LinkedIn, or email [email protected] for more details.


Stay curious,

The Genloop Team
