FOD#71: Matryoshka against Transformers

We explore the new Matryoshka State Space Model and its advantages over Transformers, and offer a carefully curated list of recent news and papers

This Week in Turing Post:

  • Wednesday, AI 101: Everything about Whisper Model
  • Friday, Agentic Workflows series: Use cases

If you like Turing Post, consider:

1) becoming a paid subscriber (LAST DAY! Only $21 for the WHOLE YEAR)

2) registering for free for the GenAI Productionize 2.0 event

3) sharing it with a friend

4) doing all the above ;)

It helps us share more educational articles for free.

The main topic

If in my childhood someone had told me I would set Matryoshka against Transformer, I would have been puzzled. After all, one is a symbol of traditional Russian craftsmanship – stacking dolls within dolls, each revealing something hidden beneath. The other? A futuristic robot capable of morphing into various forms, epitomizing adaptability. Yet here we are, years later, first using 'Matryoshka' to describe layered, nested representation learning within 'Transformer' architectures. And then – using Matryoshka in a rival architecture!

The first merging of the concepts happened in 2023, when researchers from Google Research presented MatFormer. In it, each Transformer block is designed with nested sub-blocks, where smaller submodels (like layers in a Matryoshka doll) are contained within larger ones. This enables the model to dynamically extract submodels of varying sizes from a single universal model without separate training, allowing for flexible scaling and elastic inference across tasks and modalities. This approach is known as Matryoshka Representation Learning.

This approach allows scaling the model down by using only specific parts, while still retaining the necessary knowledge and performance. These smaller submodels work efficiently without requiring additional training, as they share the same underlying space as the larger model.
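
To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of Matryoshka-style nesting (illustrative names, not the official MatFormer code): a single FFN is trained once, and smaller submodels are obtained by slicing the first hidden units of the shared weight matrices.

```python
# A minimal, hypothetical sketch of Matryoshka-style nesting (illustrative
# names, not the official MatFormer code). One FFN is trained once; smaller
# submodels reuse the first `width` hidden units of the same weight matrices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MatryoshkaFFN(nn.Module):
    def __init__(self, d_model: int = 512, d_hidden: int = 2048):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_hidden)    # shared "largest doll"
        self.w_out = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor, width: int) -> torch.Tensor:
        # Slice the shared weights: the first `width` hidden units form a
        # smaller submodel, so no separate training or extra parameters.
        h = F.relu(F.linear(x, self.w_in.weight[:width], self.w_in.bias[:width]))
        return F.linear(h, self.w_out.weight[:, :width], self.w_out.bias)


ffn = MatryoshkaFFN()
x = torch.randn(4, 512)
print(ffn(x, width=2048).shape)  # full model
print(ffn(x, width=512).shape)   # nested submodel, same weights, far less FFN compute
```

In the actual papers, several widths are trained jointly so that every nested slice stays accurate; the point of the sketch is simply that all slices share one set of parameters.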

Recently, however, Transformers have been facing increasing criticism. AI21 CEO Ori Goshen challenges their supremacy, arguing that agents relying on these models struggle with efficiency and cost. He – understandably – advocates for AI21's Jamba architecture, based on Mamba, claiming it promises faster, more reliable AI agents with better memory performance.

Well, Mamba, as we’ve explained before, is indeed a legitimate candidate to rival Transformers. But what if we combine it with the good old Matryoshka to deal an even bigger blow to Transformers?

Researchers from Scaled Foundations and the University of Washington did exactly that. MatMamba integrates Matryoshka Representation Learning into Mamba2's State Space Model (SSM), creating a flexible, nested architecture across its parameters. This design allows for the extraction of multiple smaller models from a single, large model without retraining. Each submodel retains critical learned representations, ensuring consistent performance across varying sizes.

Compared to MatFormer and Transformers, MatMamba offers faster inference – especially for long sequences – due to its SSM backbone and more granular, adaptive scaling across compute requirements.

For example, on edge devices with limited resources, MatMamba can dynamically extract smaller models without retraining, allowing inference to adjust to available memory or compute power – something Transformers struggle with due to their rigid architecture.

In cloud inference scenarios, where compute resources fluctuate, MatMamba’s ability to flexibly switch between submodels allows for efficient, real-time scaling. While Transformers dominate general-purpose tasks, MatMamba could surpass them in domains where long context and elastic deployment are needed, such as real-time video analysis or large-scale image retrieval.
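
As a rough illustration of what such elastic inference could look like at runtime, here is a small Python sketch that picks the largest nested submodel fitting the currently available memory. The widths, parameter counts, and byte estimates are made-up placeholders, not MatMamba's actual configuration.

```python
# Illustrative only: runtime selection of a nested submodel under a memory
# budget. Widths and byte estimates are made-up placeholders, not MatMamba's
# actual configuration.
TRAINED_WIDTHS = [256, 512, 1024, 2048]   # nested granularities, smallest to largest
BYTES_PER_PARAM = 2                       # assume fp16 weights


def params_for_width(width: int, d_model: int = 512) -> int:
    # rough parameter count for one nested block slice (in + out projections)
    return 2 * width * d_model


def pick_width(memory_budget_bytes: int) -> int:
    """Return the largest nested submodel that fits the current budget."""
    feasible = [w for w in TRAINED_WIDTHS
                if params_for_width(w) * BYTES_PER_PARAM <= memory_budget_bytes]
    return max(feasible) if feasible else min(TRAINED_WIDTHS)


print(pick_width(1 * 1024 * 1024))    # constrained edge device -> smaller slice
print(pick_width(64 * 1024 * 1024))   # roomy cloud instance    -> full width
```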

To be realistic, MatMamba is unlikely to entirely replace Transformers in every context, as both excel at different tasks. Instead, it may carve out a niche in applications demanding both high efficiency and adaptive, scalable inference.

As multi-agent ecosystems emerge, we will see more attempts to create alternatives to Transformers that may steal the spotlight.


We recommend: Expert insights at GenAI Productionize 2.0

Don’t miss GenAI Productionize 2.0 – the premier conference for GenAI application development, featuring AI experts from leading brands, startups, and research labs!


Learn actionable insights, strategies, and techniques for generative AI stack design, governance, evaluation, and observability.

But don’t take our word for it; here are real quotes from previous attendees:

  • "I'm blown away by the high quality and value of this event." - Ricardo B.
  • "Great event - worth getting up at 4am in the morning for!" - Sandy A.
  • "Spectacular and very insightful summit! Very well done!" - Chad B.

FREE REGISTRATION


Twitter library


News from The Usual Suspects

Adobe Unleashes Generative Fireworks at MAX

  • Adobe drops major updates at its MAX conference, expanding its Firefly AI with the first video model safe for commercial use. New AI tools in Premiere Pro help smooth transitions and extend clips, while over 100 new Creative Cloud features land across flagship apps. Also in the mix: collaborative creativity via Project Concept and the GenStudio platform for marketing pros. Oh, and Gatorade bottles – now personalized with Firefly.

Two Nobel Prizes (in Chemistry and Physics) were awarded for achievements rooted in Deep Learning! We explained what they were awarded for in our ML flashcards.

OpenAI’s Swarm of AI Workers

  • OpenAI's latest cookbook introduces "routines" and "handoffs" to orchestrate AI agents more efficiently, making the leap from flashy demos to robust multi-agent workflows. With tools like Swarm, AI agents can now smoothly pass conversations to each other, handling tasks such as refunds, sales, and support, all while minimizing bottlenecks in the process. Enterprise AI just got smarter.
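
The core pattern described in the Swarm repository is the handoff: one agent's tool function simply returns another agent. A hedged sketch (the agent names, instructions, and user message below are illustrative):

```python
# A sketch of Swarm's handoff pattern (from the openai/swarm repo); the agent
# names, instructions, and user message here are illustrative.
from swarm import Swarm, Agent

refunds_agent = Agent(
    name="Refunds Agent",
    instructions="Help the user process a refund.",
)


def transfer_to_refunds():
    """Hand the conversation off to the refunds specialist."""
    return refunds_agent


triage_agent = Agent(
    name="Triage Agent",
    instructions="Route the user to the right specialist agent.",
    functions=[transfer_to_refunds],
)

client = Swarm()
response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "I'd like a refund for my order."}],
)
print(response.messages[-1]["content"])
```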

TSMC: AI's Chip Champion

  • TSMC's third-quarter profits are set to soar 40%, fueled by surging AI chip demand from tech giants like Apple and Nvidia. As the world’s leading contract chipmaker, TSMC is expanding globally, spending $65 billion on U.S. factories, but keeping most production in Taiwan. With shares up 77% this year, TSMC is riding high on the AI boom.

Anthropic in its Loving Grace

  • Dario Amodei publishes a 15,000-word investor pitch that introduces a new term, ‘Powerful AI’, in place of AGI.
  • More practically: Anthropic rolls out the Message Batches API, cutting costs by 50% for developers dealing with massive datasets. Now, you can batch up to 10,000 queries with Claude 3.5 Sonnet, Opus, and Haiku, processed within 24 hours. Perfect for non-time-sensitive work, this API offers scalable data analysis minus infrastructure headaches. Quora’s already on board, loving the smooth ride.
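
For the curious, submitting a batch looks roughly like the sketch below with Anthropic's Python SDK. The exact method path is an assumption on our part: the Batches API launched under a beta namespace, so it may differ by SDK version.

```python
# A rough sketch of a batch submission with Anthropic's Python SDK. The method
# path is an assumption: the Batches API launched under a beta namespace, so
# some SDK versions expose it as client.beta.messages.batches instead.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"doc-{i}",
            "params": {
                "model": "claude-3-5-sonnet-latest",
                "max_tokens": 256,
                "messages": [{"role": "user", "content": f"Summarize document #{i}."}],
            },
        }
        for i in range(100)  # up to 10,000 requests per batch
    ]
)
print(batch.id, batch.processing_status)  # results are ready within 24 hours
```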

Gradio 5: Web Apps on Rocket Fuel

  • Hugging Face launches Gradio 5, amping up ML web apps with sleek design, server-side rendering for lightning-fast loads, and real-time streaming. Expect low-latency, production-ready apps with just a few lines of Python, plus an AI playground that lets you create apps right in your browser.
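
For reference, a Gradio app really is just a few lines. The toy function below is our own, but the Interface/launch pattern carries over unchanged to Gradio 5:

```python
# A toy Gradio app of our own; the same Interface/launch pattern works in
# Gradio 5, which adds server-side rendering and a refreshed default theme.
import gradio as gr


def greet(name: str) -> str:
    return f"Hello, {name}!"


demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch()
```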

Writer’s Palmyra X 004 Takes Action

  • Writer introduces Palmyra X 004, a powerhouse AI model built to handle enterprise tasks with finesse. Now with tool-calling capabilities, it automates workflows across apps, pulling data, running code, and even sending emails. This LLM also leads the pack in performance benchmarks, showing up OpenAI and Anthropic.
Wondering what Inflection AI has been up to?

  • Inflection AI, in collaboration with Intel Gaudi 3, launches Inflection for Enterprise, powered by the high-performing Inflection 3.0 model. Designed for businesses that need more than a chatbot, it offers full control over data, models, and architecture – on-prem, cloud, or hybrid.


We are reading


The freshest research papers of the week, categorized for your convenience.

