FOD#71: Matryoshka against Transformers
TuringPost
Newsletter about AI and ML. Sign up for free to get your list of essential AI resources.
We explore the new Matryoshka State Space Model, its advantages over Transformers, and offer a carefully curated list of recent news and papers.
This Week in Turing Post:
If you like Turing Post, consider:
1) becoming a paid subscriber (LAST DAY! Only $21 for the WHOLE YEAR)
2) registering for free for the GenAI Productionize 2.0 event
3) sharing it with a friend
4) doing all the above ;)
It helps us share more educational articles for free.
The main topic
If in my childhood someone had told me I would set Matryoshka against Transformer, I would have been puzzled. After all, one is a symbol of traditional Russian craftsmanship – stacking dolls within dolls, each revealing something hidden beneath. The other? A futuristic robot capable of morphing into various forms, epitomizing adaptability. Yet here we are, years later, first using 'Matryoshka' to describe layered, nested representation learning within 'Transformer' architectures. And then – using Matryoshka in a rival architecture!
The first merging of concepts happened in 2023, when researchers from Google Research presented MatFormer. In it, each Transformer block is designed with nested sub-blocks, where smaller submodels (like layers in a Matryoshka doll) are contained within larger ones. This enables the model to dynamically extract submodels of varying sizes from a single universal model without the need for separate training, allowing for flexible scaling and elastic inference across tasks and modalities. This is called Matryoshka Representation Learning.
This approach allows scaling the model down by using only specific parts, while still retaining the necessary knowledge and performance. These smaller submodels work efficiently without requiring additional training, as they share the same underlying space as the larger model.
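To make the nesting concrete, here is a minimal PyTorch sketch of the general Matryoshka idea: smaller submodels use only a prefix slice of the full layer's weights, so every size shares parameters with the full model. The class name and granularities below are illustrative assumptions, not the actual MatFormer code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatryoshkaLinear(nn.Module):
    """Toy nested linear layer: each smaller submodel uses only a prefix
    slice of the full weight matrix, so all sizes share parameters
    (a sketch of the Matryoshka principle, not the MatFormer code)."""

    def __init__(self, d_in: int, d_out: int, granularities=(0.25, 0.5, 1.0)):
        super().__init__()
        self.full = nn.Linear(d_in, d_out)
        self.granularities = granularities

    def forward(self, x: torch.Tensor, frac: float = 1.0) -> torch.Tensor:
        # Keep only the first `frac` fraction of output units (the inner doll).
        k = max(1, int(self.full.out_features * frac))
        return F.linear(x, self.full.weight[:k], self.full.bias[:k])

# Joint training sums a loss over every granularity, so each nested
# submodel remains usable on its own after training.
layer = MatryoshkaLinear(64, 128)
x = torch.randn(8, 64)
print({frac: tuple(layer(x, frac).shape) for frac in layer.granularities})
# -> {0.25: (8, 32), 0.5: (8, 64), 1.0: (8, 128)}
```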
Recently, however, Transformers have been facing increasing criticism. AI21 CEO Ori Goshen challenges the supremacy of Transformers, arguing that agents relying on these models struggle with efficiency and cost. He – understandably – advocates for AI21's JAMBA architecture, based on Mamba, claiming it promises faster, more reliable AI agents with better memory performance.
Well, Mamba, as we’ve explained before, is indeed a legitimate candidate to rival Transformers. But what if we combine it with the good old Matryoshka to deal an even bigger blow to Transformers?
Researchers from Scaled Foundations and the University of Washington did exactly that. MatMamba integrates Matryoshka Representation Learning into Mamba2's State Space Model (SSM), creating a flexible, nested architecture across its parameters. This design allows for the extraction of multiple smaller models from a single, large model without retraining. Each submodel retains critical learned representations, ensuring consistent performance across varying sizes.
Compared to MatFormer and Transformers, MatMamba offers faster inference – especially for long sequences – due to its SSM backbone and more granular, adaptive scaling across compute requirements.
For example, on edge devices with limited resources, MatMamba can dynamically extract smaller models without retraining, allowing inference to adjust to available memory or compute power – something Transformers struggle with due to their rigid architecture.
In cloud inference scenarios, where compute resources fluctuate, MatMamba’s ability to flexibly switch between submodels allows for efficient, real-time scaling. While Transformers dominate general-purpose tasks, MatMamba could surpass them in domains where long context and elastic deployment are needed, such as real-time video analysis or large-scale image retrieval.
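As a rough sketch of that elastic-deployment idea (the submodel fractions and memory costs below are made-up placeholders, not numbers from the MatMamba paper), a serving layer could simply pick the largest nested submodel that fits the memory currently available:

```python
# Hypothetical submodel fractions and their assumed memory footprints in MB.
SUBMODEL_COST_MB = {0.25: 500, 0.5: 1_000, 1.0: 2_000}

def pick_granularity(available_mb: float) -> float:
    """Return the largest nested-submodel fraction that fits the budget."""
    fits = [frac for frac, mb in SUBMODEL_COST_MB.items() if mb <= available_mb]
    return max(fits) if fits else min(SUBMODEL_COST_MB)

print(pick_granularity(1_200))  # 0.5 fits, 1.0 does not
print(pick_granularity(300))    # nothing fits; fall back to the smallest doll (0.25)
```

Because all submodels live inside the same set of weights, switching granularity is just a matter of slicing the shared parameters; there is no extra checkpoint to load and no retraining step.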
To be realistic, MatMamba is unlikely to entirely replace Transformers in every context, as both excel at different tasks. Instead, it may carve out a niche in applications demanding both high efficiency and adaptive, scalable inference.
As multi-agent ecosystems emerge, we will see more attempts to create alternatives to Transformers that may steal the spotlight.
We recommend: Expert insights at GenAI Productionize 2.0
Don't miss GenAI Productionize 2.0 – the premier conference for GenAI application development, featuring AI experts from leading brands, startups, and research labs!
Learn actionable insights, strategies, and techniques for generative AI stack design, governance, evaluation, and observability.
But don’t take our word for it; here are real quotes from previous attendees:
Twitter library
News from The Usual Suspects
Adobe Unleashes Generative Fireworks at MAX
Two Nobel Prizes (in Chemistry and Physics) were awarded for achievements rooted in Deep Learning! We explained what they were awarded for in our ML flashcards.
OpenAI’s Swarm of AI Workers
TSMC: AI's Chip Champion
Anthropic in its Loving Grace
Gradio 5: Web Apps on Rocket Fuel
Writer’s Palmyra X 004 Takes Action
Inflection AI, in collaboration with Intel Gaudi 3, launches Inflection for Enterprise, powered by the high-performing Inflection 3.0 model. Designed for businesses that need more than a chatbot, it offers full control over data, models, and architecture – on-prem, cloud, or hybrid.
We are reading
The freshest research papers of the week, categorized for your convenience.