Diverting Our Attention Once Again: A Look at Mamba
In my last post I wrote about Hyena, a model developed at Stanford that challenges the existing assumption in AI that “attention is all you need.” In the field of AI, “attention” refers to the ability of language models, such as those utilizing transformers, to compare each word of a sentence to every other word, allowing the model to develop a stronger grasp of the context behind an entire passage. However, while transformer and attention models have undeniably driven a new wave in GenAI technologies, the computational cost required to scale them remains a significant drawback.
Thus, many researchers have begun to explore whether attention is the only way to unlock the incredible capabilities of large language models (LLMs). This week, I dive into another proposed alternative: Mamba.
What is Mamba?
Mamba is an AI architecture announced at the end of last year by researchers at Carnegie Mellon University and Princeton University (go east coast!), designed as an alternative to transformers. Rather than using attention to handle long data sequences, Mamba builds on a mechanism known as a State Space Model (SSM). In simple terms, an SSM is a “box” that holds onto key information over time. The same box can be viewed in different ways, depending on the data being processed and the desired outcome. One view resembles a Convolutional Neural Network (CNN), which can process a whole sequence in parallel and is therefore fast to train, while another resembles a Recurrent Neural Network (RNN), which is slower to train but generates outputs step by step at very little cost. Having both views lets SSMs stay computationally efficient during training and inference alike, while handling substantially longer contexts.
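To make those two views concrete, here is a minimal NumPy sketch (a toy, not Mamba's or any library's actual code): the same linear state space model is run once as a step-by-step recurrence and once as a convolution with a precomputed kernel, and the two produce identical outputs. The matrices A, B, C and the input x are illustrative placeholders.

```python
# Two equivalent "views" of a discrete linear state space model:
#   recurrent view:      h[t] = A h[t-1] + B x[t],   y[t] = C h[t]
#   convolutional view:  y = causal_conv(x, K) with kernel K = [CB, CAB, CA^2B, ...]
import numpy as np

def ssm_recurrent(A, B, C, x):
    """Step through the sequence one element at a time (cheap generation)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B * x_t          # update the hidden state ("the box")
        ys.append(C @ h)             # read out the current output
    return np.array(ys)

def ssm_convolutional(A, B, C, x):
    """Precompute the kernel and apply it as one convolution (parallel training)."""
    L = len(x)
    K = np.array([C @ np.linalg.matrix_power(A, t) @ B for t in range(L)])
    return np.convolve(x, K)[:L]     # keep only the causal part

# Both views produce the same outputs for the same inputs.
rng = np.random.default_rng(0)
N = 4                                # state size
A = 0.9 * np.eye(N)                  # toy, stable state matrix
B = rng.normal(size=N)
C = rng.normal(size=N)
x = rng.normal(size=16)              # a short 1-D input sequence

assert np.allclose(ssm_recurrent(A, B, C, x), ssm_convolutional(A, B, C, x))
```

The recurrent form is what you want when generating one token at a time; the convolutional form is what makes training over a long sequence parallelizable.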
In addition to building on SSMs, Mamba introduces two key innovations:
- A selection mechanism: the SSM's parameters become functions of the current input, so the model can decide, token by token, what to write into its state and what to ignore (a toy sketch follows after the next paragraph).
- A hardware-aware algorithm: because input-dependent parameters rule out the convolutional shortcut, Mamba computes the recurrence with a parallel scan designed around GPU memory, keeping the state in fast on-chip memory instead of repeatedly materializing it.
With these additions, Mamba is classified as a selective structured state space sequence model, an alliteration (SSSSS) mimicking the hiss of a snake, hence the architecture's name.
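As promised above, here is a minimal sketch of the first innovation, the selection mechanism, under simplifying assumptions: a diagonal state matrix, a crude Euler-style discretization of the input term, and a plain Python loop in place of the paper's hardware-aware parallel scan. The names selective_scan, W_delta, W_B, W_C, and A_log are made up for this illustration rather than taken from the Mamba codebase.

```python
# A toy selective SSM: the step size delta, the write vector B_t, and the read vector
# C_t are all computed from the current input, so the model itself decides how much of
# each token to store in its state and how much to read back out.
import numpy as np

def selective_scan(x, W_delta, W_B, W_C, A_log):
    """x: (L, D) sequence of D-dim inputs; returns an (L, D) output sequence."""
    L, D = x.shape
    N = A_log.shape[0]
    h = np.zeros((D, N))                               # one small state per channel
    y = np.zeros((L, D))
    for t in range(L):
        x_t = x[t]                                     # (D,)
        # --- selection: parameters depend on the current input ---
        delta = np.log1p(np.exp(x_t * W_delta))        # (D,) softplus step sizes
        B_t = x_t @ W_B                                # (N,) how to write into the state
        C_t = x_t @ W_C                                # (N,) how to read the state out
        # --- discretized state update (diagonal A, Euler-style input term) ---
        decay = np.exp(delta[:, None] * -np.exp(A_log)[None, :])     # (D, N)
        h = decay * h + (delta * x_t)[:, None] * B_t[None, :]
        y[t] = h @ C_t
    return y

# Toy usage with random weights, just to show the shapes involved.
rng = np.random.default_rng(0)
L, D, N = 32, 8, 4
y = selective_scan(rng.normal(size=(L, D)), rng.normal(size=D),
                   rng.normal(size=(D, N)), rng.normal(size=(D, N)),
                   rng.normal(size=N))
print(y.shape)   # (32, 8)
```

Because delta, B_t, and C_t change with every token, the fixed convolution kernel from the earlier sketch no longer exists, which is exactly why the second innovation, the hardware-aware scan, is needed.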
What is the significance of Mamba and what are its limitations?
Mamba has demonstrated exceptional performance across a wide range of domains, matching and even surpassing state-of-the-art transformer models. Rather than indiscriminately compressing its entire history into a fixed-size representation, where important context can be lost over time, Mamba has control over whether and how each input is remembered. In principle it can retain important information across millions of data points while keeping short-term details only as long as they are needed.
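As a toy illustration of that "choose what to remember" behavior, the snippet below uses a single scalar gate in place of Mamba's full state update; the function gated_memory and its inputs are invented for this example.

```python
# Toy illustration (much simpler than Mamba's actual update): an input-dependent gate
# decides whether the state keeps what it already holds or is overwritten.
import numpy as np

def gated_memory(inputs, keep_gates):
    """Run a scalar gated recurrence and return the final state."""
    state = 0.0
    for x_t, g_t in zip(inputs, keep_gates):
        state = g_t * state + (1.0 - g_t) * x_t   # g ~ 1: remember, g ~ 0: overwrite
    return state

x = np.zeros(1000)
x[0] = 42.0                       # an "important" value seen at the very start
keep = np.ones(1000)
keep[0] = 0.0                     # write it into the state once, then hold on to it

print(gated_memory(x, keep))              # 42.0 -- the early value survives 1000 steps
print(gated_memory(x, np.zeros(1000)))    # 0.0  -- always overwriting forgets it at once
```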
Mamba is still in its early stages, however, and there are plenty of open questions about how it will compare to the massive closed-source models developed by OpenAI and Anthropic.
Applications of Mamba
Mamba is exciting for many generative AI use cases that require modeling extremely long sequences, such as entire documents, DNA, or raw audio, where context can stretch to millions of tokens.