Mamba: The Next Evolution of GenAI - Will 2024 be the beginning of the end of Transformer-Based Models?
Shaun Tyler
Director Global Software Integration & AI Thought Leader at Koerber Pharma Software
Introduction
In tech terms, 2023 was the year of the transformer: not just of GenAI, not just of AI, but specifically of the transformer architecture, which forms the basis for almost all foundational models like GPT-4. So, will 2024 just continue and improve upon this? Will it be the year of the transformer 2.0? I don't think so. Transformer models don't scale well with sequence length, which might not be a major issue if you're asking Bing Copilot which restaurant to consider, but it is a huge issue for industries that need large context windows, like the pharmaceutical industry.
While transformers have kicked off the GenAI revolution and are great in many ways, they also come with severe limitations. Their greatest advantage, the attention mechanism, is also their biggest weakness: because every token is compared with every other token, compute and memory grow quadratically with the length of the input, which in practice caps the usable context window.
Even though a prompt size worth 300 pages sounds huge, it doesn't mean that every new prompt can be 300 pages long, again and again. It means that your entire chat history can be, for example, 128k tokens long for GPT-4 Turbo. If you start the conversation with a long paper or thesis, you'll reach that limit within about 10 minutes of back and forth. In my line of work, MES (Manufacturing Execution Systems) for the pharmaceutical industry, working with large recipes will quickly overwhelm current foundational models, especially if you want the model to continuously understand what you're talking about, perhaps even copiloting you through complex modeling processes.
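To make that limit concrete, here is a rough back-of-envelope sketch in Python. Only the 128k-token window of GPT-4 Turbo comes from the text above; the tokens-per-page and tokens-per-exchange figures are assumptions for illustration, not official numbers.

```python
CONTEXT_WINDOW = 128_000       # GPT-4 Turbo context size in tokens (from the text above)
TOKENS_PER_PAGE = 430          # assumed average for dense English text (~128k tokens ~ 300 pages)
TOKENS_PER_EXCHANGE = 800      # assumed size of one question/answer round trip

thesis_pages = 150             # hypothetical document pasted at the start of the chat
history = thesis_pages * TOKENS_PER_PAGE

turns = 0
while history < CONTEXT_WINDOW:
    history += TOKENS_PER_EXCHANGE
    turns += 1

print(f"After a {thesis_pages}-page document, the window is full after ~{turns} exchanges.")
```

Under these assumptions, half the window is gone before the conversation even starts, and the rest is consumed silently as the chat goes on.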
Is that it, then? Should we just keep pushing foundational models to process more tokens at once, even though the computational cost of attention grows quadratically with the size of the context? Probably not.
I recently read a paper, [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces (arxiv.org), that will most likely introduce the next wave of GenAI foundational models and that is probably as significant as [1706.03762] Attention Is All You Need (arxiv.org), the paper that was the basis for the GenAI revolution.
Our article today will guide you through everything you need to know to understand the basics of this breakthrough. First, we will bring you up to speed on Transformers and their limitations, then we will discuss Structured State Space Models – their advantages and limitations – and finally how a new kind of model named Mamba overcomes those limitations while also addressing the transformer issues described above.
Have fun with my newest article closing out the year 2023, and I wish you all a good start into 2024.
Section 1: The Transformer Model Explained
Transformer models are at the forefront of AI for processing sequential data, such as text or speech. Central to their effectiveness is the attention mechanism, which allows these models to focus on different parts of a sequence to better understand context. This mechanism is particularly vital in tasks that require an understanding of the relationships between various elements within a sequence.
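For readers who prefer to see the mechanism rather than read about it, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation of the transformer. It is a bare-bones illustration for a single head, without masking, batching, or learned projections, so it is a simplification of what real models do.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention without masking, batching, or learned projections."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # (seq_len, seq_len): every token scored against every other
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over each row
    return weights @ V                             # each output is a context-weighted mix of the values

# Toy usage: 6 tokens with 8-dimensional embeddings, self-attention (Q = K = V)
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(x, x, x).shape)   # (6, 8)
```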
For example, consider the word "rainbow." In a mythological context, a Transformer model might associate it with a pot of gold, while in a scientific context, it could lead to an explanation of light phenomena. This contextual adaptability is a testament to the model's attention mechanism, which dynamically adjusts focus and interpretation based on surrounding content.
The architecture of Transformers comprises two main components: the encoder and the decoder. The encoder is responsible for processing the input data, understanding each element within its context. The decoder then uses this processed information to generate the output. This structure is highly effective in applications like language translation, where the model translates a sentence from one language to another while maintaining the contextual integrity of the original content.
Despite these strengths, Transformers face a significant limitation in their efficiency with long sequences. The attention mechanism, while powerful, requires the model to evaluate the relationship between each element in the sequence and every other element. Consequently, the computational workload grows quadratically with the length of the input: doubling the sequence length roughly quadruples the work. This inefficiency becomes particularly pronounced in tasks that involve lengthy documents or complex datasets, where the model's performance can be hindered by the extensive computational demands.
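This quadratic growth is easy to see in numbers. The sketch below simply counts the entries of the seq_len × seq_len score matrix produced by the attention function above; the 2 bytes per entry (fp16) is an assumption, and optimized kernels avoid materializing the full matrix, but the amount of pairwise work stays quadratic either way.

```python
# Illustrative only: the attention score matrix has seq_len * seq_len entries.
for seq_len in (1_000, 10_000, 100_000):
    entries = seq_len ** 2
    print(f"{seq_len:>7} tokens -> {entries:>18,} scores "
          f"(~{entries * 2 / 1e9:.1f} GB per head and layer, if stored naively)")
```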
In summary, Transformer models are formidable AI tools for sequence understanding, but their efficiency diminishes with longer sequences. This challenge has catalyzed the exploration and development of new models like Structured State Space Models (SSMs) and Mamba, which seek to address the inefficiencies of Transformers, particularly in handling extended data sequences.
Section 2: Structured State Space Models (SSMs) – Understanding the basics
Structured State Space Models (SSMs) represent a significant shift in the field of sequence modeling, offering a unique alternative to Transformer models. SSMs uniquely combine elements from recurrent neural networks (RNNs) and convolutional neural networks (CNNs), which makes them particularly efficient for processing certain types of data, especially continuous signals like audio or visual inputs.
Think of SSMs as a sophisticated system for tracking and predicting changes over time in a data sequence. They're like a project manager who constantly updates their understanding of a project's progress based on the latest reports and developments. SSMs continuously update their 'state' or understanding of a sequence with each new piece of data they process, much like how a manager would integrate new information into the project's trajectory.
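In code, the "project manager" picture is just a small recurrence: a hidden state is updated with every new input and read out at each step. The sketch below is a plain, discretized linear SSM with fixed matrices A, B, C, as in classic models such as S4; the specific values are toy assumptions chosen for illustration.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Recurrent view of a discretized linear state space model:
        x_t = A @ x_{t-1} + B * u_t    (update the hidden state)
        y_t = C @ x_t                  (read the state out)
    A, B, C are fixed here, as in classic SSMs such as S4."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:                  # one state update per time step
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

# Toy usage: a 4-dimensional state summarizing a continuous-style scalar signal
rng = np.random.default_rng(1)
A = 0.9 * np.eye(4)                          # assumed stable toy dynamics
B, C = rng.normal(size=4), rng.normal(size=4)
signal = np.sin(np.linspace(0, 6, 50))
print(ssm_scan(A, B, C, signal).shape)       # (50,)
```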
This ability to merge the best aspects of RNNs (good at recognizing patterns over time) and CNNs (able to process a whole sequence in parallel) allows SSMs to efficiently handle long-range dependencies in data. This is particularly valuable when dealing with long sequences of information, where traditional models like Transformers might struggle due to the extensive computational workload.
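The CNN side of that claim can be made concrete as well. Because A, B and C are fixed, the recurrence unrolls into a single convolution kernel K = [C·B, C·A·B, C·A²·B, ...] that can be applied to the whole sequence in parallel during training. Below is a small self-contained check, using the same toy values as the previous sketch, that the two views produce identical outputs.

```python
import numpy as np

# Same toy SSM as in the recurrent sketch above, now computed as a convolution.
rng = np.random.default_rng(1)
A = 0.9 * np.eye(4)
B, C = rng.normal(size=4), rng.normal(size=4)
signal = np.sin(np.linspace(0, 6, 50))

# Unroll the fixed SSM into a convolution kernel K = [C·B, C·A·B, C·A²·B, ...]
K, m = [], B.copy()
for _ in range(len(signal)):
    K.append(C @ m)
    m = A @ m
y_conv = np.convolve(signal, np.array(K))[: len(signal)]   # parallel, CNN-style view

# Recurrent, RNN-style view for comparison
x, y_rec = np.zeros(4), []
for u_t in signal:
    x = A @ x + B * u_t
    y_rec.append(C @ x)

print(np.allclose(y_conv, y_rec))   # True: both views compute the same outputs
```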
However, SSMs aren't without limitations. While they excel at handling continuous data like sound waves or video frames, they are less effective with discrete, complex data such as text. This is akin to a manager who is great at tracking ongoing, smooth processes but finds it challenging to grasp the nuances of varied, brief updates.
Despite these challenges, SSMs are a critical development in sequence modeling, particularly for continuous data types. Their design allows for efficient and effective handling of long data sequences, making them a valuable alternative to Transformer models in scenarios involving extensive sequences. This efficiency and versatility set the stage for the emergence of advanced models like Mamba, which build upon the capabilities of SSMs to address their limitations, especially in processing complex, discrete data types.
Section 3: Introducing Mamba – An Evolution of SSMs
Mamba stands as a significant advancement in sequence modeling, building upon the foundation of Structured State Space Models (SSMs). Its key innovation lies in the use of selective SSMs, a feature that enables focused and efficient processing of sequences, particularly beneficial for complex data types like text.
Selective SSMs for Targeted Focus: Imagine a project manager who not only tracks every aspect of a project but also knows exactly which parts need more attention at any given time. Mamba functions similarly. By employing selective SSMs, Mamba can dynamically adjust its focus based on the input data it receives. For example, when processing the word "rainbow," Mamba's parameters would adapt depending on whether the context is mythological or scientific. In a mythological context, it might highlight elements linked to legends or folklore; in a scientific context, it would prioritize data related to weather phenomena.
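A very rough, single-channel sketch of this selectivity is shown below. The key difference from the plain SSM above is that the step size delta and the projections B and C are computed from the current input, so the model decides per token how strongly to write to and read from its state. The shapes, the diagonal A, and the simple discretization are illustrative assumptions on my part; the real Mamba layer operates per channel inside a fused GPU kernel.

```python
import numpy as np

def selective_scan(u, A, W_B, W_C, W_delta):
    """Simplified single-channel sketch of a selective SSM (Mamba-style).
    Unlike the fixed SSM above, B, C and the step size delta depend on the input,
    so the model can choose, token by token, what to store and what to ignore."""
    h = np.zeros(len(A))                        # hidden state; A is a diagonal of negative values
    ys = []
    for u_t in u:                               # u_t: scalar input for this channel at time t
        delta = np.logaddexp(0.0, W_delta * u_t)   # softplus -> positive, input-dependent step size
        B_t = W_B * u_t                         # input-dependent 'write' projection
        C_t = W_C * u_t                         # input-dependent 'read' projection
        A_bar = np.exp(delta * A)               # simple zero-order-hold style discretization
        h = A_bar * h + delta * B_t * u_t       # selective state update
        ys.append(float(C_t @ h))               # selective readout
    return np.array(ys)

# Toy usage: one channel, 4 state dimensions
rng = np.random.default_rng(2)
A = -np.arange(1, 5, dtype=float)               # assumed stable diagonal dynamics
W_B, W_C = rng.normal(size=4), rng.normal(size=4)
u = rng.normal(size=20)
print(selective_scan(u, A, W_B, W_C, W_delta=0.5).shape)   # (20,)
```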
Filtering Out Irrelevant Information: This ability to selectively focus allows Mamba to effectively sift through a large sequence of data, identifying and prioritizing parts that are most relevant to the current context. It's like having a filter that separates crucial information from background noise, ensuring that the model's attention is directed where it matters most.
Efficiency with Long Sequences: One of Mamba's standout strengths is its handling of long sequences. Unlike traditional Transformer models that process every part of a sequence in relation to every other part, Mamba's approach scales linearly with sequence length. This means that as the sequence grows, Mamba maintains its efficiency, avoiding the computational overload that plagues other models. It’s akin to a manager who can handle increasingly complex projects without getting overwhelmed.
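In back-of-envelope terms, and ignoring constant factors (which differ a lot in practice), the gap between the two scaling behaviours looks like this:

```python
# Illustrative operation counts only: attention does ~ L^2 pairwise comparisons,
# while an SSM/Mamba-style scan does ~ L state updates.
for L in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{L:>9} tokens | attention ~ {L**2:>16,} | linear scan ~ {L:>9,}")
```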
Optimization for Modern Hardware: Complementing its selective SSMs, Mamba incorporates a hardware-aware algorithm optimized for modern GPU architectures. This ensures that Mamba not only makes smart decisions about what data to focus on but also processes this data in the most efficient manner possible on current hardware.
In summary, Mamba represents a leap forward in sequence modeling by combining the ability to selectively focus on relevant data (like distinguishing different contexts of "rainbow") with efficient, linear-time processing of long sequences on modern hardware.
Section 4: Conclusion – Mamba's Breakthrough Potential
Mamba's introduction to AI sequence modeling represents a transformative development, particularly in its potential to influence the future of Generative AI (GenAI) foundation models. This impact goes beyond mere technical advancements, indicating a shift towards more efficient and specialized AI systems.
Redefining Efficiency in AI Modeling: Mamba's linear scaling with sequence length and selective focus mechanism present a blueprint for developing foundation models that are smaller yet highly effective. This efficiency could revolutionize how we approach the construction of GenAI models, moving away from the trend of increasing size and computational demands.
Specialization and Versatility: Equally important is Mamba's potential to facilitate specialized, task-focused foundation models. Instead of relying on one ever-larger general-purpose model, industries with specific needs, such as pharma with its long and complex recipes, could build smaller models tailored to their own data and workflows.
Empirical Validation and Broader Implications: In the paper, Mamba has already demonstrated strong performance across diverse domains such as language, audio, and genomics, indicating its capability to match or outperform existing models in both efficiency and effectiveness. This empirical validation underscores Mamba's potential role in shaping GenAI models that are more accessible, sustainable, and adaptable to a range of applications.
A Sustainable Approach to AI Development: With its hardware-aware design, Mamba also points towards a more sustainable path in AI development, one where progress comes from smarter architectures that make better use of existing hardware rather than from ever-growing model sizes and compute budgets.
In conclusion, Mamba's breakthrough is not confined to its architectural innovations but extends to its potential in redefining the landscape of GenAI foundation models. Its ability to efficiently process long sequences, coupled with its specialization and adaptability, positions Mamba as a pivotal model that could lead to a new generation of smaller, more effective, and sustainable AI systems. This shift challenges the current paradigm of Transformer-based models and paves the way for a more diverse and practical approach in AI modeling.