Learn First Multimodal LLM without Trouble and Perfect Medical LLM for Medicinal Research

Hello All,

This is Raghul Gopal, an AWS Community Builder (ML & GenAI) and a research enthusiast in AI & AGI. Welcome to Week 1 of the Learn with Me newsletter, where I will be focusing on advances in Generative AI.

1. Mirasol 3B (Multimodal Autoregressive Model for Time-Aligned Video and Audio, and Non-Time-Aligned Contexts)

In the current era of Large Language Models, a central challenge is combining multiple heterogeneous modalities such as audio, video, and text. Video and audio arrive at much higher rates than text and are roughly aligned with each other in time, whereas text typically comes as global, unsynchronized context, e.g., a title or description.

Moreover, the volume of video and audio input grows with the length of the video, so these modalities demand more compute. To address this, the researchers introduced Mirasol 3B (which can be accessed from here: <>), which has two components:

  • an autoregressive component for time-synchronized modalities (audio/video)
  • an autoregressive component for context modalities that are sequential but not necessarily aligned in time (e.g., text)

Mirasol 3B architecture: Left – aligned modalities (video and audio); Right – non-aligned modalities (text)

To handle long video/audio sequences, the audio and video are partitioned into consecutive snippets whose representations are processed autoregressively. One of the nicest parts of Mirasol 3B is the Combiner mechanism, used in the autoregressive component for time-synchronized modalities, where the joint information is compressed into compact but expressive representations. These features let the model take 512 frames as input without increasing the parameter count.

Let’s see how video/audio-based transformers work. Architectures for video-language understanding commonly use a joint transformer, in which video inputs and text tokens are processed autoregressively together. Now let’s look at how Mirasol 3B uses the time-aligned and non-time-aligned components in conjunction.

For video, the basic form is a spatio-temporal representation. To extract it, they use sparse 3D tubes together with standard 2D patches, processed by a ViT encoder. Audio, in this work, is represented as spectrograms.
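To make that concrete, here is a minimal Python sketch (not the paper’s code) of turning raw audio into a log-mel spectrogram and video frames into flat 2D patches; the patch size, frame count, and mel settings are my own illustrative choices, and the 3D-tube extraction from the paper is omitted.

```python
# Minimal sketch (illustrative assumptions, not Mirasol 3B's preprocessing):
# audio -> log-mel spectrogram, video frames -> flat 2D patches.
import numpy as np
import librosa

def audio_to_spectrogram(waveform: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Log-mel spectrogram with shape (time_frames, n_mels)."""
    mel = librosa.feature.melspectrogram(y=waveform, sr=sr, n_mels=80)
    return librosa.power_to_db(mel).T

def frames_to_patches(frames: np.ndarray, patch: int = 16) -> np.ndarray:
    """Split video frames (T, H, W, C) into flat 2D patches per frame."""
    T, H, W, C = frames.shape
    frames = frames.reshape(T, H // patch, patch, W // patch, patch, C)
    patches = frames.transpose(0, 1, 3, 2, 4, 5).reshape(T, -1, patch * patch * C)
    return patches  # (T, num_patches, patch_dim)

video = np.random.rand(32, 224, 224, 3).astype(np.float32)   # 32 dummy frames
audio = np.random.randn(16000 * 2).astype(np.float32)        # 2 s of dummy audio
print(frames_to_patches(video).shape, audio_to_spectrogram(audio).shape)
```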

Autoregressive Modeling of Video and Audio in Time

The Combiner module has two roles (see the sketch after the figure below), namely:

  • combining video/audio features within a given snippet in time (joint representations)
  • effectively compressing the representation of each audio/video snippet, allowing the model to scale to longer videos

Combiners: Left – standard Transformer Combiner; Right – TTM (Token Turing Machine)
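Here is a minimal PyTorch sketch of a Transformer-style combiner in that spirit: it concatenates per-snippet video and audio tokens and compresses them into a small, fixed number of output tokens via learned queries. The dimensions, token counts, and use of learned query tokens are my assumptions for illustration, not Mirasol 3B’s exact design.

```python
# Illustrative combiner sketch (assumed dimensions, not Mirasol 3B's code):
# per-snippet audio + video tokens -> a small, fixed set of combined tokens.
import torch
import torch.nn as nn

class TransformerCombiner(nn.Module):
    def __init__(self, dim: int = 512, out_tokens: int = 32, heads: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(out_tokens, dim))  # learned output slots
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, video_tokens, audio_tokens):
        # Joint representation: concatenate both modalities for this snippet.
        joint = torch.cat([video_tokens, audio_tokens], dim=1)   # (B, Nv + Na, dim)
        q = self.queries.expand(joint.size(0), -1, -1)           # (B, out_tokens, dim)
        compressed, _ = self.attn(q, joint, joint)               # cross-attend into joint tokens
        return compressed + self.ff(compressed)                  # (B, out_tokens, dim)

combiner = TransformerCombiner()
v = torch.randn(2, 196, 512)   # video tokens for one snippet
a = torch.randn(2, 50, 512)    # audio tokens for one snippet
print(combiner(v, a).shape)    # torch.Size([2, 32, 512])
```

The compressed tokens for each snippet are what the autoregressive model then consumes, which is how the sequence length stays manageable as the video grows.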

Since these audio/video representations are modeled autoregressively, each snippet's representation is predicted conditioned on those of the previous time intervals; the value x_t is passed sequentially to the autoregressive model.

The two autoregressive models, for time-aligned and non-time-aligned modalities, are combined by feeding the latent model output ĥ as cross-attention input when producing the text output, where w is the tokenized text sequence of length L. Note that their full model has 3B parameters (2.9B without audio). The experiments below cover several benchmarks as well as ablations.
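Here is a rough sketch of what that cross-attention step could look like using a standard decoder layer; the shapes and the choice of nn.TransformerDecoderLayer are mine for illustration, not the paper’s actual module.

```python
# Illustrative sketch: a text decoder cross-attending into the latent output ĥ
# of the time-aligned audio/video component (assumed shapes).
import torch
import torch.nn as nn

dim = 512
decoder_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)

text_embeddings = torch.randn(2, 64, dim)   # w: tokenized text sequence, L = 64
h_latent = torch.randn(2, 128, dim)         # ĥ: latent tokens from the video/audio model

# Causal mask so each text position only attends to earlier text positions.
causal = nn.Transformer.generate_square_subsequent_mask(text_embeddings.size(1))
out = decoder_layer(tgt=text_embeddings, memory=h_latent, tgt_mask=causal)
print(out.shape)  # torch.Size([2, 64, 512])
```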

Results of Mirasol 3B on the MSRVTT-QA benchmark, compared with other state-of-the-art models.
Results of Mirasol 3B (long video QA) on the ActivityNet benchmark, compared with other state-of-the-art models.
Results of Mirasol 3B (long video QA) on the NExT-QA benchmark, compared with other state-of-the-art models.
Audio-video results of Mirasol 3B on Kinetics-Sound, VGG-Sound, and Epic-Sound.

2. MediTron 70B – Medical Pretraining for Large Language Models

Large Language Models (LLMs) can potentially democratize access to medical knowledge, but the models that encode such knowledge are either closed source (e.g., GPT-4 or PaLM) or limited in scale. MediTron (access it from here <>) is a pair of open-source medical LLMs (7B and 70B) built on Llama-2, trained through an adaptation of Nvidia's Megatron-LM distributed trainer. MediTron is trained on a medical corpus including PubMed articles, abstracts, and more. The models are evaluated on four medical reasoning benchmarks using both:

  • In-context learning – prompting within the context window
  • Task-specific fine-tuning

Complete Pipeline of MediTron

Let’s see the engineering behind MediTron 70B.

To scale both parameter count and pretraining token count, they built the Megatron-LLM distributed training library, extended from Nvidia's Megatron-LM to support three open-source LLM families, namely Llama, Falcon, and Llama-2. The library supports Data Parallelism (DP), Pipeline Parallelism (PP), and Tensor Parallelism (TP).
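To make tensor parallelism concrete, here is a tiny NumPy sketch of the core idea: a linear layer's weight matrix is split column-wise across devices, each device computes its own slice, and the partial outputs are gathered back together. This is a conceptual illustration only, not Megatron-LLM's implementation.

```python
# Conceptual tensor-parallelism sketch (not Megatron-LLM code): split a linear
# layer's weight matrix column-wise across "devices" and combine the partial outputs.
import numpy as np

def tensor_parallel_linear(x, W, num_devices=2):
    # W: (d_in, d_out), split into num_devices column shards.
    shards = np.split(W, num_devices, axis=1)
    partial_outputs = [x @ shard for shard in shards]   # each device computes its slice
    return np.concatenate(partial_outputs, axis=-1)     # all-gather along the output dim

x = np.random.randn(4, 1024)        # batch of activations
W = np.random.randn(1024, 4096)     # full weight matrix
assert np.allclose(tensor_parallel_linear(x, W), x @ W)
```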

Megatron-LM natively supports GPT-like architectures, but the MediTron researchers extended it to support Llama, Falcon, and Llama-2. MediTron comes in two sizes: MediTron 7B with a context length of 2048, and MediTron 70B with a context length of 4096. They integrated the necessary new architectural features, such as rotary position embeddings, grouped-query attention, the parallel attention/MLP layout from Falcon-40B's transformer layer, and the untying of the word embedding and next-token-prediction classifier weights used in Llama. They also added support for FlashAttention and FlashAttention-2 for more efficient inference and long-context decoding.

From Llama-2, they inherited the standard transformer architecture, RMSNorm, the SwiGLU activation function, and rotary positional embeddings. They also used the grouped-query attention (GQA) introduced in Llama-2.
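Here is a quick sketch of the grouped-query attention idea, where several query heads share one key/value head; the head counts and dimensions are illustrative, not Llama-2's configuration.

```python
# Illustrative grouped-query attention sketch (assumed head counts, not Llama-2's config):
# many query heads share a smaller number of key/value heads.
import torch
import torch.nn.functional as F

B, T, n_q_heads, n_kv_heads, head_dim = 2, 16, 8, 2, 64

q = torch.randn(B, n_q_heads, T, head_dim)
k = torch.randn(B, n_kv_heads, T, head_dim)
v = torch.randn(B, n_kv_heads, T, head_dim)

# Each group of n_q_heads // n_kv_heads query heads reuses the same K/V head.
group = n_q_heads // n_kv_heads
k = k.repeat_interleave(group, dim=1)   # (B, n_q_heads, T, head_dim)
v = v.repeat_interleave(group, dim=1)

scores = (q @ k.transpose(-2, -1)) / head_dim ** 0.5
out = F.softmax(scores, dim=-1) @ v
print(out.shape)  # torch.Size([2, 8, 16, 64])
```

Sharing key/value heads shrinks the KV cache, which is the main reason GQA helps at the 70B scale.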

Don’t worry about these new features; we will be exploring them soon in this newsletter.

They followed OpenAI’s ChatML format for the instruction data. A ChatML document consists of a series of messages, each starting with the special token <|im_start|> followed by the role (user/assistant) and ending with <|im_end|>.

Example view of OpenAI’s ChatML format
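As a textual sketch of what a ChatML-formatted example might look like (the helper function and the message contents below are placeholders I made up, not MediTron's actual data):

```python
# Hypothetical ChatML-style formatting sketch (placeholder content, not MediTron's data).
def format_chatml(messages):
    parts = []
    for role, content in messages:
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    return "\n".join(parts)

example = format_chatml([
    ("system", "You are a helpful medical assistant."),
    ("user", "A placeholder medical question goes here."),
    ("assistant", "A placeholder answer goes here."),
])
print(example)
```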
MediTron performance on the MedQA benchmark
Few-shot learning results of MediTron on different benchmarks, namely MedQA, MedMCQA, and more, compared with other state-of-the-art models.
Comparison of MediTron with other open-source models, under different inference modes, namely top-token selection, chain-of-thought, and self-consistency chain-of-thought.
Comparison of MediTron 70B with commercial LLMs such as GPT-3.5, MedPaLM 540B, GPT-4, and MedPaLM-2-540B

That’s it for this week. Happy Day, Happy AI.

Follow me here to learn more about new releases in AI and AGI with a clear understanding.

