Meta's Llama 3.2 - Edge AI & Vision with Open, Customizable Models
Aditi Khare
AWS & AI Research [LLMs & Vision]-Principal Machine Learning Scientist & AI Architect | IIM-A | Author | Inference Optimization | Hyperspectral Imaging | Open-Source Dev | Build Production-Grade AI Products from Scratch
#ai #airesearch #meta #llm #genai #vision
Model evaluations -
Meta's evaluations suggest that the Llama 3.2 vision models are competitive with leading foundation models, Claude 3 Haiku and GPT-4o mini, on image recognition and a range of visual understanding tasks. The 3B model outperforms the Gemma 2 2.6B and Phi 3.5-mini models on tasks such as instruction following, summarization, prompt rewriting, and tool use, while the 1B model is competitive with Gemma.
Performance was evaluated on over 150 benchmark datasets spanning a wide range of languages. The vision LLMs were evaluated on benchmarks for image understanding and visual reasoning.
Vision Models -
As the first Llama models to support vision tasks, the 11B and 90B models required an entirely new model architecture that supports image reasoning.
To add image input support, we trained a set of adapter weights that integrate the pre-trained image encoder into the pre-trained language model. The adapter consists of a series of cross-attention layers that feed image encoder representations into the language model. We trained the adapter on text-image pairs to align the image representations with the language representations. During adapter training, we also updated the parameters of the image encoder, but intentionally did not update the language-model parameters. By doing that, we keep all the text-only capabilities intact, providing developers with a drop-in replacement for Llama 3.1 models.
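To make the adapter idea concrete, here is a minimal PyTorch sketch of a cross-attention layer that feeds image-encoder features into a frozen language model. The class, dimensions, and zero-initialized gate are illustrative assumptions, not Meta's actual implementation.

```python
# Minimal sketch of a cross-attention adapter that feeds image-encoder features
# into a frozen language model layer. Names, dimensions, and the gating scheme
# are illustrative assumptions, not Meta's actual implementation.
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    def __init__(self, lm_dim: int, vision_dim: int, num_heads: int = 8):
        super().__init__()
        # Project image-encoder outputs into the language model's hidden size.
        self.vision_proj = nn.Linear(vision_dim, lm_dim)
        # Text hidden states attend over the projected image tokens.
        self.cross_attn = nn.MultiheadAttention(lm_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(lm_dim)
        # Learnable gate initialized at zero so the adapter starts as a no-op,
        # preserving the pre-trained text-only behaviour.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, text_hidden: torch.Tensor, image_feats: torch.Tensor) -> torch.Tensor:
        img = self.vision_proj(image_feats)
        attn_out, _ = self.cross_attn(query=self.norm(text_hidden), key=img, value=img)
        return text_hidden + torch.tanh(self.gate) * attn_out

# During adapter training the language model stays frozen, while the adapter
# (and, per the post, the image encoder) receives gradients:
# for p in language_model.parameters():
#     p.requires_grad = False
```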
The training pipeline consists of multiple stages, starting from pretrained Llama 3.1 text models. First, we add image adapters and encoders, then pretrain on large-scale noisy (image, text) pair data. Next, we train on medium-scale high quality in-domain and knowledge-enhanced (image, text) pair data.
In post-training, we use a similar recipe as the text models, doing several rounds of alignment with supervised fine-tuning, rejection sampling, and direct preference optimization. We leverage synthetic data generation by using the Llama 3.1 model to filter and augment questions and answers on top of in-domain images, and use a reward model to rank all the candidate answers to provide high-quality fine-tuning data. We also add safety mitigation data to produce a model with a high level of safety while retaining the helpfulness of the model.
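As a rough illustration of the rejection-sampling step, the sketch below scores candidate answers with a reward model and keeps only the best one; `generate_candidates`, `reward_model`, and the quality threshold are hypothetical placeholders, not real APIs from the release.

```python
# Hypothetical sketch of rejection sampling with a reward model: generate several
# candidate answers for an in-domain image + question, score them with a reward
# model, and keep only the top-ranked answer as fine-tuning data.
from typing import Callable, List, Tuple

def select_best_answer(
    question: str,
    candidates: List[str],
    reward_model: Callable[[str, str], float],
) -> Tuple[str, float]:
    """Score each candidate answer and return the highest-reward one."""
    scored = [(answer, reward_model(question, answer)) for answer in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[0]

# Usage sketch (all names below are placeholders):
# candidates = generate_candidates(llama_3_1, image, question, n=8)
# best_answer, score = select_best_answer(question, candidates, reward_model)
# if score > QUALITY_THRESHOLD:  # keep only high-quality pairs for fine-tuning
#     sft_dataset.append({"image": image, "question": question, "answer": best_answer})
```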
The end result is a set of models that can take in both image and text prompts, and deeply understand and reason on the combination. This is another step toward Llama models having even richer agentic capabilities.
Lightweight Models -
2 Methods - Pruning & Distillation on the 1B and 3B models, making them the first highly capable lightweight Llama models that can fit on devices efficiently.
Pruning enables us to reduce the size of existing models in the Llama herd while recovering as much knowledge and performance as possible. For the 1B and 3B models, we took the approach of using structured pruning in a single-shot manner from the Llama 3.1 8B model. This involved systematically removing parts of the network and adjusting the magnitude of the weights and gradients to create a smaller, more efficient model that retains the performance of the original network.
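A toy sketch of one-shot structured pruning on a single linear layer is shown below; it ranks whole output channels by weight magnitude. Meta's actual criterion (which also uses gradient information) is not published in detail, so treat this purely as an illustration.

```python
# Illustrative sketch of one-shot structured pruning on a single feed-forward
# projection: rank whole output channels by weight magnitude and keep only the
# strongest ones. This is a toy magnitude criterion, not Meta's scoring rule.
import torch
import torch.nn as nn

def prune_linear_channels(layer: nn.Linear, keep_ratio: float = 0.5) -> nn.Linear:
    """Return a smaller Linear layer keeping the highest-magnitude output channels."""
    num_keep = max(1, int(layer.out_features * keep_ratio))
    # Score each output channel by the L2 norm of its weight row.
    scores = layer.weight.detach().norm(p=2, dim=1)
    keep_idx = torch.topk(scores, num_keep).indices.sort().values
    pruned = nn.Linear(layer.in_features, num_keep, bias=layer.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(layer.weight[keep_idx])
        if layer.bias is not None:
            pruned.bias.copy_(layer.bias[keep_idx])
    # Note: in a real pipeline, the layer that consumes this output must have its
    # input dimension pruned consistently.
    return pruned

# Example: shrink an 8B-scale feed-forward projection to half its width.
ffn = nn.Linear(4096, 14336)
smaller_ffn = prune_linear_channels(ffn, keep_ratio=0.5)
print(smaller_ffn)  # Linear(in_features=4096, out_features=7168, bias=True)
```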
Knowledge distillation uses a larger network to impart knowledge to a smaller network, with the idea that a smaller model can achieve better performance using a teacher than it could from scratch. For the 1B and 3B models in Llama 3.2, we incorporated logits from the Llama 3.1 8B and 70B models into the pre-training stage of model development, where outputs (logits) from these larger models were used as token-level targets. Knowledge distillation was used after pruning to recover performance.
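The sketch below shows a standard token-level distillation loss of the kind described: the student matches the teacher's softened logits while still training on the ground-truth tokens. The temperature and mixing weight are illustrative, not values from the Llama 3.2 recipe.

```python
# Minimal sketch of token-level knowledge distillation: the student is trained to
# match the teacher's softened output distribution (KL term) alongside the usual
# next-token cross-entropy. Temperature and mixing weight are illustrative.
import torch
import torch.nn.functional as F

def distillation_loss(
    student_logits: torch.Tensor,   # (batch, seq_len, vocab)
    teacher_logits: torch.Tensor,   # (batch, seq_len, vocab), e.g. from the 8B/70B teacher
    target_ids: torch.Tensor,       # (batch, seq_len) ground-truth next tokens
    temperature: float = 2.0,
    alpha: float = 0.5,
) -> torch.Tensor:
    # Soft targets: KL divergence between teacher and student distributions per token.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the ground-truth tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        target_ids.view(-1),
    )
    return alpha * kl + (1.0 - alpha) * ce
```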
In post-training, we use a similar recipe as Llama 3.1 and produce final chat models by doing several rounds of alignment on top of the pre-trained model. Each round involves supervised fine-tuning (SFT), rejection sampling (RS), and direct preference optimization (DPO).
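For reference, the DPO objective used in these alignment rounds can be written compactly. The sketch below is the standard formulation with an illustrative beta, not Meta's exact training code.

```python
# Standard direct preference optimization (DPO) objective: push the policy's
# log-probability margin on a chosen vs. rejected response beyond the frozen
# reference model's margin. Variable names and beta are illustrative.
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_policy(chosen | prompt)
    policy_rejected_logps: torch.Tensor,  # log p_policy(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```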
In post-training, we scale context-length support to 128K tokens while maintaining the same quality as the pre-trained model. We also use synthetic data generation that goes through careful data processing and filtering to ensure high quality. We carefully blend the data to optimize for high quality across multiple capabilities such as summarization, rewriting, instruction following, language reasoning, and tool use.
Llama Stack distributions -
System Level Safety -
Try Meta's multimodal vision and lightweight models in Amazon Bedrock:
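Below is a minimal boto3 sketch using the Bedrock Converse API; the model ID and region are assumptions, so confirm the exact identifier and availability in your own account before running it.

```python
# Minimal boto3 sketch for calling a Llama 3.2 vision model through the Amazon
# Bedrock Converse API. The model ID below is an assumption -- confirm the exact
# identifier (and region availability) in the Bedrock console before use.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

with open("chart.jpg", "rb") as f:
    image_bytes = f.read()

response = bedrock.converse(
    modelId="us.meta.llama3-2-11b-instruct-v1:0",  # assumed ID; verify in your account
    messages=[
        {
            "role": "user",
            "content": [
                {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                {"text": "Describe what this image shows."},
            ],
        }
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```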
References -
Reference Reading Link - https://www.llama.com/
Hugging Face Link - https://huggingface.co/meta-llama
For more information on AI research papers, you can visit my GitHub profile -
To receive the latest updates on advancements in AI research (Gen-AI, Quantum AI & Computer Vision), you can subscribe to my AI Research Papers Summaries Newsletter using the link below -
Thank you & Happy Reading !!