On-Device LLMs: The Future is Edge AI
Navdeep Singh Gill
Building XenonStack | Agentic AI | Physical AI | Data Foundry | AGI and Quantum Futurist | Author | Speaker
Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance across a wide range of tasks.
LLMs can understand human language and, backed by a huge knowledge base, handle most language-based ML tasks, e.g., language translation, Q&A, and smart reply.
LLMs catalyze mobile applications such as UI automation driven by user instructions, e.g., "forward the 3 most recent emails".
LLMs mark a giant step for mobile devices towards more intelligent and personalized assistive agents.
Local LLM use cases are growing, as noted by Matt Rickard.
His post lays out some interesting properties of on-device AI, contrasted with the benefits of server-side inference.
LLM as a System Service on Mobile Devices
LLM-as-a-Service (LLMaaS), LLM as a system service on mobile devices, will be a new paradigm of mobile AI, as proposed by Wangsong Yin, Mengwei Xu, Yuanchun Li, and Xuanzhe Liu in their research paper.
To fully explore the design space of chunk-level memory management, LLMS incorporates three novel techniques:
(1) Tolerance-Aware Compression
(2) Swapping-Recompute Pipeline
(3) Chunk Lifecycle Management
The paper realizes LLMaaS with LLMS, a concrete LLM service design based on fine-grained, chunk-wise KV cache compression and swapping.
LLMS decouples the memory management of LLM contexts from the apps. It aims to minimize the LLM context-switching overhead under a tight memory budget, much as traditional mobile memory mechanisms focus on reducing app cold-start latency.
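To make the chunk-wise design concrete, here is a toy Python sketch of a KV cache managed in fixed-size chunks under a memory budget. It is an illustration of the concept only, not the paper's implementation: the chunk size, the plain int8 quantization standing in for tolerance-aware compression, and the class names are all assumptions, and the swapping-recompute pipeline is only noted in comments.

```python
import numpy as np

CHUNK_TOKENS = 16  # tokens per KV chunk; the granularity here is an assumption

class KVChunk:
    """One chunk of the KV cache covering CHUNK_TOKENS tokens."""
    def __init__(self, kv: np.ndarray):
        self.kv = kv            # full-precision K/V activations
        self.compressed = None  # int8 payload once the chunk is evicted
        self.scale = 1.0

    def compress(self):
        # Stand-in for tolerance-aware compression: LLMS picks a per-chunk
        # precision from its measured accuracy tolerance; we just use int8.
        self.scale = float(np.abs(self.kv).max()) / 127.0 or 1.0
        self.compressed = np.round(self.kv / self.scale).astype(np.int8)
        self.kv = None          # release the full-precision copy

    def decompress(self) -> np.ndarray:
        self.kv = self.compressed.astype(np.float16) * self.scale
        self.compressed = None
        return self.kv

class ChunkCache:
    """Keeps at most `budget` uncompressed chunks; the coldest chunk is
    compressed when the budget is exceeded (a simplified chunk lifecycle).
    Real LLMS also swaps chunks out to storage and overlaps swap-in with
    recomputation (the swapping-recompute pipeline)."""
    def __init__(self, budget: int):
        self.budget = budget
        self.chunks = []

    def append(self, kv: np.ndarray):
        self.chunks.append(KVChunk(kv))
        hot = [c for c in self.chunks if c.kv is not None]
        if len(hot) > self.budget:
            hot[0].compress()   # oldest uncompressed chunk is the coldest

    def full_context(self) -> np.ndarray:
        # Reassemble the whole context when the app's conversation resumes.
        return np.concatenate(
            [c.kv if c.kv is not None else c.decompress() for c in self.chunks])
```

In a real system the compressed pool would itself be bounded, with cold chunks swapped to storage and swap-in overlapped with recomputation so the decode stream never stalls.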
Apple Introduced Their Focus on Running LLMs on Apple Devices (Gemini and OpenAI)
There are many challenges in running large language models (LLMs) on devices with constrained memory capacities. Apple mentioned in its research that its approach, "deeply rooted in the understanding of flash memory and DRAM characteristics, represents a novel convergence of hardware-aware strategies and machine learning. By developing an inference cost model that aligns with these hardware constraints, we have introduced two innovative techniques: 'windowing' and 'row-column bundling.'"
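To see what windowing buys, consider a hedged Python sketch: only the FFN neurons that were active for a sliding window of recent tokens are kept in DRAM, so each new token loads just a small delta from flash. The window size, `flash_load_rows`, and `NeuronWindow` are illustrative names and assumptions, not Apple's implementation; row-column bundling is likewise only hinted at in a comment.

```python
from collections import deque

WINDOW = 5  # number of recent tokens whose active neurons stay cached (assumed)

def flash_load_rows(neuron_ids):
    """Stand-in for reading the up/down-projection rows of these neurons from
    flash; row-column bundling would store both together so each neuron costs
    a single contiguous read."""
    return {n: f"weights[{n}]" for n in neuron_ids}

class NeuronWindow:
    def __init__(self):
        self.history = deque(maxlen=WINDOW)  # active neuron sets of recent tokens
        self.dram = {}                       # neuron id -> cached weight rows

    def step(self, active_neurons: set):
        """Called once per generated token with the (predicted) active set."""
        cached = set().union(*self.history) if self.history else set()
        to_load = active_neurons - cached            # only the delta hits flash
        self.dram.update(flash_load_rows(to_load))
        self.history.append(active_neurons)
        live = set().union(*self.history)            # evict neurons outside the window
        for n in list(self.dram):
            if n not in live:
                del self.dram[n]
        return to_load

# Consecutive tokens share most active neurons, so flash traffic stays small.
w = NeuronWindow()
print(len(w.step({1, 2, 3, 4})))  # 4 rows loaded (cold start)
print(len(w.step({2, 3, 4, 5})))  # 1 row loaded (only neuron 5 is new)
```

Because consecutive tokens activate largely overlapping neuron sets, the delta loaded per token is small, which is exactly the I/O reduction the paper targets.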
Running large language models (LLMs) on the edge is useful: MIT Introduced TinyChatEngine
It enables copilot services (coding, office, smart reply) on laptops, cars, robots, and more. Users get instant responses with better privacy, as the data stays local.
This is enabled by LLM model compression techniques: SmoothQuant and AWQ (Activation-aware Weight Quantization), co-designed with TinyChatEngine, which implements the compressed low-precision models.
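The intuition behind AWQ can be sketched in a few lines: measure per-channel activation magnitudes on calibration data, scale up the corresponding weight channels before low-bit rounding, and fold the inverse scale back in after dequantization, so rounding error is pushed away from the channels the activations care about most. This is a toy illustration of the idea, not the AWQ authors' or TinyChatEngine's code; the function name and the 0.5 exponent are assumptions.

```python
import numpy as np

def awq_style_quantize(W, act_sample, n_bits=4, alpha=0.5):
    """W: (out, in) weight matrix; act_sample: (tokens, in) calibration activations.
    Returns the dequantized weights and the per-input-channel scales."""
    importance = np.abs(act_sample).mean(axis=0)      # per-channel activation magnitude
    s = np.clip(importance, 1e-8, None) ** alpha      # protection factor per channel
    s /= s.mean()                                     # keep overall magnitude stable
    W_scaled = W * s                                  # fold the scales into the weights

    qmax = 2 ** (n_bits - 1) - 1                      # 7 for int4
    step = np.abs(W_scaled).max(axis=1, keepdims=True) / qmax
    step = np.maximum(step, 1e-8)
    Wq = np.clip(np.round(W_scaled / step), -qmax - 1, qmax)

    return Wq * step / s, s                           # undo the scaling after dequant

# Quick check: channels with large activations keep more precision.
rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128)).astype(np.float32)
x = rng.normal(size=(256, 128)).astype(np.float32)
x[:, :8] *= 20.0                                      # a few dominant channels
W_awq, _ = awq_style_quantize(W, x)
W_rtn, _ = awq_style_quantize(W, np.ones_like(x))     # uniform importance = plain rounding
err = lambda Wd: np.abs(x @ W.T - x @ Wd.T).mean()
print(f"plain int4 error: {err(W_rtn):.3f}  activation-aware: {err(W_awq):.3f}")
```

Since the scaling is mathematically cancelled at inference time, the only net effect is where the rounding error lands.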
Generative AI models under 10 billion parameters can run on edge devices.
Microsoft Introduced Their AI Model Phi-3-mini
Microsoft has introduced a new AI model called Phi-3-mini, the smallest model in its Phi-3 family of language models.
It is designed to bring advanced language, coding, and math capabilities to smaller businesses and organizations.
The Phi-3-mini model can be applied to content creation, local processing, everyday tasks, chatbots, math problem solving, and custom AI applications.
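As a quick illustration of running such a small model locally, here is a minimal sketch using the Hugging Face transformers pipeline; the model id reflects the public Phi-3-mini release, and the flags are reasonable defaults that may need adjusting for your hardware.

```python
# Requires: pip install transformers accelerate
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    trust_remote_code=True,   # Phi-3 shipped with custom modeling code
    device_map="auto",        # uses the GPU if present, otherwise CPU
)

result = generator(
    "Solve step by step: what is 15% of 240?",
    max_new_tokens=128,
)
print(result[0]["generated_text"])
```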