On-Device LLM - Future is EDGE AI

Large language models (LLMs) are central to modern natural language processing, delivering exceptional performance across a wide range of tasks.

LLMs understand human language and, backed by a huge knowledge base, handle most language-based ML tasks, e.g., language translation, Q&A, and smart reply.

LLMs catalyze mobile applications such as UI automation driven by user instructions, e.g., “forward the recent 3 emails”.

LLMs mark a giant step for mobile devices towards more intelligent and personalized assistive agents.

Local LLM use cases are growing, as noted by Matt Rickard.

Some interesting properties of on-device AI:

  • Decentralizes the cost of serving the models. This opens up a class of use cases that wouldn’t be economically feasible to serve.
  • Smaller models are quicker to iterate on. More developers can experiment with them.
  • Better fit for specific modalities (e.g., speech-to-text).
  • Certain companies (e.g., Apple and other hardware makers) have a strong incentive to ship this.

The benefits of server-side inference:

  • Economies of scale in hosting models -- parameters can be loaded once to serve larger batch sizes, amortizing the cost.
  • Online training and augmentation: incorporate new data via a search index or other data source.
  • Fundamental limits on chips and RAM mean huge models can’t be served locally. Cloud capacity is elastic.

LLM as a System Service on Mobile Devices

LLM-as-a-Service (LLMaaS), i.e., the LLM as a system service on mobile devices, will be a new paradigm of mobile AI, as proposed by Wangsong Yin, Mengwei Xu, Yuanchun Li, and Xuanzhe Liu in their research paper.

To fully explore the design space of chunk-level memory management, LLMS incorporates three novel techniques.

(1) Tolerance-Aware Compression

(2) Swapping-Recompute Pipeline

(3) Chunk Lifecycle Management

The authors realize LLMaaS with LLMS, a concrete LLM service design based on fine-grained, chunk-wise KV cache compression and swapping.

LLMS decouples the memory management of LLM contexts from the apps. It aims to minimize LLM context-switching overhead under a tight memory budget, much as traditional mobile memory mechanisms focus on reducing app cold-start latency.
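To make the first technique concrete, here is a hedged sketch of what tolerance-aware compression of a KV-cache chunk might look like. This is not the paper's implementation; the chunk shape, bit-width candidates, and tolerance policy are all hypothetical, chosen only to illustrate the idea of quantizing each chunk as aggressively as its error tolerance allows.

```python
import numpy as np

def quantize_chunk(chunk: np.ndarray, bits: int):
    """Uniformly quantize a KV-cache chunk to the given bit-width."""
    lo, hi = float(chunk.min()), float(chunk.max())
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    q = np.round((chunk - lo) / scale).astype(np.uint16)
    return q, lo, scale

def dequantize_chunk(q, lo, scale):
    return q.astype(np.float32) * scale + lo

def compress_tolerance_aware(chunk, tolerance):
    """Pick the smallest bit-width whose reconstruction error stays
    within the chunk's tolerance (hypothetical policy)."""
    for bits in (2, 4, 8):
        q, lo, scale = quantize_chunk(chunk, bits)
        err = np.abs(dequantize_chunk(q, lo, scale) - chunk).max()
        if err <= tolerance:
            return q, lo, scale, bits
    return quantize_chunk(chunk, 16) + (16,)

# Hypothetical chunk: 64 tokens x 128-dim KV slice.
np.random.seed(0)
chunk = np.random.randn(64, 128).astype(np.float32)
q, lo, scale, bits = compress_tolerance_aware(chunk, tolerance=0.05)
```

Chunks whose values tolerate more error get fewer bits, shrinking the memory footprint that later swapping and lifecycle management have to move around.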

Apple introduced its focus on running LLMs on Apple devices (Gemini and OpenAI).

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

There are many challenges in running large language models (LLMs) on devices with constrained memory capacity. Apple's research states that its approach, “deeply rooted in the understanding of flash memory and DRAM characteristics, represents a novel convergence of hardware-aware strategies and machine learning. By developing an inference cost model that aligns with these hardware constraints, we have introduced two innovative techniques: ‘windowing’ and ‘row-column bundling’.”
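The row-column bundling idea can be sketched as follows. The intuition (under my reading of the paper, with hypothetical matrix sizes) is that an FFN neuron needs the matching column of the up-projection and row of the down-projection, so storing them contiguously lets one flash read fetch both instead of two scattered reads.

```python
import numpy as np

# Hypothetical FFN dimensions, far smaller than any real model.
d_model, d_ff = 8, 32
np.random.seed(0)
W_up = np.random.randn(d_model, d_ff).astype(np.float32)    # column i feeds neuron i
W_down = np.random.randn(d_ff, d_model).astype(np.float32)  # row i reads neuron i

# Row-column bundling: concatenate each up-projection column with the
# matching down-projection row so one contiguous read fetches both.
bundles = np.concatenate([W_up.T, W_down], axis=1)  # shape (d_ff, 2 * d_model)

def load_neuron(i):
    """Simulate reading one contiguous bundle from flash and splitting it."""
    bundle = bundles[i]
    return bundle[:d_model], bundle[d_model:]

up_col, down_row = load_neuron(3)
```

Since flash throughput improves with larger sequential reads, doubling the chunk size per read this way is the kind of access-pattern win the paper's cost model is designed to exploit.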

Running large language models (LLMs) at the edge is useful: MIT introduced TinyChatEngine.

It enables copilot services (coding, office, smart reply) on laptops, cars, robots, and more. Users get instant responses with better privacy, since the data stays local.

This is enabled by LLM model compression techniques: SmoothQuant and AWQ (Activation-aware Weight Quantization), co-designed with TinyChatEngine, which implements the compressed low-precision model.
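The core trick in AWQ can be illustrated with a simplified sketch: scale up salient weight channels (those seeing large activations) before quantization so they lose less precision, then fold the inverse scale back so the layer output is mathematically unchanged. The scaling rule and the naive per-tensor 4-bit quantizer below are illustrative assumptions, not the actual AWQ/TinyChatEngine kernels.

```python
import numpy as np

def quantize_int4(w):
    """Naive symmetric 4-bit quantization (per-tensor, for illustration only)."""
    scale = max(float(np.abs(w).max()) / 7, 1e-8)
    q = np.clip(np.round(w / scale), -8, 7)
    return q * scale  # return dequantized weights for simplicity

def awq_style_quantize(W, act_magnitude):
    """Activation-aware scaling (simplified): boost each input channel by a
    factor derived from its activation magnitude before quantization, then
    divide the factor back out so W @ x is preserved up to quantization error."""
    s = np.sqrt(act_magnitude)        # hypothetical scaling rule
    W_scaled = W * s[:, None]         # protect salient input channels
    W_q = quantize_int4(W_scaled)
    return W_q / s[:, None]           # fold the inverse scale back

np.random.seed(0)
W = np.random.randn(16, 16).astype(np.float32)          # toy weight matrix
act = np.abs(np.random.randn(16)).astype(np.float32) + 0.1  # per-channel activation stats
W_q = awq_style_quantize(W, act)
```

In the real method the scale would be searched per channel group and the weights stored in true low-bit form; the sketch only shows why activation statistics enter a weight-only quantization scheme at all.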

https://mit-han-lab.github.io/TinyChatEngine/


Qualcomm is targeting generative AI models under 10 billion parameters for edge devices.

Microsoft introduced its AI model Orca.

https://www.microsoft.com/en-us/research/project/orca/

Microsoft has introduced a new AI model called Phi-3-mini, the smallest version of its Phi-3 family of language models.

It is designed to provide advanced language, coding, and math capabilities to smaller businesses and organizations.

The Phi-3-mini AI model can be applied to content creation, local processing, everyday tasks, chatbots, math problem solving, and custom A

https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/
