LlamaPi Robot Updated with Llama-3.2
In the previous post, LlamaPi Robot was backed by Llama-3.1 8B. Due to the limited performance of the Raspberry Pi CPU, the local LLM generated only ~1.8 tokens/second, which substantially hurt the user experience.
The recent release of Llama-3.2, featuring 1B and 3B models, has opened up new opportunities for on-device AI. The latest version of LlamaPi adopts Llama-3.2 by default. With 5-bit quantization, generation speed reaches ~3.3 tokens/second, a significant improvement over the previous version. The faster model also lets me experiment with more sophisticated system prompts.
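For reference, here is a minimal sketch of how such a setup can be driven from Python through the llama-cpp-python bindings, including a system prompt and a rough tokens/second measurement. The bindings and the model filename are assumptions for illustration (LlamaPi's actual harness may differ); any 5-bit (Q5) GGUF export of Llama-3.2 should behave the same way:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder filename; substitute your own Q5 GGUF export of Llama-3.2.
llm = Llama(
    model_path="llama-3.2-3b-instruct-q5_k_m.gguf",
    n_ctx=2048,     # context window
    n_threads=4,    # one thread per core on a 4-core Pi
    verbose=False,
)

messages = [
    {"role": "system", "content": "You are LlamaPi, a friendly robot assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

# Stream the reply so we can compute an end-to-end tokens/second figure.
start = time.perf_counter()
n_tokens = 0
reply = []
for chunk in llm.create_chat_completion(messages=messages, max_tokens=64, stream=True):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:           # roughly one token per streamed chunk
        reply.append(delta["content"])
        n_tokens += 1
elapsed = time.perf_counter() - start

print("".join(reply))
print(f"~{n_tokens / elapsed:.1f} tokens/second (rough, includes prompt eval)")
```

Note that this end-to-end figure includes prompt evaluation, so it will read slightly lower than the per-token eval speed the llama.cpp CLI reports in its timing summary.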
Here is a comparison of generation speed across different models (tested with the llama.cpp CLI):
There is still a long way to go to reach my target of 10 tokens/second, but the jump from 1.8 to 3.3 tps is a good start. Getting the VideoCore GPU working through llama.cpp's Vulkan backend seems like a logical next step toward that goal.
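If the Vulkan path works out, switching should be mostly a build-time change. A sketch of what the Python side could look like, assuming a Vulkan-enabled build of llama-cpp-python (the build flag follows upstream llama.cpp's documentation and is untested on the VideoCore GPU; the filename is again a placeholder):

```python
from llama_cpp import Llama

# Assumes llama-cpp-python compiled against a Vulkan-enabled llama.cpp, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
# (flag per upstream llama.cpp docs; untested on the Pi's VideoCore GPU)
llm = Llama(
    model_path="llama-3.2-3b-instruct-q5_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers to the GPU backend
    n_ctx=2048,
)
```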