LlamaPi Robot Updated with Llama-3.2

In the previous post, LlamaPi Robot was backed by Llama-3.1 8B. Due to the limited performance of the Raspberry Pi CPU, the local LLM generated only ~1.8 tokens/second, which substantially impacted the user experience.

The recent release of Llama-3.2, featuring 1B/3B models, has opened up new opportunities for on-device AI. The latest version of LlamaPi adopts Llama-3.2 by default. With 5-bit quantization, generation speed reaches ~3.3 tokens/second, a lot faster than the previous version. It also lets me experiment with more sophisticated system prompts.
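For reference, here is roughly how such a 5-bit model can be produced with llama.cpp's conversion and quantization tools. The checkpoint directory and GGUF file names below are placeholders, and Q5_K_M is just one of llama.cpp's 5-bit quantization types; exact tool names can differ between llama.cpp versions.

```bash
# Convert the Hugging Face checkpoint to a GGUF file (paths are placeholders).
python convert_hf_to_gguf.py ./Llama-3.2-3B-Instruct \
    --outfile llama-3.2-3b-f16.gguf --outtype f16

# Quantize the f16 GGUF down to 5 bits (Q5_K_M is one of llama.cpp's 5-bit types).
./llama-quantize llama-3.2-3b-f16.gguf llama-3.2-3b-Q5_K_M.gguf Q5_K_M
```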

As a comparison of generation speed across models (tested with the CLI from llama.cpp): Llama-3.1 8B runs at ~1.8 tokens/second, while Llama-3.2 with 5-bit quantization reaches ~3.3 tokens/second.
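For reference, this is roughly how such tokens-per-second figures can be collected with llama.cpp's CLI tools; the GGUF file name below is a placeholder.

```bash
# Benchmark decode speed: -n sets the number of generated tokens to time.
./llama-bench -m llama-3.2-3b-Q5_K_M.gguf -n 128

# Alternatively, run a one-off generation; llama-cli prints eval timings
# (including tokens/second) after it finishes.
./llama-cli -m llama-3.2-3b-Q5_K_M.gguf -p "Hello, I am a robot." -n 128
```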

There is still a long way to go to reach my target (10 tps), but jumping from 1.8 tps to 3.3 tps is a good start. To close the gap to 10 tps, getting the VideoCore GPU and the Vulkan backend to work might be a logical next step, as sketched below.
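As a sketch of what that next step could look like: llama.cpp's Vulkan backend is enabled at build time, and layers are offloaded at run time. Whether the Pi's VideoCore Vulkan driver (Mesa's v3dv) supports everything ggml needs is exactly the open question here.

```bash
# Build llama.cpp with the Vulkan backend enabled.
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j

# Try offloading all layers to the GPU (-ngl 99); on a Raspberry Pi this
# depends on the VideoCore Vulkan driver (Mesa v3dv) handling ggml's shaders.
./build/bin/llama-cli -m llama-3.2-3b-Q5_K_M.gguf -ngl 99 -p "Hello" -n 64
```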

Comment from Charlie Hou (AI Enthusiast | ex Mellanox/NVIDIA Networking | ex Huawei), 1 month ago:

Very nice! In previous AI architectures, computationally intensive tasks were often offloaded to the cloud, particularly utilizing Edge AI Computing. Are there any current solutions offering similar capabilities?
