LlamaPi Robot Updated with Llama-3.2
In the previous post, LlamaPi Robot was backed by Llama-3.1 8B. Due to the limited performance of the Raspberry Pi CPU, the local LLM generated only ~1.8 tokens/second, which substantially hurt the user experience.
The recent release of Llama-3.2, featuring 1B and 3B models, has opened up new opportunities for on-device AI. The latest version of LlamaPi adopts Llama-3.2 by default. With 5-bit quantization, generation speed reaches ~3.3 tokens/second, a significant improvement over the previous version. The faster model also lets me experiment with more sophisticated system prompts.
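For reference, here is a minimal sketch of how such a setup can be driven from Python through the llama-cpp-python bindings, including a system prompt and a rough tokens/second measurement. The bindings and the model filename are assumptions for illustration (LlamaPi's actual harness may differ); any 5-bit (Q5) GGUF export of Llama-3.2 should behave the same way:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder filename; substitute your own Q5 GGUF export of Llama-3.2.
llm = Llama(
    model_path="llama-3.2-3b-instruct-q5_k_m.gguf",
    n_ctx=2048,     # context window
    n_threads=4,    # one thread per core on a 4-core Pi
    verbose=False,
)

messages = [
    {"role": "system", "content": "You are LlamaPi, a friendly robot assistant."},
    {"role": "user", "content": "Introduce yourself in one sentence."},
]

# Stream the reply so we can compute an end-to-end tokens/second figure.
start = time.perf_counter()
n_tokens = 0
reply = []
for chunk in llm.create_chat_completion(messages=messages, max_tokens=64, stream=True):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:           # roughly one token per streamed chunk
        reply.append(delta["content"])
        n_tokens += 1
elapsed = time.perf_counter() - start

print("".join(reply))
print(f"~{n_tokens / elapsed:.1f} tokens/second (rough, includes prompt eval)")
```

Note that this end-to-end figure includes prompt evaluation, so it will read slightly lower than the per-token eval speed the llama.cpp CLI reports in its timing summary.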
Here is a comparison of generation speed across different models (tested with the llama.cpp CLI):
There is still a long way to go to reach my target of 10 tokens/second, but the jump from 1.8 to 3.3 tps is a good start. Getting the VideoCore GPU working through llama.cpp's Vulkan backend seems like a logical next step toward that goal.
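If the Vulkan path works out, switching should be mostly a build-time change. A sketch of what the Python side could look like, assuming a Vulkan-enabled build of llama-cpp-python (the build flag follows upstream llama.cpp's documentation and is untested on the VideoCore GPU; the filename is again a placeholder):

```python
from llama_cpp import Llama

# Assumes llama-cpp-python compiled against a Vulkan-enabled llama.cpp, e.g.:
#   CMAKE_ARGS="-DGGML_VULKAN=on" pip install llama-cpp-python
# (flag per upstream llama.cpp docs; untested on the Pi's VideoCore GPU)
llm = Llama(
    model_path="llama-3.2-3b-instruct-q5_k_m.gguf",  # placeholder filename
    n_gpu_layers=-1,  # offload all layers to the GPU backend
    n_ctx=2048,
)
```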