(ChatGPT?) Large Language Model (LLaMa) in a Raspberry Pi 4b - Genie in a Bottle :-)

By now we've all heard of #AI, #GenerativeAI and #LLMs - the idea here is to try and run a large language model chat experience (similar to #ChatGPT) on the least powerful portable device I can find - in this case, the Raspberry Pi 4b

Just for reference, while there are multiple ways of doing this, I wanted a setup that would be

  • portable - an install footprint small enough to carry around
  • able to work without any access to the Internet!

Which immediately ruled out #OpenAI's API-based models

Meta (God how I hate that name), or Facebook, recently came out with their #LLaMa 2 language model, which looked perfect - until you come across the size of the models you have to deal with

You can find the documentation below

LLaMA 2 - Meta's Language Model

Request form for LLaMa models and libraries

While researching, I came across Sosaka's work - essentially, they have been able to shrink the 7B-parameter model down to around 4 GB through 4-bit quantization, which meant that with some luck and trial and error, we might just be able to run the language model on a laptop, or in my case my Raspberry Pi cluster :-)

So let's get building. Hardware needed - a clustered Raspberry Pi or even a single Raspberry Pi 4b (the more memory and CPU cores, the better) and external storage - I am using a SanDisk USB 3.2, 128 GB pen drive and a SanDisk 32 GB SD card for the OS

First thing is to install the base OS. I used the Raspberry Pi Imager to do a headless Ubuntu Server install

Update the install and install essential build tools

sudo apt-get update
sudo apt install -y software-properties-common
sudo apt install -y build-essential
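Once the packages are in, it's worth a quick sanity check that the tools the build will need are actually on the PATH - a minimal sketch (note that git, used further down, may not ship with a fresh Ubuntu Server image):

```shell
# Sanity check: confirm the build tools are installed before going further
for tool in gcc g++ make git; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: ok"
  else
    echo "$tool: MISSING - install it with apt before continuing"
  fi
done
```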
        
Next, mount the drives and download the model file from here or from Sosaka's link here
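The mount-and-download step looks roughly like this - a sketch, assuming the pen drive shows up as /dev/sda1 (check lsblk for yours) and using a placeholder URL; substitute whichever of the links above you actually use:

```shell
# a sketch: mount the USB drive and pull the quantised model onto it
# /dev/sda1 is an assumption - run `lsblk` to find your actual device
sudo mkdir -p /mnt/usb
sudo mount /dev/sda1 /mnt/usb
cd /mnt/usb
# -c resumes a partial download - handy for a ~4 GB file on a slow link
# the URL below is a placeholder, not the real download link
sudo wget -c "https://example.com/ggml-alpaca-7b-q4.bin"
```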

Clone the repo using the command below

sudo git clone https://github.com/antimatter15/alpaca.cpp        



Enter the alpaca.cpp directory and build the chat binary using make

cd alpaca.cpp/
sudo make chat        




Move the language model into the alpaca.cpp directory

sudo mv ../ggml-alpaca-7b-q4.bin .        
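A truncated download is a common reason the chat fails to start, so it's worth checking the file before launching - a small sketch (check_model is a hypothetical helper written here, not part of alpaca.cpp):

```shell
# a sketch: warn if the model file looks truncated
# the 4-bit 7B model should be roughly 4 GB on disk
check_model() {
  file="$1"; min_bytes="$2"
  if [ ! -f "$file" ]; then
    echo "missing: $file"; return 1
  fi
  size=$(stat -c '%s' "$file")   # GNU stat, as shipped on Ubuntu
  if [ "$size" -lt "$min_bytes" ]; then
    echo "suspiciously small ($size bytes): $file"; return 1
  fi
  echo "ok: $file ($size bytes)"
}
check_model ggml-alpaca-7b-q4.bin 4000000000 || echo "re-download the model before running ./chat"
```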



Start the chat, wait for the prompt to come up, and fire off a couple of random questions :-)

sudo ./chat        


Open a parallel PuTTY/terminal session and check htop :-)
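If you'd rather not keep full-screen htop open, a one-shot snapshot from the second session works too - a sketch using standard procps tools:

```shell
# a sketch: one-shot view of the chat load from a second SSH session
pgrep -a chat || echo "chat is not running"
# top CPU consumers, single snapshot instead of interactive htop
ps -eo pid,pcpu,pmem,comm --sort=-pcpu | head -n 8
```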



That's pretty wild usage :-) But then performance was never the goal!



Points to note:

  • As seen in htop, each run of the chat spawns four processes that absolutely hammer a single Raspberry Pi. It's better if three nodes are clustered
  • Chat responses are sluggish, and will depend on the speed of the SD cards/USB drives/external drives that the language model runs from
  • This actually has voice output if drives and speakers are connected and sound is enabled - which sounds a lot better than the chat mode :-)
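Since response speed tracks storage speed, it's worth measuring what your card or pen drive actually delivers - a rough sketch using dd, run from the drive's mount point (the 256 MiB test size is an arbitrary choice):

```shell
# a sketch: rough sequential write then read speed of the current drive
dd if=/dev/zero of=./ddtest.bin bs=1M count=256 conv=fsync 2>&1 | tail -n 1
# for an honest read figure, drop the page cache first (needs root):
#   sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'
dd if=./ddtest.bin of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f ./ddtest.bin
```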

Happy tinkering! Let me know if you come across any other language models we can try out
