Using LLMs locally on iPad or iPhone
In this tutorial you will learn how to install a ChatGPT-like large language model (LLM) locally on your Apple device.
This step-by-step guide will walk you through the whole process. For a detailed description of the thought process behind it, scroll down to the bottom of this article.
Step 0: Prerequisites
Step 1: Install Testflight and LLMFarm
To host our local LLM, we will use LLMFarm, an open-source client with support for Apple Silicon. Since LLMFarm is still in development, it has to be installed through the TestFlight app. Let's jump straight to the LLMFarm website and click the "Install with TestFlight" option.
Once you are redirected to the TestFlight website, click "View in App Store" to install TestFlight:
Install TestFlight as if you were installing any other app:
Now go back to the TestFlight website and click on Step 2 to install LLMFarm:
Step 2: Download a pre-trained model
For the purpose of this tutorial, I will use the pre-trained Mistral-7B model available on huggingface.co. If you want to read more about this model, scroll down to the bottom of this article. Otherwise, go to the TheBloke repository, which hosts a ready-to-use quantized model: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF
Select the mistral-7b-instruct-v0.1.Q4_K_M.gguf file from the list of available models and click download. Please note that the model requires around 4GB of free space on your device:
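If you later want to fetch the same file on a desktop (for example, to transfer it to your device via the Files app), Hugging Face serves every repository file at a predictable `resolve` URL. A minimal sketch in Python, assuming the repository and filename shown above:

```python
# Build the direct download URL for a GGUF file hosted on huggingface.co.
# Hugging Face serves raw repository files under
#   https://huggingface.co/<repo>/resolve/<revision>/<filename>
repo = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
filename = "mistral-7b-instruct-v0.1.Q4_K_M.gguf"
url = f"https://huggingface.co/{repo}/resolve/main/{filename}"
print(url)
```

You can paste the resulting URL into any browser or download tool; the file is roughly 4GB, so make sure you are on Wi-Fi.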
Check the location of the model file and note it down as you will need to point LLMFarm to this location in the next step:
Step 3: Set up LLMFarm to use Mistral-7B model
Switch back to the LLMFarm application, click the Settings option at the bottom of the screen and then select the "Models" option on the left:
Point the application to the location where the model was downloaded. Please note that LLMFarm copies the model into its own folder, which means it will take up another 4GB of free space on your device. Once you have selected the model, though, you can delete the file from its original location.
Once the model is imported, it should appear in the list as in the screenshot below:
Step 4: Configure the chat
We are now ready to set up a chat context window, which will allow us to interact with our model. Click on the Chats option in the bottom left of the LLMFarm application and then select the "Start New Chat" option:
In the chat settings, click "Select model":
Then, select the "Import from file" option and choose the model we imported into the LLMFarm library earlier:
Now let's refine the prompt formatting. Go to the "Prompt format" option, remove the default entry and add the following line, as per the Mistral documentation:
<s>[INST] {{prompt}} [/INST]
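To see what this template does, here is a small sketch (the `format_prompt` helper is my own illustration, not part of LLMFarm) of how the user's input is substituted into the `{{prompt}}` placeholder before being sent to the model:

```python
# Mistral-7B-Instruct expects prompts wrapped in [INST] ... [/INST] tags.
TEMPLATE = "<s>[INST] {{prompt}} [/INST]"

def format_prompt(user_prompt: str) -> str:
    # Replace the {{prompt}} placeholder with the actual user input,
    # mirroring what the chat client does behind the scenes.
    return TEMPLATE.replace("{{prompt}}", user_prompt)

print(format_prompt("What is the capital of France?"))
# <s>[INST] What is the capital of France? [/INST]
```

If the template is missing or wrong, the model still responds, but instruction-following quality degrades noticeably, so this step is worth double-checking.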
We're nearly ready to go! The last thing to do is to tweak the settings related to resource management.
Once your settings are in place, click the "Add" option at the top of the screen:
Step 5: Testing!
Your chat window should now be ready to go. You can start assigning tasks, asking questions and checking the accuracy of the results.
Please note that the first prompt includes a warm-up period and may take some time. Subsequent responses should be faster:
That's it, congratulations! Now you know how to run large language models (LLMs) locally on your Apple device.
Why did I write this tutorial?
Picture this: you're in the era of all things smart and digital, where AI's prowess is as common as your morning cup of coffee. Enter LLMs, these brainy language models that can chat, generate text, and even write Shakespearean sonnets if you ask them nicely. But here's the kicker: instead of relying on some closed-source model residing in an unknown location, why not bring this wizardry closer to home and use it offline? Yep, right into the cozy confines of your own device!
Privacy buffs, rejoice! Hosting an LLM on your local gadget is like waving a magic wand over your data. No more fretting about your conversations being overheard by unseen digital ears. It's like having your own secret language lair where only you and your LLM pal share the juiciest details of your chat without nosy third parties peeking in.
Why iPad or iPhone and not a Macbook?
When it comes to being the featherweight champion of portability, iPads and the new iPhone 15 Pro Max series step into the ring with their slick moves and compact charm, while MacBooks bring the heavyweight power. For those always on the move, craving versatility and wanting a device that's as light as a feather, iPads and iPhones are the go-to choice. Last but not least, LLMs running locally finally give your iDevices a use case that squeezes the juice out of them!
Why Mistral-7B?
Mistral-7B from MistralAI is widely praised for its impressive performance relative to its small size.
I tested several quantized models from the TheBloke repository and found that the models quantized at level 3 (mistral-7b-instruct-v0.1.Q3_K_M.gguf) or level 4 (mistral-7b-instruct-v0.1.Q4_K_M.gguf) retain enough precision in the model's parameters to preserve good accuracy while staying small enough for a mobile device; the higher the quantization level, the more precision is kept, at the cost of a larger file. In the end, however, whether the accuracy is good enough has to be judged against your own needs.
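As a rough back-of-the-envelope check (the bits-per-weight figures below are approximate averages for llama.cpp K-quant formats I am assuming here, not official numbers), you can estimate the file size of each quantization level from the parameter count:

```python
# Assumed average bits per weight for llama.cpp K-quant formats;
# real files also contain metadata and some mixed-precision layers.
BITS_PER_WEIGHT = {"Q3_K_M": 3.9, "Q4_K_M": 4.8}

PARAMS = 7.24e9  # approximate Mistral-7B parameter count

def estimated_size_gb(quant: str) -> float:
    # size in GB = parameters * bits-per-weight / 8 bits-per-byte / 1e9
    return PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(quant):.1f} GB")
```

The Q4_K_M estimate lands close to the roughly 4GB figure mentioned in Step 2, which is why this level is a reasonable fit for devices with 6–16GB of RAM.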