Using LLMs locally on iPad or iPhone

In this tutorial you will learn how to install a ChatGPT-like large language model (LLM) locally on your Apple device.

This step-by-step guide is based on my personal iPad Pro 5th Gen with an M1 chip, 8GB of RAM and 128GB of local storage. I will use the LLMFarm application (llmfarm.site) as the client application to run a downloaded model, and I will select a pre-trained Mistral-7B model available on huggingface.co, although this guide should let you use just about any other model of your choice.

For a detailed description of the thought process and the rationale behind the decisions made, scroll to the bottom of this article.

Step 0: Prerequisites

  • An iPad or iPhone with at least 8GB of RAM
  • At least 8GB of free local storage

Step 1: Install TestFlight and LLMFarm

To host our local LLM, we will use LLMFarm, an open-source client with support for Apple Silicon. Since LLMFarm is still in development, it is necessary to use the TestFlight app. Let's jump straight to the LLMFarm website and click the "Install with TestFlight" option.

Once you are redirected to the TestFlight page, click on "View in App Store" to install TestFlight:

Install TestFlight as if you were installing any other app:

Now go back to the TestFlight page and click on Step 2 to install LLMFarm:

Step 2: Download a pre-trained model

For the purpose of this tutorial, I will select a pre-trained Mistral-7B model available on huggingface.co. If you want to read more about this model, scroll down to the bottom of this article. Otherwise, go to the TheBloke repository hosting a ready-to-use model: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF

Select the mistral-7b-instruct-v0.1.Q4_K_M.gguf file from the list of available models and click download. Please note that the model will require around 4GB of free space on your device:

Check the location of the model file and note it down, as you will need to point LLMFarm to this location in the next step:
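
By the way, if you would rather download the model on a computer first and then move it to your device via iCloud Drive or the Files app, here is a minimal sketch using the huggingface_hub Python package (the ./models destination folder is just an example):

    # Download the quantized Mistral-7B GGUF file from Hugging Face.
    # Requires: pip install huggingface_hub
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
        filename="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        local_dir="./models",  # example destination folder; any location works
    )
    print(f"Model saved to: {path}")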

Step 3: Set up LLMFarm to use Mistral-7B model

Switch back to the LLMFarm application, click on the Settings option at the bottom of the screen and then select the "Models" option on the left:

Point the application to the location where the model was downloaded. Please note that LLMFarm will copy the model to its own folder, which means it will take up another 4GB of free space on your device. Once the model is selected, however, you can delete the file from its original location.

Once the model is uploaded, it should appear on the list as in the screenshot below:

Step 4: Configure the chat

We are now ready to set up a chat context window that will allow us to interact with our model. Click on the Chats option in the bottom left of the LLMFarm application and then select the "Start New Chat" option:

In the chat settings, click "Select model":

Then, select the "Import from file" option and choose the model we imported into the LLMFarm library earlier:

Now let's refine the prompt formatting. Go to the "Prompt format" option, remove the default entry and add the following line, as per the Mistral documentation:

<s>[INST] {{prompt}} [/INST]
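
To illustrate what this template does: LLMFarm substitutes your message for the {{prompt}} placeholder before passing the text to the model. A tiny Python sketch of the idea (an illustration, not LLMFarm's actual code):

    # Illustration of the Mistral instruction template; LLMFarm does this internally.
    TEMPLATE = "<s>[INST] {{prompt}} [/INST]"

    def format_prompt(user_prompt: str) -> str:
        # Replace the placeholder with the user's message.
        return TEMPLATE.replace("{{prompt}}", user_prompt)

    print(format_prompt("Summarize the plot of Hamlet in two sentences."))
    # -> <s>[INST] Summarize the plot of Hamlet in two sentences. [/INST]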

We're nearly ready to go! The last thing to do is to tweak the settings related to resource management. Click on "Prediction options" and select the following settings:

  • Turn on "Metal" to leverage Apple Silicon
  • Turn on "MLock" and leave MMap option on for more efficient RAM management

Once your settings are in place, click on the "Add" option at the top of the screen:

Step 5: Testing!

Your chat window should now be ready to go. You can start giving it tasks, asking questions and checking the accuracy of the results:

Please note that the first prompt may take some time because of the warmup period. Afterwards, responses should be faster:
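
The same warmup effect can be observed with llama.cpp on a desktop; a small sketch using llama-cpp-python that times the model load plus the first reply against a second reply (the model path is an example):

    # Compare a cold start (load + first reply) with a warm second reply.
    import time
    from llama_cpp import Llama

    t0 = time.perf_counter()
    llm = Llama(model_path="./models/mistral-7b-instruct-v0.1.Q4_K_M.gguf")
    llm("[INST] Name three planets. [/INST]", max_tokens=32)
    print(f"cold (load + first reply): {time.perf_counter() - t0:.1f}s")

    t0 = time.perf_counter()
    llm("[INST] Name three moons. [/INST]", max_tokens=32)
    print(f"warm (second reply): {time.perf_counter() - t0:.1f}s")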

That's it, congratulations! Now you know how to run large language models (LLMs) locally on your Apple device.

Why did I write this tutorial?

Picture this: you're in the era of all things smart and digital, where AI's prowess is as common as your morning cup of coffee. Enter LLMs, these brainy language models that can chat, generate text, and even write Shakespearean sonnets if you ask them nicely. But here's the kicker: instead of relying on some closed-source model residing in an unknown location, why not bring this wizardry closer to home and use it offline? Yep, right into the cozy confines of your own device!

Privacy buffs, rejoice! Hosting an LLM on your local gadget is like waving a magic wand over your data. No more fretting about your conversations being overheard by unseen digital ears. It's like having your own secret language lair where only you and your LLM pal share the juiciest details of your chat without nosy third parties peeking in.

Why iPad or iPhone and not a Macbook?

When it comes to being the featherweight champion of portability, iPads and the new iPhone 15 Pro Max series step into the ring with their slick moves and compact charm, while MacBooks bring the heavyweight power. For those always on the move, craving versatility and wanting a device that's as light as a feather, iPads and iPhones are the go-to choice. Last but not least, LLMs running locally finally give you a good reason to squeeze all the juice out of your iDevices!

Why Mistral-7B?

Mistral-7B from MistralAI is widely praised for its impressive performance relative to its small size. According to its creators, this foundational model handles tasks better than some larger models while requiring far less computing power. This is particularly important since even the beefiest iPads and iPhones are constrained by the available RAM. It also takes a much more permissive approach when it comes to accepting all kinds of tasks.
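
To get a feel for why a 7B model at roughly 4-bit quantization fits on an 8GB device, here is a back-of-the-envelope estimate (the parameter count and bits-per-weight figures are approximations):

    # Rough estimate of the in-memory size of a quantized 7B model.
    params = 7.24e9          # Mistral-7B parameter count (approximate)
    bits_per_weight = 4.85   # Q4_K_M averages roughly 4.85 bits per weight
    size_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{size_gb:.1f} GB")  # ~4.4 GB, in line with the ~4GB download above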

I tested several quantized models from the TheBloke repository and found that level 3 (mistral-7b-instruct-v0.1.Q3_K_M.gguf) or level 4 (mistral-7b-instruct-v0.1.Q4_K_M.gguf) quantization strikes the right balance: the higher-bit Q4_K_M variant retains more precision in the model's parameters, potentially preserving better accuracy, while still not requiring more than 8GB of RAM. Below you will find a screenshot of the memory usage while running the abovementioned models:


However, in the end, the accuracy of the model has to be determined by your subjective needs.


Yurii Ormson

Full-Stack QA Engineer | {Java & Playwright} | EPAM Systems

5 months

Thanks for your effort!

Dr. Zeeshan Alam

BDS-intern at NIMS DENTAL COLLEGE, NIMS UNIVERSITY| Python- web development with Django| Algo-trading and investing.

6 months

Now you don’t even need to download testflight, LLM Farm is available in the app store and there are models to download directly from it


This works even on the iPhone 14 which only has 6GB RAM. Amazing stuff.

Jonathan Tismo

R & D Engineer at Nokia

1 year

Hi, I noticed the UI has changed a lot since the YT video ( https://youtu.be/5QEDNZlDf-c?si=eKfXv_9mwWgJmEhI ) I used to install and configure your app. Basically, the plus sign in the upper right and the settings button at the bottom are no longer there, which is fine if I just want to use one model for all my chat threads, but if I want to use different models I have to remove all chats first so that I can add or select a different model. I am using an 11" iPad Pro M1 with 16GB RAM and 1TB of storage.

David Alan Birdwell

Founder - Humanity and AI, LLC

1 year

So this is working great with the Q4 model on an 11" M1 iPad Pro. However, the responses often get truncated after three short paragraphs, and I get all kinds of claims of it being able to use email and instant-messaging accounts to interact with users, but obviously no actual ability to do so.
