Running an LLM on Your Mac - The Missing Guide - Part 1
Foreword (and a warning)
This article is for those geeky enough to want to run an LLM via Swift and CoreML on their Mac as the first step on the journey to writing Mac, iOS and visionOS apps that talk directly to LLM models to power a new generation of capabilities. If you're not looking to build apps around LLMs, this probably isn't for you.
Apple turned a lot of heads (including mine) by showing a CoreML LLM running directly on device at WWDC this year, and I needed to dive in to see where my team can take it. We're helping clients unleash the potential of LLMs on device to leverage existing hardware investments (why spend on the cloud if you already have dedicated hardware in your team's hands?), keep data more secure, and work offline in challenging environments.
Getting Started
"There has to be a better way!"
That was my reaction to mashing together partial sets of instructions, code commit notes and READMEs to get an LLM running natively on the Mac. There are a lot of places where one mistake can derail the whole thing. Since I couldn't find a guide like this, I created one with the end-to-end process that will get you running with a local LLM in about 10 minutes.
To make this work, I’m currently running macOS Sequoia 15.1 beta and Xcode 16.1 beta 2 (16B5014f). I was not able to get it working with older versions, so your mileage may vary (the Sequoia beta is a definite must-have).
Part 1 - Getting the model and command line LLM working
The LLM model we'll be using (Mistral 7B v0.3) lives at Hugging Face with a bazillion other models. Go create a Hugging Face account and log in (free, but it requires verifying your email) - https://huggingface.co
Turn on your access to the Mistral 7B LLM model by clicking the ‘Agree and access repository’ button - https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Create a Hugging Face access token with read permissions to use later. Copy and save the token data that looks something like asdijfasidfhaisudhfaoiusdhfaiousdhf - https://huggingface.co/settings/tokens
If you have tried to get any of this working before, make sure you remove the directory Documents/huggingface on your Mac. Past attempts that weren't fully successful probably created and cached files there that will get in the way and block progress.
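If you'd rather clear it from the command line than from Finder, something like this should do it (this assumes the cache landed in the default ~/Documents/huggingface location; double-check the path before running rm -rf):

rm -rf ~/Documents/huggingface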
Now fire up Terminal and create the directory you’re going to be working in (for this example I’m using Downloads/apple-llm-test on my Mac).
cd Downloads
mkdir apple-llm-test
cd apple-llm-test
Install the Swift Transformers project that lets us run these CoreML models using Swift. Once it's cloned, jump into its Examples directory. Note that the preview branch of Swift Transformers is required.
git clone -b preview https://github.com/huggingface/swift-transformers
cd swift-transformers/Examples
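If you want to double-check that the clone actually grabbed the preview branch before going further, git can tell you from inside the repository (this should print "preview"; if it doesn't, re-run the clone with the -b preview flag):

git branch --show-current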
Critical Fail Step - Don't Skip
Set up the Hugging Face environment variable in the terminal window so the software has permission to access some of the key installation and configuration elements. Without this, nothing is going to work properly. Note that we're only setting it up for this one window; if you want to do a bunch of future command-line work you'll need to add it to your favorite shell setup. And make sure to replace 'YOURTOKENHERE' with the token goop you got from the Hugging Face web site.
export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE
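If you do want the token available in every future terminal window rather than just this one, one option is to append the same export to your shell profile. This is a sketch assuming zsh, the macOS default shell; adjust the file name if you use something else:

echo 'export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE' >> ~/.zshrc
source ~/.zshrc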
Install the Hugging Face tools that you'll need to download the model.
pip install -U "huggingface_hub[cli]"
NOTE: If you’re using Homebrew on your Mac (and only if you're using it - you’ll know if you are), here's an alternate install command for you (skip this if you're not sure):
brew install huggingface-cli
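Whichever way you installed it, a quick sanity check is worth doing before the big download. With the token exported in this window, the CLI should be able to report which account you're authenticated as; if it complains that you're not logged in, revisit the token steps above:

huggingface-cli whoami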
Now for the most time-consuming step: the big download. Go get the model (we’ll use the Int4 quantized version of Mistral in this example to keep the download smaller). Remember that you are in the swift-transformers/Examples directory when you run this.
huggingface-cli download --local-dir Mistral7B --local-dir-use-symlinks False apple/mistral-coreml --include "StatefulMistral7BInstructInt4.mlpackage/*"
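The download is several gigabytes, so expect a wait. Once it finishes, it's worth confirming the package actually landed before moving on. The exact size will vary, but an Int4-quantized 7B model should come in somewhere around 4 GB:

ls Mistral7B
du -sh Mistral7B/StatefulMistral7BInstructInt4.mlpackage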
And now if everything ran the way it was supposed to, we can talk to our model via the command line.
cd Mistral7B
swift run transformers "Best recommendations for a place to visit in Paris in August 2024:" --max-length 1024 StatefulMistral7BInstructInt4.mlpackage
And the output should look something like this.
Generating
Best recommendations for a place to visit in Paris in August 2024:
1. Palace of Versailles: This iconic palace is a must-visit. It's a short train ride from Paris and offers a glimpse into the opulence of the French monarchy.
2. Eiffel Tower: No trip to Paris is complete without a visit to the Eiffel Tower. You can take an elevator ride to the top for a stunning view of the city.
3. Louvre Museum: Home to thousands of works of art, including the Mona Lisa and the Winged Victory of Samothrace, the Louvre is a cultural treasure.
4. Notre-Dame Cathedral: This famous cathedral is currently undergoing restoration after a fire in 2019. By 2024, it should be open to the public again.
5. Seine River Cruise: A boat tour on the Seine River is a great way to see many of Paris's famous landmarks.
6. Montmartre: This artistic neighborhood is known for its bohemian vibe, the Sacré-Cœur Basilica, and the Moulin Rouge.
7. Musée d'Orsay: This museum houses a vast collection of Impressionist and Post-Impressionist masterpieces.
8. Arc de Triomphe: This iconic arch offers a panoramic view of Paris from its rooftop.
9. Sainte-Chapelle: This 13th-century Gothic chapel is known for its stunning stained glass.
10. Palais Garnier: This opulent opera house is a must-see for its architecture alone.
42.77 tokens/s, prompt pre-filling time: 0.73s, total time: 9.15s
One item to note here is the output performance measure (in this case, 42.77 tokens/second) while running on a 2023 MacBook Pro M2 Max. This number will become very important as we look at tuning models and code in the future, especially as we shift things to mobile platforms.
Next Steps
If you're getting command-line output, you've taken a huge step toward getting things working on your Mac and beyond. Have fun poking at the LLM using the command line.
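One easy way to poke at it: vary the prompt and --max-length and compare the tokens/s figure printed at the end of each run. The prompt below is just an illustration - anything works (run it from the same Mistral7B directory as before):

swift run transformers "Write a short poem about the Seine at sunset:" --max-length 256 StatefulMistral7BInstructInt4.mlpackage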
Sources
Feel free to dive into the source material if you really want to get under the hood.