Running an LLM on Your Mac - The Missing Guide - Part 1
Of course it's AI generated... tuning by Alex Bratton


Foreword (and warning)

This article is for those geeky enough to want to run an LLM via Swift and CoreML on their Mac as the first step on the journey to writing Mac, iOS and visionOS apps that talk directly to LLM models to power a new generation of capabilities. If you're not looking to build apps around LLMs, this probably isn't for you.

Apple turned a lot of heads (including mine) by showing a CoreML LLM running directly on device at WWDC this year, and I needed to dive in to see where my team can take it. We're helping clients unleash the potential of on-device LLMs to leverage existing hardware investments (why pay for cloud compute when you already have dedicated hardware in your team's hands?), keep data more secure, and work offline in challenging environments.

Getting Started

"There has to be a better way!"

That was my reaction to mashing together partial sets of instructions, code commit notes and READMEs to get an LLM running natively on the Mac. There are a lot of places where one mistake can derail the whole thing. Since I couldn't find a guide like this, I created one with the end-to-end process that will get you running a local LLM in about 10 minutes.

To make this work, I'm currently running macOS Sequoia 15.1 beta and Xcode 16.1 beta 2 (16B5014f). I was not able to get it working with older versions, so your mileage may vary (the Sequoia beta is a definite must-have).


Part 1 - Getting the model and command line LLM working

The LLM we'll be using (Mistral 7B v0.3) lives on Hugging Face alongside a bazillion other models. Create a Hugging Face account and log in (free, but requires verifying your email) - https://huggingface.co

Turn on your access to the Mistral 7B model by clicking 'Agree and access repository' - https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

Create a Hugging Face access token with read permissions to use later. Copy and save the token, which looks something like asdijfasidfhaisudhfaoiusdhfaiousdhf - https://huggingface.co/settings/tokens

If you have tried to get any of this working before, make sure you remove the Documents/huggingface directory on your Mac. Past attempts that weren't fully successful probably created and cached files there that will get in the way and block progress.
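A one-liner for that cleanup step, assuming the cache landed in the default Documents/huggingface location (rm -rf is safe to run even if the directory was never created):

```shell
# Clear any stale Hugging Face cache left over from earlier attempts.
rm -rf ~/Documents/huggingface && echo "stale cache cleared"
```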

Now fire up Terminal and create the directory you’re going to be working in (for this example I’m using Downloads/apple-llm-test on my Mac).

cd Downloads 

mkdir apple-llm-test 

cd apple-llm-test        

Install the Swift Transformers project, which lets us run these CoreML models from Swift. Once it's cloned, jump into the new directory. Note that the preview branch of Swift Transformers is required.

git clone -b preview https://github.com/huggingface/swift-transformers

cd swift-transformers/Examples        

⚠️ Critical Fail Step - Don't Skip

Set up the Hugging Face environment variable in the terminal window so the software will have permission to access some of the key installation and configuration elements. Without this, nothing is going to work properly. Note that we're just setting it up for this one window; if you want to do a bunch of future command-line work, you'll need to add it to your favorite shell setup. And make sure to replace 'YOURTOKENHERE' with the goop you got from the Hugging Face web site for your token.

export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE        
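Before moving on, it's worth confirming the variable actually made it into your shell. A minimal sanity check (YOURTOKENHERE is a placeholder, not a real token):

```shell
# Set the token for this shell session and confirm it is non-empty.
export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE
if [ -n "$HUGGING_FACE_HUB_TOKEN" ]; then
  echo "token is set"
else
  echo "token is missing"
fi
```

To make it stick across sessions, append the export line to ~/.zshrc (zsh is the default macOS shell) and open a new Terminal window.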

Install the Hugging Face tools that you'll need to download the model.

pip install -U "huggingface_hub[cli]"         

NOTE: If you're using Homebrew on your Mac (and only if you are - you'll know), here's an alternate install command for you (skip this if you're not sure):

brew install huggingface-cli        

Now for the most time-consuming step: the big download. Go get the model (we'll use the Int4 quantization of Mistral 7B to keep the size down for this example). Make sure you're in the swift-transformers/Examples directory when you run this.

huggingface-cli download --local-dir Mistral7B --local-dir-use-symlinks False apple/mistral-coreml --include "StatefulMistral7BInstructInt4.mlpackage/*"        
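Once the download finishes, it's worth checking that the .mlpackage bundle actually arrived before going further (the path below assumes the --local-dir Mistral7B setting from the command above):

```shell
# Verify the CoreML model package landed where the run command expects it.
PKG=Mistral7B/StatefulMistral7BInstructInt4.mlpackage
if [ -d "$PKG" ]; then
  echo "model package found"
else
  echo "model package missing - re-run the download"
fi
```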


And now if everything ran the way it was supposed to, we can talk to our model via the command line.

cd Mistral7B

swift run transformers "Best recommendations for a place to visit in Paris in August 2024:" --max-length 1024 StatefulMistral7BInstructInt4.mlpackage        

And the output should look something like this.

Generating
Best recommendations for a place to visit in Paris in August 2024:

1. Palace of Versailles: This iconic palace is a must-visit. It's a short train ride from Paris and offers a glimpse into the opulence of the French monarchy.
2. Eiffel Tower: No trip to Paris is complete without a visit to the Eiffel Tower. You can take an elevator ride to the top for a stunning view of the city.
3. Louvre Museum: Home to thousands of works of art, including the Mona Lisa and the Winged Victory of Samothrace, the Louvre is a cultural treasure.
4. Notre-Dame Cathedral: This famous cathedral is currently undergoing restoration after a fire in 2019. By 2024, it should be open to the public again.
5. Seine River Cruise: A boat tour on the Seine River is a great way to see many of Paris's famous landmarks.
6. Montmartre: This artistic neighborhood is known for its bohemian vibe, the Sacré-Cœur Basilica, and the Moulin Rouge.
7. Musée d'Orsay: This museum houses a vast collection of Impressionist and Post-Impressionist masterpieces.
8. Arc de Triomphe: This iconic arch offers a panoramic view of Paris from its rooftop.
9. Sainte-Chapelle: This 13th-century Gothic chapel is known for its stunning stained glass.
10. Palais Garnier: This opulent opera house is a must-see for its architecture alone.

42.77 tokens/s, prompt pre-filling time: 0.73s, total time: 9.15s        

One item to note here is the output performance measure (in this case, 42.77 tokens/second) while running on a 2023 MacBook Pro with M2 Max. This number will become very important as we tune models and code in the future, especially as we shift things to mobile platforms.
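Those three reported numbers should roughly hang together. Assuming the tokens/s figure is measured over the generation phase only (total time minus pre-fill - an assumption about how the tool reports it), you can back out the approximate generated-token count:

```shell
# Back-of-the-envelope: generated tokens ≈ rate × (total − pre-fill).
awk 'BEGIN {
  rate = 42.77     # tokens/s reported above
  total = 9.15     # total time in seconds
  prefill = 0.73   # prompt pre-filling time in seconds
  printf "%.0f tokens\n", rate * (total - prefill)
}'
```

With the figures above this comes out to roughly 360 generated tokens, which is a handy sanity check when comparing runs across machines.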


Next Steps

If you're getting command-line output, you've taken a huge step toward getting things working on your Mac and beyond. Have fun poking at the LLM from the command line.

In part 2 you'll see how to use the Swift Chat app and Xcode to run the model in a GUI on the Mac.


Sources

Feel free to dive into the source material if you really want to get under the hood.

https://huggingface.co/blog/mistral-coreml

https://huggingface.co/apple/mistral-coreml

https://huggingface.co/blog/swift-coreml-llm

https://github.com/huggingface/swift-transformers

https://github.com/huggingface/swift-transformers/tree/preview

