Running an LLM on Your Mac - The Missing Guide - Part 1
Foreword (and a warning)
This article is for those geeky enough to want to run an LLM via Swift and CoreML on their Mac as the first step on the journey to writing Mac, iOS and visionOS apps that talk directly to LLM models to power a new generation of capabilities. If you're not looking to build apps around LLMs, this probably isn't for you.
Apple turned a lot of heads (including mine) by showing a CoreML LLM running directly on device at WWDC this year, and I needed to dive in to see where my team can take it. We're helping clients unleash the potential of LLMs on device to leverage existing hardware investments (why spend on the cloud if you already have dedicated hardware in your team's hands?), keep data more secure, and work offline in challenging environments.
Getting Started
"There has to be a better way!"
That was my reaction to mashing together partial sets of instructions, code commit notes and READMEs to get an LLM running natively on the Mac. There are a lot of places where one mistake can derail the whole thing. Since I couldn't find a guide like this, I created one with the end-to-end process that will get you running with a local LLM in about 10 minutes.
To make this work, I’m currently running macOS Sequoia 15.1 beta and Xcode 16.1 beta 2 (16B5014f). I was not able to get it working with older versions, so your mileage may vary (the Sequoia beta is a definite must-have).
Part 1 - Getting the model and command line LLM working
The LLM model we'll be using (Mistral 7B v0.3) lives at Hugging Face with a bazillion other models. Go create a Hugging Face account and log in (free, but it requires verifying your email) - https://huggingface.co
Turn on your access to the Mistral 7B LLM model by clicking the ‘Agree and access repository’ button - https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3
Create a Hugging Face access token with read permissions to use later. Copy and save the token data that looks something like asdijfasidfhaisudhfaoiusdhfaiousdhf - https://huggingface.co/settings/tokens
If you have tried to get any of this working before, make sure you remove the directory Documents/huggingface on your Mac. Past attempts that weren't fully successful probably created and cached files there that will get in the way and block progress.
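If you'd rather clear it from the command line than from Finder, something like this should do it (this assumes the cache landed in the default ~/Documents/huggingface location; double-check the path before running rm -rf):

rm -rf ~/Documents/huggingface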
Now fire up Terminal and create the directory you’re going to be working in (for this example I’m using Downloads/apple-llm-test on my Mac).
cd Downloads
mkdir apple-llm-test
cd apple-llm-test
Install the Swift Transformers project that lets us run these CoreML models using Swift. Once it's cloned, jump into its Examples directory. Note that the preview branch of Swift Transformers is required.
git clone -b preview https://github.com/huggingface/swift-transformers
cd swift-transformers/Examples
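If you want to double-check that the clone actually grabbed the preview branch before going further, git can tell you from inside the repository (this should print "preview"; if it doesn't, re-run the clone with the -b preview flag):

git branch --show-current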
Critical Fail Step - Don't Skip
Set up the Hugging Face environment variable in the terminal window so the software has permission to access some of the key installation and configuration elements. Without this, nothing is going to work properly. Note that we're only setting it up for this one window; if you want to do a bunch of future command-line work you'll need to add it to your favorite shell setup. And make sure to replace 'YOURTOKENHERE' with the token goop you got from the Hugging Face web site.
export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE
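If you do want the token available in every future terminal window rather than just this one, one option is to append the same export to your shell profile. This is a sketch assuming zsh, the macOS default shell; adjust the file name if you use something else:

echo 'export HUGGING_FACE_HUB_TOKEN=YOURTOKENHERE' >> ~/.zshrc
source ~/.zshrc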
Install the Hugging Face tools that you'll need to download the model.
pip install -U "huggingface_hub[cli]"
NOTE: If you’re using Homebrew on your Mac (and only if you're using it - you’ll know if you are), here's an alternate install command for you (skip this if you're not sure):
brew install huggingface-cli
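Whichever way you installed it, a quick sanity check is worth doing before the big download. With the token exported in this window, the CLI should be able to report which account you're authenticated as; if it complains that you're not logged in, revisit the token steps above:

huggingface-cli whoami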
Now for the most time-consuming step: the big download. Go get the model (we’ll use the Int4 quantized version of Mistral in this example to keep the download smaller). Remember that you are in the swift-transformers/Examples directory when you run this.
huggingface-cli download --local-dir Mistral7B --local-dir-use-symlinks False apple/mistral-coreml --include "StatefulMistral7BInstructInt4.mlpackage/*"
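The download is several gigabytes, so expect a wait. Once it finishes, it's worth confirming the package actually landed before moving on. The exact size will vary, but an Int4-quantized 7B model should come in somewhere around 4 GB:

ls Mistral7B
du -sh Mistral7B/StatefulMistral7BInstructInt4.mlpackage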
And now if everything ran the way it was supposed to, we can talk to our model via the command line.
cd Mistral7B
swift run transformers "Best recommendations for a place to visit in Paris in August 2024:" --max-length 1024 StatefulMistral7BInstructInt4.mlpackage
And the output should look something like this.
Generating
Best recommendations for a place to visit in Paris in August 2024:
1. Palace of Versailles: This iconic palace is a must-visit. It's a short train ride from Paris and offers a glimpse into the opulence of the French monarchy.
2. Eiffel Tower: No trip to Paris is complete without a visit to the Eiffel Tower. You can take an elevator ride to the top for a stunning view of the city.
3. Louvre Museum: Home to thousands of works of art, including the Mona Lisa and the Winged Victory of Samothrace, the Louvre is a cultural treasure.
4. Notre-Dame Cathedral: This famous cathedral is currently undergoing restoration after a fire in 2019. By 2024, it should be open to the public again.
5. Seine River Cruise: A boat tour on the Seine River is a great way to see many of Paris's famous landmarks.
6. Montmartre: This artistic neighborhood is known for its bohemian vibe, the Sacré-Cœur Basilica, and the Moulin Rouge.
7. Musée d'Orsay: This museum houses a vast collection of Impressionist and Post-Impressionist masterpieces.
8. Arc de Triomphe: This iconic arch offers a panoramic view of Paris from its rooftop.
9. Sainte-Chapelle: This 13th-century Gothic chapel is known for its stunning stained glass.
10. Palais Garnier: This opulent opera house is a must-see for its architecture alone.
42.77 tokens/s, prompt pre-filling time: 0.73s, total time: 9.15s
One item to note here is the output performance measure (in this case, 42.77 tokens/second) while running on a 2023 MacBook Pro M2 Max. This number will become very important as we look at tuning models and code in the future, especially as we shift things to mobile platforms.
Next Steps
If you're getting command-line output, you've taken a huge step toward getting things working on your Mac and beyond. Have fun poking at the LLM using the command line.
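One easy way to poke at it: vary the prompt and --max-length and compare the tokens/s figure printed at the end of each run. The prompt below is just an illustration - anything works (run it from the same Mistral7B directory as before):

swift run transformers "Write a short poem about the Seine at sunset:" --max-length 256 StatefulMistral7BInstructInt4.mlpackage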
Sources
Feel free to dive into the source material if you really want to get under the hood.