Setting up Your AI/ML Home Lab: Newbies Challenges and Solutions

Amr Salem

Distinguished Engineer @ Verizon | GenAI Solutions, Solutions Architecture

Published: August 24, 2023

Earlier this year, we were thinking about getting a new machine for our home studio, one that doesn't produce noise like the lab HPE ProLiant server we currently own. Somehow we ended up with a new M2 Mac Studio. With all of Apple's advancements in the M2 chip, and with the machine maxed out, we thought it would be our one-stop shop for everything: coding, video production, and even our AI/ML learning.

However, we learned the hard way that, long story short, you need an Nvidia GPU to make real progress and to benefit from the community-wide support around Nvidia hardware.

But, long story long, we wanted to share our journey here in case you're getting started in the domain like us, to save you some of the hassle and confusion around the choices to be made, along with the reasoning behind them.

Why or Why Not M1/M2 Chip for AI/ML self-learning and explorations?

Apple offers Metal Performance Shaders (MPS), a framework of GPU primitives for linear algebra, ray tracing, and machine learning that AI/ML libraries like PyTorch and TensorFlow can benefit from. However, there are a few caveats to that promise from a market-wide usability perspective as of today (which may or may not change in the near future).

PyTorch limits MPS support to macOS host machines (as of stable version 2.0.1)

This means that if you need to run Docker containers with AI software that relies on PyTorch, such as Haystack, you simply can't do so with GPU acceleration! Docker on macOS runs its containers inside a Linux virtual machine, and MPS is not available there.

Figure: PyTorch supported configurations for MPS, per the PyTorch.org prerequisites tool
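To see which backend PyTorch can actually use on a given machine (or inside a given container), a small check like the one below can help. This is only a minimal sketch: the `pick_device` helper is our own hypothetical name, and the script deliberately falls back to "cpu" when PyTorch is not installed at all.

```python
def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" depending on what PyTorch can use here."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is missing on older PyTorch builds, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"PyTorch would run on: {pick_device()}")
```

Running this on the Mac host and then again inside a Docker container on the same Mac is a quick way to see the limitation described above for yourself.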

CUDA Compute Platform is the way to go, but... you need to know this!

This is how we came to learn about CUDA, the Nvidia toolkit that provides the development environment and runtimes you need for GPU parallel processing, which is what your AI models rely on.

At that point, obviously, we decided to get an Nvidia GPU.

From the PyTorch.org prerequisites tool, we learned that we would need CUDA 11.8, or at least 11.7, to run PyTorch on a Linux host. That adds one more complexity factor to the equation: would that work with any Nvidia GPU, or is there some sort of CUDA/GPU compatibility aspect out there?

It turned out there is. In a nutshell, your CUDA version has to be compatible with your GPU's Compute Capability version.
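The compatibility rule can be sketched as a simple lookup. Please treat the table below as an illustrative subset written from memory, not as an authoritative list (Nvidia's CUDA release notes are the reference), and note that `MIN_CUDA_FOR_CAPABILITY` and `cuda_supports` are hypothetical names we made up for this example. It also only checks the lower bound; newer CUDA releases eventually drop older architectures too.

```python
# Minimum CUDA toolkit release that first supported each compute capability.
# Illustrative subset only; verify against Nvidia's CUDA release notes.
MIN_CUDA_FOR_CAPABILITY = {
    (7, 5): (10, 0),  # Turing, e.g. GeForce RTX 20 series
    (8, 0): (11, 0),  # Ampere data-center parts, e.g. A100
    (8, 6): (11, 1),  # Ampere consumer parts, e.g. GeForce RTX 30 series
    (8, 9): (11, 8),  # Ada Lovelace, e.g. GeForce RTX 40 series
}


def cuda_supports(capability: tuple, cuda_version: tuple) -> bool:
    """Return True if cuda_version is new enough for the given capability."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        raise KeyError(f"compute capability {capability} is not in the table")
    # Tuples compare element by element, so (11, 8) >= (11, 1) is True.
    return cuda_version >= minimum


if __name__ == "__main__":
    # Compute capability 8.6 (RTX 30 series) against the CUDA versions PyTorch asked for.
    for version in [(11, 7), (11, 8)]:
        print(version, cuda_supports((8, 6), version))
```

On a machine where CUDA is already working, `torch.cuda.get_device_capability()` returns the capability tuple of the installed GPU, so you don't have to look it up by hand.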

After a bit of research, on the hunt for an Nvidia GPU that supports the latest and greatest CUDA versions, we decided on the GeForce RTX 3060 Ti 8 GB. And yes, for us that is better than the RTX 3060 12 GB, simply because it has 4,864 CUDA cores versus 3,584 in its non-Ti sibling. More memory is not the only factor, especially with a performance boost of about +28% in our case according to gpu.userbenchmark.com.
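For the curious, the core-count gap is easy to work out. The snippet below is just the arithmetic on the two numbers quoted above; `percent_more` is a hypothetical helper name.

```python
CORES_3060_TI = 4864  # GeForce RTX 3060 Ti
CORES_3060 = 3584     # GeForce RTX 3060


def percent_more(a: int, b: int) -> float:
    """Return how much larger a is than b, as a percentage of b."""
    return (a - b) / b * 100


extra_cores = percent_more(CORES_3060_TI, CORES_3060)
print(f"The 3060 Ti has {extra_cores:.1f}% more CUDA cores")  # prints 35.7%
```

So roughly 36% more cores translated into about 28% more benchmark performance in our case, which suggests that performance does not scale one-to-one with core count.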

Interesting Thought!

Can you run an Nvidia GPU on an M1/M2 Mac host?

We decided to go the extra mile and figure out how we could run an Nvidia GPU on macOS and kill two birds with one stone! But then we hit a snag!

First of all, how would you get it connected? The Mac Studio is a little box, and the GPU is most likely wider than it is. Looking around for a solution, we found eGPU (external GPU) enclosures: external cases where you can fit your GPU, secure its power supply, and connect it to your computer through a USB-C/Thunderbolt connection!

Miraculous, right? Not quite! There is yet another challenge: the Nvidia GPU driver for M1/M2 Macs. We could find a driver out there for Intel Macs, because that had long been supported. For M1/M2 Macs, however, Apple seems to have dropped support and has not mentioned any plans to support third-party GPUs in the future, so it was back to square one.

We decided to cut to the chase on the Mac M1/M2 and go back to AMD/Intel PCs.

AMD CPU was a better bang for the buck

We put together an AM4 build: an AMD Ryzen 5 5600X 6-core, 12-thread CPU on an MSI motherboard with 128 GB of DDR4 RAM. Having figured out multiple pieces of the puzzle, we thought we were ready for showtime! But nah, a couple more issues blocked our way!

The first one, which was resolved quickly, was that the motherboard gave us a black screen. After a bit of research, we figured it was due to the GPU being newer than the motherboard's firmware: the board physically supports AM4, but its BIOS needed to be upgraded. We did exactly that with the help of the motherboard manual, the onboard CMOS button for flashing the motherboard, and an MBR/FAT32-formatted USB drive (DON'T USE GPT-FORMATTED PARTITIONS).

GPU Passthrough

The second problem: we learned that in order to share the Nvidia GPU with our virtual machine, we needed to configure what is called GPU passthrough, a technology implemented in most hypervisors that allows the Linux kernel to present the physical PCI GPU directly to a virtual machine. And as we are not in a data-center kind of setup, Nvidia doesn't really allow sharing the GPU across multiple hosts; that is only allowed for high-end GPUs like Tesla, meant for enterprise data mining setups.

This means that at any given point in time, a single GPU can only be attached to a single host, whether physical or virtual! We could trick the Linux guest OS into thinking it's running on a physical host rather than a virtual machine, so that it recognizes the GPU card, but that was not the real challenge!

The real challenge, as you must have guessed by now, was that with this constraint we needed another GPU: one for our physical box's display and the other for our VMs and the Docker containers running on top. That took us back to the built-in GPU that came with our motherboard. But wait one second!

We had the AMD Ryzen 5 5600X, and in order to turn on the motherboard's integrated graphics we had to get the AMD Ryzen 5 5600G, as that is the one that does the magic: it has the on-chip GPU that drives the motherboard's video outputs. And we did exactly that!
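One thing worth checking before attempting passthrough on a Linux host: PCI passthrough of this kind generally relies on the IOMMU being enabled in the BIOS and kernel, and devices are handed to a VM one IOMMU group at a time. The sketch below is not part of our original setup notes. It assumes a Linux host that exposes the standard `/sys/kernel/iommu_groups` directory, and `list_iommu_groups` is a hypothetical helper name.

```python
from pathlib import Path


def list_iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict:
    """Map each IOMMU group number to the PCI addresses of its devices."""
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled or unsupported, or not a Linux host
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[group_dir.name] = sorted(p.name for p in devices_dir.iterdir())
    return groups


if __name__ == "__main__":
    found = list_iommu_groups()
    if not found:
        print("No IOMMU groups found; check the BIOS and kernel settings.")
    for number in sorted(found, key=int):
        print(f"group {number}: {', '.join(found[number])}")
```

Ideally the GPU you want to pass through (and its audio function) sits in a group of its own, which you can confirm by comparing the printed PCI addresses against the output of `lspci`.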

Now we're ready

The next steps are all about deciding on the virtualization OS, deciding which VMs we need, and getting GPU passthrough working from the physical host to the VM to the Docker Engine on top of that! That will introduce us to the Nvidia Linux GPU drivers, the CUDA Toolkit, and the Nvidia Container Toolkit next.
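As a preview, each layer of that stack can be probed with one command. The sketch below is only an assumption about how one might verify the layers, not the setup we will describe in the next post. The `CHECKS` table and `run_check` helper are hypothetical names, and the Docker image tag is just an example CUDA base image.

```python
import shutil
import subprocess

# Assumption: any CUDA base image would do; this tag is only an example.
CUDA_IMAGE = "nvidia/cuda:11.8.0-base-ubuntu22.04"

# One probe command per layer of the stack, from the driver up to containers.
CHECKS = {
    "Nvidia driver": ["nvidia-smi"],
    "CUDA Toolkit": ["nvcc", "--version"],
    "Nvidia Container Toolkit": [
        "docker", "run", "--rm", "--gpus", "all", CUDA_IMAGE, "nvidia-smi",
    ],
}


def run_check(command: list, timeout: int = 120) -> bool:
    """Return True if the command exists on PATH and exits with status 0."""
    if shutil.which(command[0]) is None:
        return False
    try:
        result = subprocess.run(command, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for layer, command in CHECKS.items():
        status = "ok" if run_check(command) else "missing or failing"
        print(f"{layer}: {status}")
```

Running it inside the VM rather than on the physical host is the interesting case, since that is where a working passthrough shows up.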

We'll discuss all that along with the tips and tricks to have an AI/ML multipurpose Home Lab Server up and running in our next post.

If you've enjoyed this article, please consider sharing it with others!

Acknowledgment

The author would like to acknowledge the support and contribution provided by Ayman H. (Homains CTO) as part of this exercise journey.

About the Author

Amr Salem was born in Cairo, Egypt. He is a technology geek. He currently holds the position of Principal Engineer-System Architecture with Verizon, Temple Terrace, FL, USA. He plays a pivotal role in providing innovative solutions across Verizon’s Network Systems. Before joining Verizon, he was with the IBM Clients Innovation Center, where he honed his skills and expertise in the technology field. His diverse talents and dedication make him a valuable asset in the technology industry and a source of inspiration for aspiring writers.
