VMware by Broadcom Presents Private AI with Intel at AI Field Day 4
Gina Rosenthal
Product Marketing Leader | AI Enthusiast | Founder & CEO at Digital Sunshine Solutions | Co-Host of Tech Aunties Podcast
We get to hear from VMware again this week! Since this is Intel day at AI Field Day 4, this time VMware is talking about AMX CPUs for LLMs.
AI without GPUs: Using Intel AMX CPUs on vSphere for LLMs
Earl Ruby, R&D Engineer, Broadcom (VCF) is taking us through this presentation. Ruby came to VMware via Bitfusion.
This is a discussion about using VMware Private AI on Intel hardware. Intel has lots of tools to use with the more than 1 million 4th gen Xeon chips that have already been deployed.
Intel AMX (Advanced Matrix Extensions) is integrated into every core. It is built around two-dimensional registers called tiles. As customers refresh older hosts with Sapphire Rapids or Emerald Rapids, they get the AMX capabilities for AI/ML (as well as performance improvements for traditional computing).
vSphere 8 is required, since AMX support arrives with virtual hardware version 20. If you're a developer, look for the Intel AI Tools Selector website.
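To check whether a Linux guest actually sees AMX once the host, vSphere version, and virtual hardware version line up, you can look at the CPU flags the kernel reports. The snippet below is a minimal sketch, not something shown in the session: it assumes a Linux guest whose kernel lists the `amx_tile`, `amx_bf16`, and `amx_int8` flags in `/proc/cpuinfo`, and the function names are made up for illustration.

```python
# Sketch: detect AMX-related CPU flags in /proc/cpuinfo-style text.
# Assumes a Linux guest; the helper names here are hypothetical.

AMX_FLAGS = {"amx_tile", "amx_bf16", "amx_int8"}


def amx_flags_present(cpuinfo_text):
    """Return the set of AMX flags found on any 'flags' line."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            found |= AMX_FLAGS & set(value.split())
    return found


def has_amx(cpuinfo_text):
    """True only if all three AMX flags are visible."""
    return AMX_FLAGS <= amx_flags_present(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or file not readable
    print("AMX flags visible:", sorted(amx_flags_present(text)))
```

If the list comes back empty inside a VM on AMX-capable hardware, the vSphere and virtual hardware versions are the first things to check.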
The first demo was an off-the-shelf setup on an Ice Lake system (about 4 years old) that loaded a container with a model of about 7B parameters. We watched it load up, and it took a pretty long time. That's why people started moving to GPUs.
Ruby then ran the same model on a Sapphire Rapids system, and it ran very quickly.
Then Ruby fine-tuned a model: he took a generic off-the-shelf model and fine-tuned it for finance on a 4-node system. It took 3.5 hours to do this, without GPUs. The resulting model runs on 1 node. Ruby showed three screens of guests accessing the model through a chatbot he created.
Ruby says: "Use CPUs when you can, GPUs when you must." GPUs draw power even when they are sitting idle. On the other hand, GPUs have lower latency, so if you are OK with a little lag, a CPU will be fine.
AI without GPUs: Using Intel AMX CPUs on vSphere with Tanzu Kubernetes
The next demo used OpenVINO on vSphere 8. OpenVINO compresses the neural network, and the compression is lossy. The demo used a Jupyter notebook to find people in a video.
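To make "lossy" concrete, here is a generic illustration of one common compression technique, 8-bit weight quantization. This is an assumption for illustration only: the session recap doesn't say which method the demo used, this is not OpenVINO's API, and the weight values are made up.

```python
# Sketch: symmetric int8 quantization of a few weights, showing that the
# round trip does not restore the original values exactly (it is lossy).
# Generic illustration only; not OpenVINO code.


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Map the integers back to approximate float values."""
    return [v * scale for v in quantized]


weights = [0.12, -0.5, 0.33, 0.9, -0.07]  # made-up example values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The largest rounding error is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("quantized:", quantized)
print("max round-trip error:", max_error)
```

Each weight now fits in one byte instead of four, at the cost of a small rounding error per weight.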
Next, Ruby ran a test on Kubernetes using Tanzu. You still need to run the right versions of everything for vSphere to work with AMX. You also have to have all the right stuff on the Tanzu nodes, including the right TKRs (Tanzu Kubernetes releases). This means adding a new content library for Tanzu.
If you have a modern server with Xeon chips, you probably have AMX right now. If you have the right workload and vSphere 8, you can try running AI on those servers without the need for an accelerator card.
#vsphere #vmware #broadcom #tanzu #AMX #Intel