Think Again Before Splurging on Top-end GPUs
We are in the midst of AI mania, and it comes with a close relative: GPU mania. As AI makes its mark on industry after industry, enterprise investment in the infrastructure needed to support AI workloads has turned into a GPU land grab. Many cloud service providers and enterprises are buying up GPUs in anticipation of future AI applications. It’s an “if we build it, they will come” strategy.
But GPUs are not the be-all and end-all, and there’s no one-size-fits-all solution. In fact, as the AI industry shifts from building and pre-training foundation models to using those models for inference and generation, we can expect today’s GPU-heavy compute to spread across a continuum: cutting-edge GPUs for the largest “ask me anything” LLMs, mid-level GPUs for mid-size task-specific LLMs, and software-optimized CPUs for small, low-latency edge LLMs.
It’s clear that many organizations are taking their time to sort through the hype, and rightfully so. AI adoption is uneven across industries right now, with large companies and sectors like manufacturing leading the charge. According to a February 2024 survey from the U.S. Census Bureau, only 5% of U.S.-based companies were using AI at the time, and another 7% planned to adopt it within the next six months.
At Akamai, we’ve been looking closely at how to deploy GPUs, and especially at which GPUs are needed for AI inference at the edge. The answers aren’t necessarily obvious. Here’s why:
1. GenAI is just the tip of the iceberg
When it comes to AI applications and models, think of an iceberg. Above the surface are the GenAI models we’ve all heard of, the ones that garner all of the attention and headlines. But below the surface, there’s so much more: the myriad AI models you’ve never heard of. These are the deep learning models that do things like image classification, anomaly detection, and clustering, and that have been delivering real value to enterprises for a decade or more.
When you realize that AI has been in use by enterprises since well before the current GenAI and GPU mania, it becomes clear that there is significant hype to sort through.
2. You don’t need the best GPUs, you need the right GPUs
GPUs for AI are evolving quickly. For example, if you had bought NVIDIA’s top-end GPUs for AI just a couple of years ago, you’d already be a factor of 32 below the stated performance of the newest generation.
Further, GPUs are not “one size fits all”. They span a wide range of performance capabilities and price points. Before purchasing GPUs, it’s important to know your use case. Are you trying to build your own LLM? Deploy machine learning models at the edge? Leverage deep learning to help address cybersecurity risk? Customize an existing LLM to support your customer experience?
A GPU that is well suited to one AI model may be a poor fit for another. Simply recognizing that the GPUs you buy today might not match the use cases that emerge in the future sets you up for success.
3. AI may not always need GPUs
Finally, it’s important to remember that AI algorithms are also progressing at a breakneck pace. And they aren’t always getting bigger and more complex. A particularly interesting example is the 1-bit transformer from Microsoft. This approach represents certain values in the model with just one bit, encoding only 1 or -1, so matrix multiplication (the core task GPUs do so well) reduces to simple addition and subtraction. The Microsoft work is an extreme example of a technique called quantization, but it suggests that AI models that need GPUs today may not need them tomorrow.
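To make that intuition concrete, here is a minimal, hypothetical sketch (it is not Microsoft’s BitNet code) showing that a matrix-vector product with weights restricted to +1 and -1 needs only additions and subtractions in its inner loop:

```python
import numpy as np

def binarize_weights(w: np.ndarray) -> np.ndarray:
    """Quantize full-precision weights to 1-bit values (+1 or -1) by sign.
    This mirrors the extreme quantization idea behind 1-bit transformers."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def binary_matvec(w_1bit: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product with +/-1 weights: every 'multiply' is just
    adding or subtracting an activation, so the inner loop needs no
    floating-point multiplications at all."""
    out = np.zeros(w_1bit.shape[0], dtype=x.dtype)
    for i, row in enumerate(w_1bit):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

# Toy layer sizes chosen purely for illustration: 4 outputs, 8 inputs.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 8))
x = rng.standard_normal(8).astype(np.float32)

w_1bit = binarize_weights(w)
print(binary_matvec(w_1bit, x))          # additions and subtractions only
print(w_1bit.astype(np.float32) @ x)     # same result via a standard matmul
```

The toy sizes and the NumPy implementation are illustrative only; real 1-bit inference kernels pack the weights into bits and use specialized routines, but the arithmetic simplification is the same.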
4. The energy demands of GPUs can’t be ignored
The energy required to power GPUs is a significant concern. As enterprises scale up their AI workloads, the energy demand of GPU-heavy infrastructure grows exponentially. On average, GPUs use more than twice as much power per unit of processing as CPUs.
An interesting point here is that when it comes to AI inference, there are software-led innovations that can help reduce energy demands. Generally, these work by reducing the size of the AI model through various techniques. Today, in the early days of GenAI, we see organizations throwing megawatts at problems that could be solved with milliwatts.
Okay, I might be exaggerating a bit here, but the point still stands. LLMs are being touted as the solution for problems that are more economically solved with small language models (SLMs), deep learning, machine learning, or even plain-old data analytics. If enterprises are willing to take their time and optimize their chosen AI model for their specific use case, they can limit energy demands.
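To put rough numbers behind that claim, here is a hypothetical back-of-envelope sketch. It relies on the common approximation that dense-transformer inference costs about 2 FLOPs per parameter per generated token, and the model sizes below are illustrative assumptions, not measurements:

```python
def flops_per_token(num_params: float) -> float:
    """Approximate inference cost for a dense transformer:
    roughly 2 FLOPs per parameter per generated token."""
    return 2.0 * num_params

slm = flops_per_token(3e9)      # assumed 3B-parameter small language model
llm = flops_per_token(500e9)    # assumed 500B-parameter frontier-scale LLM

print(f"SLM: {slm:.2e} FLOPs/token")
print(f"LLM: {llm:.2e} FLOPs/token")
print(f"Ratio: ~{llm / slm:.0f}x more compute per token for the LLM")
```

Under those assumptions, the smaller model does the same token-generation work with roughly two orders of magnitude less compute, which translates directly into lower energy demand.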
The GPU conversation is complex. But developing a thoughtful strategy that avoids succumbing to the hype is essential for making sound hardware investments. By understanding the broader AI landscape, selecting the right GPUs for your specific needs, and considering the future evolution of AI algorithms, you can position your organization for long-term success.
Thank you for bringing attention to the nuances of GPU selection in the context of AI. Your point about tailoring technology to specific business needs is crucial, as the demands of AI applications can vary significantly across different industries. It would be interesting to hear more about your experiences in evaluating the right mix of GPUs and CPUs. What criteria do you prioritize when making these decisions? Looking forward to further insights from you and the community.
Transformational CIO | Global Leadership & Strategy | Data Advocate | Talent Amplifier
7 months ago
Spot on as usual, Robert Blumofe. Cut through the hype and pragmatically identify real-world solutions. Thanks for posting. I remember back in the late 2000s I heard you speak at a Thornton May event. You shared that rather than buying the latest, greatest (and most expensive) hardware for your edge servers, you simply bought dependable, fit-for-purpose servers in bulk with the expectation that they'd fail and could quickly be replaced from local inventory. Rational and common sense once shared, but counterintuitive for most. I subsequently applied that approach in Sub-Saharan Africa and Asia to great advantage. Thanks again.
CTO at Cellino
7 months ago
Well said. Very few purpose-specific AI models have a trillion parameters, but they may have very different throughput and cost requirements. Take, for example, autonomous driving, or therapeutic cell and tissue manufacturing for that matter!
Head Of Partnerships @ Skillz Inc. | Strategic Partnerships | ex-Amazon | ex-Akamai
7 months ago
As we scale workflows like encoding or AI inference models, operationalizing costs becomes crucial. Here's how companies can challenge Nvidia's market position:
Cost & Power Efficiency: ARM processors and specialized GPUs offer significant cost and power savings.
Open-Source Advantage: Robust Linux software stacks provide flexibility and drive continuous innovation.
Example: FFMPEG with ARM chips has dramatically reduced broadcasting costs, showcasing the potential for other industries.