The Power of Abstraction in Software
Like all digital products, AI is becoming easier to use and more accessible as it matures. A few years ago, you had to build a Python application to interact with large language models (LLMs). Today, we simply chat with them.
Computer scientists call this progression toward more natural language “abstraction.” Why abstraction? The base of all computer languages is machine code, zeros and ones, on and off. Nothing is abstract in binary. Higher-level languages add instructions that a human can read—MOV, ADD, SUB, AND, OR—abstractions that stand in for the ones and zeros.
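To make the idea concrete, here is a minimal sketch in Python (the lingua franca of today's AI tooling) using the standard-library dis module: one readable line of code sits on top of lower-level bytecode instructions, which in turn sit on top of machine code.

```python
import dis

def add(a, b):
    # One readable, abstract line of Python...
    return a + b

# ...is compiled to lower-level bytecode instructions (LOAD_FAST,
# BINARY_OP/BINARY_ADD, RETURN_VALUE depending on Python version),
# which the interpreter in turn executes as machine code: ones and zeros.
dis.dis(add)
```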
Generative AI, general intelligence, and innovations we can’t yet see may well deliver the ultimate abstraction layer. No code, no programming, just conversations with the machine.
The pros and cons of higher abstraction
The benefits of increasing abstraction are obvious. TensorFlow (first release: 2015) and PyTorch (first release: 2016) made deep learning neural networks accessible to anyone who knew Python. The transformer architecture, the basis for LLMs, was first proposed in 2017. Five very short years later, we have ChatGPT.
As AI becomes more sophisticated and more abstract, it becomes easier to use and more accessible, sparking innovation at an incredible pace. For example, researchers at Argonne National Laboratory are creating AuroraGPT, a science-specific generative AI model that is training on a giant mass of scientific information[1]. This 1-trillion-parameter model is training on the Aurora supercomputer. When complete, it will put complex, synthesized answers at the fingertips of the global scientific community—answers that could otherwise take months, even years, of trial-and-error experimentation.
This compression of time and improvement in outcomes in research and development applies to virtually every industry and organization, including computer hardware and software. With the help of AI, developers—including laypeople—will be able to create, test, and refine solutions in a fraction of the time it takes to hand-code and beta test.
The downside? Abstraction can create computing overhead because, as computing languages become easier to use, they tend to use hardware less efficiently. Native Python is pretty friendly to humans, but it creates inefficiencies and bottlenecks that slow it down.
Python’s processing overhead pales in comparison to that of GPT-4, which is rumored to have ~1.7 trillion parameters. Guesstimates vary wildly, but it’s safe to say that GPT-powered inference services like ChatGPT and DALL-E consume a small city’s worth of electricity every month[2]. Few but the very largest enterprises, national laboratories, and government agencies can afford utility bills like that. To deploy AI in the real world, for practical applications, we must find ways of balancing these amazing new capabilities with computing performance and power consumption.
How software optimization can create efficiencies in Python-based AI tooling
In the early stages of deep learning, we made efficiency gains with the primitives. Can you run matrix multiplications faster? Can you run convolutions faster? The next step was maximizing hardware utilization. How do we run on all the heterogeneous machines we have? How do we distribute a workload and parallelize it? How do we handle memory hierarchies?
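To illustrate what optimizing a primitive buys you, here is a small, hedged sketch: a naive pure-Python matrix multiplication versus a single NumPy call that dispatches to an optimized BLAS kernel (vectorized, blocked for the memory hierarchy, and parallelized). The timings are illustrative only and will vary by machine.

```python
import time
import numpy as np

def naive_matmul(a, b):
    """Triple-loop matrix multiply written at a high level of abstraction."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i][j] += a[i][p] * b[p][j]
    return out

n = 200
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
naive_matmul(a.tolist(), b.tolist())
t1 = time.perf_counter()

# np.matmul dispatches to an optimized BLAS library (e.g., oneMKL or OpenBLAS)
# that uses the same arithmetic but far more efficient software.
np.matmul(a, b)
t2 = time.perf_counter()

print(f"naive: {t1 - t0:.3f}s, optimized: {t2 - t1:.3f}s")
```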
These optimizations have delivered significant performance gains without adding hardware or increasing electricity consumption. Now that LLMs have grown into supercomputing workloads, we have to create ways to optimize them so they can scale on less expensive, less energy-intensive hardware.
From my perspective, software holds the key. We can squeeze 10x-100x efficiencies out of the AI pipeline at every step, from making the tooling more efficient to replacing massive general-purpose models with smaller, refined models that target specific use cases.
The software team at Intel is working across the entire spectrum so AI can scale everywhere, with a specific focus on the open-source AI tooling and models that make up the current Python-based AI software ecosystem. These software-driven AI performance boosts are practically free for the community. They require almost no code changes or developer time and no additional hardware costs.
Intel software optimizations deliver 10x to 100x performance gains for AI workloads
Intel® PyTorch optimizations running in combination with the Intel® Extension for PyTorch for GPU deliver 3.9x-7.1x acceleration on first token latency and 1.5x-3.6x on next token latency with the same datatype (FP16). We used Intel® Data Center GPU Max Series GPUs for these benchmarks[3].
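For a sense of what "almost no code changes" means in practice, here is a minimal sketch using the Intel® Extension for PyTorch on a placeholder model; it assumes the CPU path with default FP32 precision, whereas the GPU benchmarks above used FP16, and exact APIs vary by release.

```python
import torch
import intel_extension_for_pytorch as ipex  # pip install intel-extension-for-pytorch

# Placeholder model standing in for an existing PyTorch model.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.ReLU(),
    torch.nn.Linear(1024, 10),
).eval()

# The one-line change: ipex.optimize applies operator fusion, memory-layout
# changes, and other Intel-specific optimizations to the model for inference.
model = ipex.optimize(model)

with torch.no_grad():
    print(model(torch.randn(8, 1024)).shape)
```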
In tests, Intel software optimizations for TensorFlow deliver up to a 16x gain in image classification inference and a 10x gain in object detection[4]. Intel optimizations for PyTorch can deliver up to a 53x gain in image classification and nearly a 5x gain for a recommendation system[4].
Scikit-learn software optimizations from Intel produce 100x to 200x performance increases on machine learning algorithms[4]. Graph analytics on graphs approaching 50 million vertices and 1.8 billion edges can run up to 166x faster—with only software optimization[4]. We achieved all these results using 3rd Gen Intel® Xeon® Scalable processors.
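The scikit-learn optimizations follow a similar pattern: patch scikit-learn with the Intel® Extension for Scikit-learn, then run existing estimator code unchanged. The sketch below uses KMeans as an arbitrary example; actual speedups depend on the algorithm and the data.

```python
# pip install scikit-learn-intelex
from sklearnex import patch_sklearn
patch_sklearn()  # redirects supported estimators to optimized implementations

# The rest is plain, unmodified scikit-learn code.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100_000, centers=8, n_features=16, random_state=0)
labels = KMeans(n_clusters=8, random_state=0).fit_predict(X)
print(labels[:10])
```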
This is just the beginning of the AI acceleration we can achieve solely through software. We believe that software optimization can deliver similar magnitudes of performance improvements on practically every architecture, including dedicated hardware AI accelerators.
The predictive AI performance data above represents acceleration on Intel® Xeon® CPUs. We anticipate software acceleration will improve performance on other architectures, including GPUs and dedicated hardware AI accelerators.
For a deeper dive, see Software AI accelerators: AI performance boost for free on Venture Beat and our research paper Efficient LLM inference solution on Intel GPU on arxiv.org.
The open-source path providing higher levels of abstraction for AI and LLMs
In the current environment, the race to build bigger and bigger models grabs all the headlines. GPT-4 is rumored to have trillions of parameters; Google’s Gemini 1.5 Pro supports a context window of up to 1 million tokens.
Training these general-purpose, multi-modal models consumes immense amounts of compute, power, and time. Deploying them as inference services is also an expensive undertaking. In the future, these mega-models will only get bigger, which means they can only scale at immense cost.
Open-source communities, Hugging Face in particular, are running a different race—one that prioritizes task-specific accuracy and hardware-aware performance over scale and multi-modal capabilities.
Three ways open source is leading the way to AI at scale
1. Shared toolsets, resources, and knowledge
Deep learning was born and shared as an open-source project, and today’s community is creating and sharing cutting-edge tools and APIs. You can access pre-trained foundation models, retrain and refine them, and deploy them to an inference endpoint—all with open-source tools and frameworks.
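As a simple illustration of that workflow, the hedged sketch below pulls an open, pretrained checkpoint from the Hugging Face Hub with the transformers library and runs inference locally; GPT-2 is used only because it is small and freely available.

```python
from transformers import pipeline

# Download an open, pretrained model from the Hugging Face Hub and run it.
generator = pipeline("text-generation", model="gpt2")

result = generator("Software optimization matters because", max_new_tokens=40)
print(result[0]["generated_text"])
```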
2. Transparent, open foundation models
There are over 500,000 open-source models on Hugging Face in a range of sizes, like Llama 2 from Meta, which comes in 7B, 13B, and 70B parameter sizes. There are more models on GitHub, including GPT-2 from OpenAI. Intel software engineers work with the Hugging Face community to develop open-source optimizations like the Optimum Intel library and the Intel® Extension for Transformers. All of these building blocks are yours to use and explore in granular detail.
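As a rough sketch of how these building blocks combine, the snippet below loads an open checkpoint through the Optimum Intel library's OpenVINO interface; class names and export options may differ across library versions, and GPT-2 again stands in for larger models like Llama 2.

```python
# pip install optimum[openvino]
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # small example checkpoint; larger models follow the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)

# export=True converts the PyTorch checkpoint to an OpenVINO graph
# optimized for inference on Intel hardware.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("Open models make it possible to", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```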
3. Industry-wide support
Hyperscalers, AI developers like OpenAI, Meta, and Google, and hardware manufacturers like NVIDIA and Intel all develop and upstream libraries, extensions, and models to the Hugging Face community. These contributions are helping to make AI solutions easier to build and more efficient to run.
Is open source viable? Have a look at neural-chat-7b-v3-3
Intel software engineers created some buzz when their generative chat model neural-chat-7b-v3-3 topped the Hugging Face leaderboard for 7B models in the fine-tuning category. Engineers fine-tuned the pretrained Mistral-7B-v0.1 model using Intel® Gaudi® 2 accelerators running on the Intel® Developer Cloud. Fine-tuning included training on a specific dataset and applying the direct preference optimization (DPO) algorithm to align results with human preferences.
The retrained model—which could be used for natural language processing, generating text and code, or translation—posted impressive results for commonsense language understanding (HellaSwag benchmark) and bias (Winogrande benchmark), all using open-source models and tooling. For details, read Supervised Fine-Tuning and Direct Preference Optimization on Intel Gaudi2 on Medium.com[5].
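For readers curious about the mechanics, here is a hedged sketch of DPO fine-tuning with the open-source TRL library. The model, dataset handling, and hyperparameters are illustrative placeholders rather than the actual neural-chat-7b-v3-3 recipe, the dataset column names are assumed, and DPOTrainer argument names vary between TRL releases.

```python
# pip install trl transformers datasets
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Illustrative placeholders, not the actual recipe behind neural-chat-7b-v3-3.
base_model = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

# DPO trains on preference pairs: a prompt plus a "chosen" and a "rejected" answer.
# Column names below are assumed; adapt them to whatever preference dataset you use.
raw = load_dataset("Intel/orca_dpo_pairs", split="train")
train_dataset = raw.rename_column("question", "prompt").remove_columns(["system"])

config = DPOConfig(output_dir="dpo-sketch", per_device_train_batch_size=1, num_train_epochs=1)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases call this argument "tokenizer"
)
trainer.train()
```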
Can the AI explosion overcome the laws of thermodynamics?
The exponential growth of supercomputing-class AI workloads is not sustainable because electricity, real estate, and time are not infinite. However, as a veteran software engineer, I believe that human ingenuity and innovation do approach infinity. While trillion-parameter models aren’t likely to scale endlessly, software engineers are already finding ways to break them down into smaller models that are highly accurate at specific tasks and can run on hardware with power budgets the planet can sustain.
The trend toward higher and higher abstraction will continue. Developers and laypeople are already using generative AI to write software. Prompt engineering is an up-and-coming field. However, every wonderful, easy-to-use increase in abstraction will require more, faster hardware and enterprising software engineers who know how to wring performance out of the deepest levels of code.
My hunch is that software engineers will clear the way to AI everywhere.
For additional information about Intel AI software, please visit AI Development (intel.com).
[1] HPC Wire, Training of 1-Trillion Parameter Scientific AI Begins, November 13, 2023, accessed March 2024
[2] Numenta, AI is harming our planet: addressing AI’s staggering energy cost (2023 update), August 10, 2023, accessed February 2024
[3] Hui Wu et al., Efficient LLM inference solution on Intel GPU, arxiv.org, December 19, 2023, accessed March 2024
[4] Chandan Damannagari and Wei Li, Software AI accelerators: AI performance boost for free, Venture Beat, September 22, 2021, accessed March 2024
[5] Intel, Supervised Fine-Tuning and Direct Preference Optimization on Intel Gaudi2, Medium, November 14, 2023, accessed February 2024