This Next Generation GPU May Very Well Pave The Path to Artificial General Intelligence
The new Etched Sohu GPT-direct Processing Chip

AI frontier models face significant constraints, including data scarcity and technical limitations like the enormous power consumption of GPUs during training. These challenges threaten to bottleneck AI advancement and escalate costs, limiting progress to the most well-funded entities.

The introduction of the Etched Sohu chip offers a breakthrough. Whereas a transformer running on NVIDIA GPUs requires roughly 10 times the compute of a simple neural network, the same transformer running on Sohu requires only half the compute of that simple network. This specialized chip focuses exclusively on transformer model execution, achieving far higher performance and efficiency.

Sohu's architecture, optimized for matrix multiplication and boasting over 90% FLOPS utilization, drastically reduces computational and energy requirements. This innovation has the potential to democratize access to advanced AI and foster sustainable development, addressing critical constraints and paving the way for future breakthroughs.

Why Generative AI Continues to Get More Expensive

What Is a Transformer?

The transformer is the foundation of all modern AI models. Invented in 2017 by a team at Google that included Ashish Vaswani, it was rapidly adopted and led to major advances in AI, culminating in the release of ChatGPT in late 2022.

"G P T" stands for Generative Pre-trained Transformer

Evolution

Over the history of computing, successive generations of computers and software have gotten more efficient AND declined in cost. We all expect each new generation of laptop or iPhone to offer astounding new features while staying in about the same price range as the generation before.

AI is the first technology where this trend has run in reverse: each new generation of models is more expensive. Deploying AI in enterprise settings has huge potential benefits, but they are overshadowed by an uncertain Return on Investment (ROI).

The increase in required GPU power tracks the increasing sophistication of neural networks, which started as a lab curiosity and evolved into the powerful transformers that run today's Frontier Model AI platforms:

  • ANNs: Artificial Neural Networks - Low compute power, suitable for basic tasks.
  • CNNs: Convolutional Neural Networks - Moderate compute power, excel in image processing.
  • Transformers: High compute power, leading performance in NLP and other complex tasks but with substantial energy and resource costs.

The Etched Sohu Chip Changes Everything

There are two good ways to think about the computing load each of these three approaches requires. Comparing them tells the story of why AI continues to get more expensive and how the new Etched chip attacks that problem.

First, let's compare the compute power requirements of the three types of models relative to one another.

ANN : CNN : GPT = 1 : 3 : 10

To begin with, this simple comparison shows that transformers require 10 times the compute of a simple neural net. That is not surprising, because the fundamental innovation in transformers is their ability to process serial streams of input and track the sequential progression. Simple neural nets can only deal with one input at a time, which is not conducive to LLMs.

Now, let's add a distinction to the transformer computing requirements by separating a transformer running on an NVIDIA chipset from one running on the Etched Sohu chip:

ANN : CNN : GPT on NVIDIA : GPT on Sohu = 1 : 3 : 10 : 1/2

So a GPT running on NVIDIA requires 10x the compute power of a simple ANN, but when the same GPT runs on the Etched Sohu, it requires only half the compute power of a simple ANN. Most importantly, this means the same GPT requires 20x LESS compute power on the Sohu than on an NVIDIA GPU cluster.
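
To make the arithmetic explicit, here is a tiny Python sketch that encodes the ratios above and derives the 20x figure. The numbers are this article's figures, not independent benchmarks:

    # Relative-compute ratios from the comparison above (article figures,
    # not benchmarks), used to check the 20x claim.
    relative_compute = {
        "ANN": 1.0,
        "CNN": 3.0,
        "GPT on NVIDIA": 10.0,
        "GPT on Sohu": 0.5,
    }

    ratio = relative_compute["GPT on NVIDIA"] / relative_compute["GPT on Sohu"]
    print(f"The same GPT needs {ratio:.0f}x less compute on Sohu")  # 20x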

For additional background on this analysis, including sources, see the link below:

https://www.perplexity.ai/page/the-evolution-of-neural-nets-t-biqkDOfQS72AI1jtFy7l7g

Voodoo or Magic? Actually Neither.

No voodoo or magic is at work here. By sacrificing the flexibility of general-purpose computing and focusing entirely on transformer model execution, Sohu achieves significantly higher performance and efficiency for these specific workloads. The key design features of Sohu include:

  1. Specialized Architecture: Sohu is an Application-Specific Integrated Circuit (ASIC) designed exclusively for transformer models, unlike NVIDIA's general-purpose GPUs. This specialization allows Sohu to optimize its hardware specifically for transformer computations.
  2. Streamlined Design: By focusing solely on transformer workloads, Sohu eliminates unnecessary control flow logic and components that general-purpose GPUs require. This streamlining allows more transistors to be dedicated to AI compute operations.
  3. Optimized for Matrix Multiplication: Transformer models rely heavily on matrix multiplication. Sohu's architecture is tailored to excel at these operations, which are the core of transformer processing (see the sketch after this list).
  4. Higher FLOPS Utilization: Due to its specialized design, Sohu achieves over 90% FLOPS (Floating Point Operations Per Second) utilization, compared to approximately 30% for GPUs running transformer workloads. This means Sohu can perform more useful computations in a given time frame.
  5. Efficient Memory Access: Sohu's design likely optimizes memory access patterns specific to transformer models, reducing latency and improving overall performance.
  6. Parallel Processing Optimization: The chip's architecture is presumably optimized for the parallel processing requirements of transformer models, allowing for more efficient execution of these workloads.
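
To ground items 3 and 4, here is a minimal NumPy sketch of single-head scaled dot-product attention, the kind of workload Sohu is built around. The sizes and random weights are illustrative assumptions, and this is generic transformer math rather than Etched's implementation; the point is that nearly every step is a matrix multiplication:

    # Single-head scaled dot-product attention in NumPy.
    # Sizes are illustrative; random weights stand in for learned ones.
    import numpy as np

    seq_len, d_model = 128, 64
    x = np.random.randn(seq_len, d_model)

    # Query/key/value projections: three matrix multiplications
    Wq, Wk, Wv = (np.random.randn(d_model, d_model) for _ in range(3))
    Q, K, V = x @ Wq, x @ Wk, x @ Wv

    # Attention scores and weighted sum: two more matrix multiplications
    scores = (Q @ K.T) / np.sqrt(d_model)            # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    output = weights @ V                             # (seq_len, d_model)
    print(output.shape)                              # (128, 64)

Note, incidentally, that the utilization figures in item 4 imply roughly a 3x throughput gain (90% vs. 30%) from utilization alone, before any of the architectural specialization above.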

An Interview With Etched CEO Gavin Uberti

Here is a six-minute interview on Bloomberg TV with the CEO and co-founder explaining the Sohu chip and their business model:


A Closer Look at the History of Neural Networks

The Evolution of Neural Nets

The neural net is the basic foundation of generative AI technology. It finds its roots back in the 1940s and was a point of interest for researchers in the 1970s and 1980s. Since then, the neural net has evolved to become the enabler of today's scaled AI models:

Artificial Neural Networks (ANNs)

Artificial Neural Networks (ANNs) are the simplest form of neural networks, consisting of layers of interconnected nodes, or neurons. The compute power required for ANNs is relatively low compared to more advanced architectures. The computational cost is primarily determined by the number of neurons and connections, which translates to the number of floating-point operations (FLOPs) needed for training and inference.

Impact on Performance: ANNs are suitable for basic tasks and small datasets but struggle with complex problems and large-scale data due to their limited capacity and computational efficiency.
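
To make the FLOP accounting concrete, here is a minimal sketch of a two-layer ANN forward pass in NumPy; the layer sizes are arbitrary illustrative assumptions:

    # A two-layer ANN forward pass; layer sizes are illustrative.
    import numpy as np

    def dense(x, W, b):
        """One fully connected layer: matrix-vector product, bias, ReLU."""
        return np.maximum(0, W @ x + b)

    in_dim, hidden, out_dim = 784, 128, 10
    x = np.random.randn(in_dim)
    W1, b1 = np.random.randn(hidden, in_dim), np.zeros(hidden)
    W2, b2 = np.random.randn(out_dim, hidden), np.zeros(out_dim)

    y = W2 @ dense(x, W1, b1) + b2

    # The cost is dominated by the two matrix-vector products:
    flops = 2 * hidden * in_dim + 2 * out_dim * hidden
    print(f"~{flops:,} FLOPs per forward pass")  # ~203,264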

Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are specialized for processing grid-like data, such as images. They use convolutional layers to automatically detect features, significantly reducing the number of parameters compared to fully connected layers. CNNs require more compute power than ANNs due to the additional operations involved in convolutions and pooling.

Impact on Performance: CNNs excel in image-related tasks, offering high accuracy and efficiency. Their computational cost is higher than that of ANNs but remains manageable, especially with optimizations like reduced spatial dimensions and parameter sharing.
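
A quick back-of-the-envelope calculation illustrates the parameter sharing mentioned above; the image and filter sizes are illustrative assumptions:

    # Parameter counts: a conv layer vs. a dense layer on the same image.
    height, width, channels = 32, 32, 3   # small RGB image
    n_filters, kernel = 16, 3             # sixteen 3x3 filters

    # Dense layer mapping the flattened image to 16 outputs:
    dense_params = (height * width * channels) * n_filters      # 49,152

    # Conv layer with 16 3x3 filters (one bias each) over the same image:
    conv_params = n_filters * (kernel * kernel * channels + 1)  # 448

    print(dense_params // conv_params)  # ~110x fewer weights via sharing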

Transformers

Transformers, particularly known for their use in natural language processing (NLP), rely on self-attention mechanisms to process sequences in parallel. This architecture requires substantial compute power, especially for large models with millions or billions of parameters. The computational complexity of transformers is higher due to the self-attention mechanism, which involves operations proportional to the square of the sequence length times the dimension of the vector representations.

Impact on Performance: Transformers have set new benchmarks in various tasks, including NLP and computer vision, but at the cost of significantly higher computational requirements. Training large transformer models can consume vast amounts of energy and computational resources, making them expensive to deploy and maintain.
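
To make the quadratic scaling concrete, here is a back-of-the-envelope estimate of the FLOPs in the two attention matrix products alone; the model width is an illustrative assumption:

    # FLOPs for the two attention matmuls: scores = Q @ K^T and output = A @ V.
    # Each costs roughly 2 * seq_len**2 * d_model multiply-adds.
    def attention_flops(seq_len: int, d_model: int) -> int:
        return 4 * seq_len**2 * d_model

    for n in (512, 1024, 2048, 4096):
        print(f"{n:>5} tokens: {attention_flops(n, d_model=1024):,} FLOPs")
    # Doubling the sequence length quadruples the attention cost.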

#AIRevolution #DataChallenges #GPUEfficiency #AIEvolution #TransformersAI #TechInnovation #AIInfrastructure #SustainableAI #AIComputing #FutureOfAI

Bruce Eckfeldt

Coaching CEOs to Scale & Exit Faster with Less Drama

3 months ago

Gary Ambrosino, this is a significant development in AI hardware. The potential for the Etched Sohu chip to surpass NVIDIA GPUs in efficiency and power consumption could indeed democratize AI access. Curious to see how this will impact AI-driven business strategies and innovation in the coming years.
