H100 GPU to Edge GenAI: Quantized & Finetuned LLMs!
NVIDIA's $1 billion headquarters in Santa Clara, California. Today's market cap: $1.09T.

Which technology company recently entered the elite league of $1.0T market cap companies, becoming only the sixth on the planet to do so, even though it took them exactly 30 years?

It's the largest GPU chip manufacturer today: NVIDIA! Yes, the silent disruptor in accelerated computing right now, despite not supplying the core microprocessor chips in the computing devices around us; INTEL and AMD still dominate that space today. NVIDIA was founded back in 1993 to serve a small but niche market of graphics processors, primarily for computer games. I remember opting for an NVIDIA GeForce chip/card for my first personal computer back in 1999 to enjoy a better gaming experience, and it still gets the same job done nicely today.

NVIDIA's H100 AI GPU, built for LLMs, which powers OpenAI's ChatGPT and Google's BARD!


Then what really happened here? In the last 5 years alone, the market cap jumped more than 7 times. INTEL, the microprocessor giant, sits at a $140B market cap today. The answer is simple: NVIDIA sensed a disruptive market opportunity in AI about 10 years ago and pivoted the whole enterprise to capture it. They found out that graphics chips could handle deep learning (neural network) AI computing really well, and that was it. Today they make 95% of the world's GPU chips. They are a hardware and a software company to the core at the same time.

Then came the wave of Generative AI (LLMs), barely 8 months ago! Back in 2017, the Google Brain team released the paper introducing the Transformer model, titled "Attention Is All You Need". Here is the paper. This foundational paper itself mentions that Google used NVIDIA P100 GPUs to train the world's first "Transformer" model.

For starters, here is a quick tour of the history and evolution of language models to date:

Language modeling (LM) uses statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Hence, a language model is basically a probability distribution over sequences of words:

The LM computes the conditional probability distribution P(x^(t+1) | x^(t), …, x^(1)), where x^(t+1) can be any word in the vocabulary.

Language models generate probabilities by learning from one or more text corpora. Large Language Models (LLMs) are basically neural language models working at a larger scale. A large language model consists of a neural network with possibly billions of parameters. Moreover, it's typically trained on vast quantities of unlabeled text, possibly running into hundreds of billions of words.
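
To make the idea of "a probability distribution over the next word" concrete, here is a minimal toy sketch in Python (the vocabulary and scores below are invented purely for illustration; a real LLM would compute the scores with a trained neural network):

```python
import numpy as np

# Toy illustration only: a made-up 5-word vocabulary and made-up scores.
vocab = ["the", "cat", "sat", "on", "mat"]

def next_word_distribution(logits):
    """Turn raw scores into P(x(t+1) | x(1)...x(t)) via a softmax."""
    exp = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return exp / exp.sum()

# Pretend the model scored each vocabulary word after seeing "the cat sat on"
logits = np.array([0.1, 0.2, 0.3, 0.1, 2.5])
for word, p in zip(vocab, next_word_distribution(logits)):
    print(f"P('{word}' | 'the cat sat on') = {p:.3f}")
```

The probabilities sum to 1, and the word with the highest score ("mat") gets the largest share of the probability mass; a real model simply does this with a vocabulary of tens of thousands of words and billions of learned parameters.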

A foundation model generally refers to any model trained on broad data that can be adapted to a wide range of downstream tasks. These models are typically created using deep neural networks and trained using self-supervised learning on large amounts of unlabeled data.

The term was coined not long ago by the Stanford Institute for Human-Centered Artificial Intelligence (HAI). However, there is no clear distinction between what we call a foundation model and what qualifies as a large language model (LLM).

Nevertheless, LLMs are typically trained on language-related data like text. But a foundation model is usually trained on multimodal data, a mix of text, images, audio, etc. More importantly, a foundation model is intended to serve as the basis or foundation for more specific tasks.

In the beginning, LLMs were largely created using self-supervised learning algorithms. Self-supervised learning refers to processing unlabeled data to obtain useful representations that can help with downstream learning tasks. Early language models used Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs). They trained very slowly and did not scale well.

The introduction of transformers by the Google Brain team in 2017 is perhaps one of the most important inflection points in the history of LLMs. A transformer is a deep learning model that adopts the self-attention mechanism and processes the entire input all at once.
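
As a rough sketch of what that self-attention step looks like (a simplified single-head version in NumPy with random placeholder weights; real transformers use multiple heads, masking, and many stacked layers):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X          : (seq_len, d_model) token embeddings, processed all at once
    Wq, Wk, Wv : learned projection matrices (random placeholders here)
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])          # every token scores every other token
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # attention-weighted mix of values

# Toy usage: a "sentence" of 4 tokens with 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # -> (4, 8)
```

The key point is that the whole sequence is handled in a few matrix multiplications rather than one token at a time, which is exactly the kind of workload GPUs excel at.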

This is the architecture powering OpenAI's ChatGPT, which was released in November 2022. OpenAI used thousands of NVIDIA AI GPUs to train the Large Language Model (LLM) behind it, so NVIDIA needs to be given its due in the GenAI revolution. Today Microsoft, Google, and Meta all run their LLMs on NVIDIA GPUs. The reason NVIDIA's stock price seems unstoppable right now is that there is simply very little competition, so its market cap looks set to keep climbing, especially by riding the Generative AI/LLM wave. The NVIDIA H100 GPU is pictured above!

As we can see, OpenAI clearly won the race to bring LLMs to the mainstream through ChatGPT, whereas Google, despite originating the Transformer architecture, is now playing a catch-up game with BARD! Similarly, could NVIDIA's early-mover advantage and free run be shaken in the near future? Well, maybe, as the next computing disruption shapes up on the horizon! Welcome to Edge AI and Quantized AI!

Large Language Models are resource intensive, yes, that's obvious. The brute AI force of NVIDIA's H100 primarily powers data centers and cloud computing, but the needs are changing fast. We are going to need more edge AI power (think of your handheld, smartphone, or IoT sensors), where large language models can run with less memory, less battery/power, and faster, cheaper model training. What is enabling this transformative shift? Well, other than 5G and edge computing power, a mathematical optimization technique for Large Language Models called quantization is making this happen.

A simple weight-quantized neural network, enabling edge AI / Large Language Models (LLMs)

Quantization is a model-size reduction technique that converts model weights from a high-precision floating-point representation to lower-precision floating-point (FP) or integer (INT) representations, such as 16-bit or 8-bit. The LLM's accuracy may be impacted marginally, but the resource requirements can come down dramatically. A well-quantized LLM may not even need a GPU to train or run!
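
Here is a minimal sketch of what weight quantization does, assuming a simple symmetric 8-bit scheme (illustrative only; production toolkits use more sophisticated per-channel and calibration-based methods):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor quantization: FP32 weights -> INT8 values plus one scale."""
    scale = np.abs(weights).max() / 127.0                          # map largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate FP32 reconstruction used at inference time."""
    return q.astype(np.float32) * scale

# Toy usage: one small weight matrix
w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print("storage: %d bytes -> %d bytes" % (w.nbytes, q.nbytes))      # roughly 4x smaller
print("max abs error: %.4f" % np.abs(w - w_hat).max())             # small precision loss
```

Storing each weight in 1 byte instead of 4 shrinks the model roughly fourfold, which is exactly the memory and bandwidth saving that lets a quantized LLM fit on edge hardware.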

In the coming months, we may see a handful of general foundation model owners/leaders like OpenAI and Google, with access to serious and costly computing infrastructure (NVIDIA), and a large crowd of smaller players bringing lightweight quantized LLMs to the marketplace, fine-tuned/trained for specific contexts, businesses, or sector problems. These offerings will be much more interesting for companies of all sizes looking for advanced and secure point solutions for their businesses, run comprehensively on their own private cloud.

Are you okay just integrating with ChatGPT via an open API and letting foundation models in the cloud consume your company's private IP data, or would you rather quantize generic LLMs to compress them and train them to master in-company use cases?

Arunashish Ghosh

Technology Consulting| Hyper-automation Expert | Consulting, Chief Solution Architect & Global Program Mgmt roles | BFSI Healthcare CPG Manufacturing | 23+ Yrs Ex Citi HSBC Cognizant Capgemini Virtusa | Flexi-working

1y

Very insightful Pallab Bhattacharya thanks for sharing this

Vidhya Rohit

Vice President - Transformation and Innovation

1y

Interesting read!
