H100 GPU to Edge GenAI: Quantized & Fine-tuned LLMs!
Pallab Bhattacharya
Transformation Catalyst | Digital | Enterprise Lean-Agile (SAFe) | CFA- ESG & Sustainability | Executive Director @EY | Ex. BNY Mellon, Morgan Stanley, HSBC, GE Capital (TCS), Edelweiss.
Which technology company recently entered the elite league of $1.0 trillion market-cap companies, becoming only the sixth on the planet to do so, even though it took them exactly 30 years to get there?
It's the largest GPU chip manufacturer today, NVIDIA! Yes, the silent disruptor in accelerated computing right now, even though it does not supply the core microprocessor chips in the computing devices around us; Intel and AMD still dominate that space today. NVIDIA was founded back in 1993 to serve a small but niche market: graphics processors, primarily for computer games. I remember opting for an NVIDIA GeForce chip/card for my first personal computer back in 1999 to enjoy a better gaming experience, and it still gets that same job done nicely today.
Then what really happened here? In the last 5 years alone, NVIDIA's market cap jumped more than 7 times, while Intel, the microprocessor giant, sits at a $140B market cap today. The answer is simple: NVIDIA sensed a disruptive market opportunity in AI about 10 years ago and pivoted the whole enterprise to capture it. They found that graphics chips could handle deep learning (neural network) style AI computing really well, and that was that. Today they make 95% of the world's GPU chips, and they are a hardware and a software company to the core at the same time.
Then came the wave of Generative AI (LLMs), barely 8 months ago! Back in 2017, the Google Brain team released the paper on the Transformer model, titled "Attention Is All You Need". Here is the paper. That foundational paper itself mentions that Google used NVIDIA P100 GPUs to train the world's first Transformer model.
Here is a quick tour of language model history and evolution to date, for starters:
Language modeling (LM) uses statistical and probabilistic techniques to determine the probability of a given sequence of words occurring in a sentence. Hence, a language model is basically a probability distribution over sequences of words:
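In symbols (my own minimal way of writing it, not a formula taken from the original post), this is the usual chain-rule factorization that language models estimate:

```latex
% A language model assigns a probability to a whole word sequence by
% multiplying next-word probabilities (the chain rule of probability):
P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```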
Language models generate probabilities by learning from one or more text corpora. Large Language Models (LLMs) are basically neural language models working at a larger scale. A large language model consists of a neural network with possibly billions of parameters. Moreover, it is typically trained on vast quantities of unlabeled text, possibly running into hundreds of billions of words.
A foundation model generally refers to any model trained on broad data that can be adapted to a wide range of downstream tasks. These models are typically created using deep neural networks and trained using self-supervised learning on large amounts of unlabeled data.
The term was coined relatively recently by the Stanford Institute for Human-Centered Artificial Intelligence (HAI). However, there is no clear distinction between what we call a foundation model and what qualifies as a large language model (LLM).
Nevertheless, LLMs are typically trained on language-related data like text, whereas a foundation model is usually trained on multimodal data, a mix of text, images, audio, etc. More importantly, a foundation model is intended to serve as the basis or foundation for more specific tasks.
Early LLMs were largely created using self-supervised learning algorithms. Self-supervised learning refers to processing unlabeled data to obtain useful representations that can help with downstream learning tasks. Those early models used Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs); they trained very slowly and did not scale well.
The introduction of transformers by the Google Brain team in 2017 is perhaps one of the most important inflection points in the history of LLMs. A transformer is a deep learning model that adopts the self-attention mechanism and processes the entire input all at once; the sketch below illustrates the idea.
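To make self-attention concrete, here is a minimal, purely illustrative NumPy sketch of scaled dot-product attention; the function and variable names are my own, not from the paper or this article:

```python
# Illustrative sketch of scaled dot-product self-attention, the core
# operation of the Transformer ("Attention Is All You Need", 2017).
# Shapes and names are simplified for readability.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token embeddings; Wq/Wk/Wv: projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # queries, keys, values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # every token attends to every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ V                               # weighted mix of value vectors

# Toy usage: 4 tokens, 8-dimensional embeddings
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8) -- the whole input is processed at once, unlike an RNN
```

The key contrast with RNNs is visible in the last line: the entire sequence is transformed in one matrix operation rather than token by token, which is what makes training parallelizable and scalable.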
This is the architecture powering OpenAI's ChatGPT, released in November 2022. OpenAI used thousands of NVIDIA AI GPUs to train the underlying Large Language Model (LLM), so NVIDIA must be given its due in the GenAI revolution. Today Microsoft, Google, and Meta run their LLMs on NVIDIA GPUs. The reason NVIDIA's stock price looks unstoppable right now is that there is simply very little competition, so the market cap seems boundless, especially riding the Generative AI/LLM wave. Pictured below is an NVIDIA H100 GPU!
As we can see, OpenAI clearly won the race to bring LLMs to the mainstream through ChatGPT, whereas Google, despite inventing the Transformer architecture, is now playing a catch-up game with Bard! Similarly, could NVIDIA's early-mover advantage and free run be shaken in the near future? Well, maybe, as the next computing disruption takes shape on the near horizon. Welcome to Edge AI and Quantized AI!
Large Language Models are resource intensive; yes, that's obvious. The brute AI force of NVIDIA's H100 primarily powers data center / cloud computing, but the needs are changing fast. We are going to need more edge AI power (think of your handheld, smartphone, or IoT sensors), where large language models can run with less memory, less battery/power, and faster, cheaper model training. What is enabling this transformative shift? Well, other than 5G and edge computing power, a mathematical optimization technique for Large Language Models called quantization is making this happen.
Quantization is a model size reduction technique that converts model weights from a high-precision floating-point representation to lower-precision floating-point (FP) or integer (INT) representations, such as 16-bit or 8-bit. The LLM's accuracy may be impacted marginally, but the resource requirements can come down dramatically; a well quantized LLM may not even need a GPU to run. A minimal sketch of the idea follows.
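As a rough illustration (a minimal sketch using plain NumPy, with a made-up matrix size standing in for one LLM layer rather than any specific model), symmetric INT8 weight quantization looks like this:

```python
# Minimal sketch of symmetric INT8 weight quantization.
# A random 4096x4096 matrix stands in for one layer's weights.
import numpy as np

def quantize_int8(w):
    """Map float32 weights to int8 values plus a single scale factor."""
    scale = np.abs(w).max() / 127.0                      # largest weight maps to +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float32 weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.random.default_rng(1).normal(size=(4096, 4096)).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes / 1e6, "MB in FP32")                      # ~67 MB
print(q.nbytes / 1e6, "MB in INT8")                      # ~17 MB, a 4x reduction
print(np.abs(w - dequantize(q, scale)).mean())           # small rounding error per weight
```

The trade-off the article describes is visible directly: storage (and memory bandwidth) drops by roughly 4x for INT8, at the cost of a small per-weight rounding error that usually translates into only a marginal accuracy loss.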
In the coming months, we may see a handful of general foundation model owners/leaders like OpenAI and Google with access to serious and costly computing infrastructure (NVIDIA), and a large group of smaller players bringing lightweight quantized LLMs to the marketplace, fine-tuned to specific contexts, businesses, or sector problems. These offerings will be much more interesting for companies of all sizes looking for advanced and secure point solutions for their businesses, run comprehensively on their own private cloud. A rough sketch of how such a quantized, fine-tuned model could be built follows.
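As one possible, purely illustrative way to build such a lightweight offering (the Hugging Face transformers/peft toolchain and the model name below are my example assumptions, not something named in this article), a 4-bit quantized open base model can be fine-tuned with small low-rank adapters roughly like this:

```python
# Hypothetical sketch: fine-tuning a 4-bit quantized open LLM with low-rank
# adapters (LoRA). The toolchain and model id are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Llama-2-7b-hf"          # placeholder open model

bnb = BitsAndBytesConfig(load_in_4bit=True,    # quantize weights to 4 bits on load
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             quantization_config=bnb,
                                             device_map="auto")

# Train only small low-rank adapter matrices on top of the frozen 4-bit base,
# so domain-specific fine-tuning fits on modest, even on-premises, hardware.
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()             # a tiny fraction of the total weights
```

The design point here is that the expensive foundation model is reused as-is in compressed form, while only a small set of adapter weights is trained on a company's private data, which is what makes sector-specific, privately hosted LLMs economically feasible.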
So, are you OK simply integrating with ChatGPT via its API, or letting foundation models in the cloud consume your private, company-owned IP data, or would you rather quantize generic LLMs to compress them and fine-tune them to master in-company use cases?