ChatGPT and Foundation Models for enterprises… beyond the hype (1 of 5)
Shobhit Varshney
VP & Sr. Partner, Americas AI Leader at IBM [shobhitvarshney.com] 2023 Tech Leader of the Year
[Article 1 in a series of 5 - Tech, Use cases, Training, Big bets, Ethical AI]
Such an exciting time to be alive! 2022 was a pivotal year for AI, one in which we made giant leaps in Generative AI. We will look back with pride on what we accomplished in such a short span. I have been blown away by the progress we have made in the past year, and I am both thrilled and cautious about our path forward. Giving the masses free access to powerful Large Language Models like ChatGPT has created endless posts about fun, clever prompts that make ChatGPT do incredible things: a good mix of jaw-dropping responses, some hilarious ones, and some very convincing but inaccurate ones. I had the pleasure of meeting Sam Altman, CEO of OpenAI, at a conference early last year. He has an extremely evolved perspective, and I am very optimistic that he will continue to have a meaningful impact on the future of AI and humanity. Lots of respect for an incredible guy!
Over the holidays I got a chance to take a few courses, get hands-on with code, and dig a lot deeper into the underlying tech behind foundation models, LLMs, etc. The deeper I go, the more I realize how transformative these AI models are. It is such a step change from where we were before transformers. I have great appreciation for the collective progress our AI community has made, and yet for how much more we still need to do to responsibly leverage this AI at scale.
My kids and I had a little fun asking AI to create our Christmas card this year. It got me really excited about the impact AI will have on creative workflows, and how it will enhance brainstorming, co-creation, and what-ifs. My kids were thrilled to see how they could visualize their ideas right away: What if we add a fireplace? Oh, that would need to be on the left, not the right, or the snowman will melt. I can see the next generation growing up with a very different expectation of the role AI plays in accelerating them.
My team has had the privilege of deploying Large Language Models (LLMs) for some of our clients with impressive results. They still need some more fine-tuning, but we saw a significant lift over prior techniques. With the recent release of ChatGPT, interest from our client leadership has skyrocketed, and we have been doing a lot of use-case brainstorming across all industries. I wanted to broaden the discussion to our larger LinkedIn community. To avoid making it overwhelming, I will split the discussion into five sub-topics, since each warrants a bit of detail: tech, use cases, training, big bets, and ethics. As always, these are my personal thoughts, not my employer IBM's. Let's start with a little technical behind-the-scenes to provide context on how we got this far.
We see large-scale applications of Analytical AI, where we expect AI models to analyze data at scale, look for patterns, and make recommendations. Foundation models are the next evolution toward Artificial General Intelligence (AGI): they generate something new rather than surfacing content that already exists. Discriminative algorithms try to classify input data: given a set of features, they predict the label to which a data example belongs. Generative algorithms do the opposite: instead of predicting a label given some features, they try to predict (or generate) features given a certain label.
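To make that distinction concrete, here is a tiny, illustrative Python sketch (scikit-learn, toy data; none of it comes from the article itself): logistic regression is discriminative and learns P(label | features) directly, while Gaussian Naive Bayes is generative and models how each class produces its features.

```python
# Toy contrast: a discriminative vs. a generative classifier on the same data.
# Logistic regression models P(label | features) directly; Gaussian Naive Bayes
# models P(features | label) and P(label), then inverts them via Bayes' rule.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=4, random_state=0)

discriminative = LogisticRegression().fit(X, y)   # learns the decision boundary
generative = GaussianNB().fit(X, y)               # learns how each class generates features

print(discriminative.predict(X[:3]))
print(generative.predict(X[:3]))
# Because GaussianNB models the feature distribution per class, you can also
# sample "new" feature vectors from it -- something logistic regression cannot do.
```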
Foundation Models
AI is undergoing a paradigm shift with the rise of models such as LaMDA, BERT, DALL-E, and GPT-3, trained on massive amounts of data using self-supervision at scale, that can be adapted to a wide range of downstream tasks. These models are called foundation models to underscore their critically central yet incomplete character. Foundation models are enabled by transfer learning (Thrun 1998) and scale. The idea of transfer learning is to take the "knowledge" learned from one task (e.g., object recognition in images) and apply it to another task (e.g., activity recognition in videos). Within deep learning, pre-training is the dominant approach to transfer learning: a model is trained on a surrogate task and then adapted to the downstream task of interest via fine-tuning. Stanford's paper 'On the Opportunities and Risks of Foundation Models' defined this space and is a must-read.
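As a rough illustration of that pre-train-then-fine-tune pattern, here is a minimal sketch using the Hugging Face transformers library. The specific model ("bert-base-uncased"), dataset (IMDB sentiment), and hyperparameters are just illustrative assumptions, not anything from the article.

```python
# A minimal sketch of the pre-train -> fine-tune pattern with Hugging Face transformers.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# The backbone was pre-trained with self-supervision; we only add a small classification head.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

encoded = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1, per_device_train_batch_size=8),
    train_dataset=encoded["train"].shuffle(seed=42).select(range(2000)),  # small slice, sketch only
    eval_dataset=encoded["test"].select(range(500)),
)
trainer.train()  # adapts the pre-trained "knowledge" to the downstream sentiment task
```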
Transfer learning is what makes foundation models possible, but scale is what makes them powerful. Scale requires three things, and only in the past few years have we gotten to this trinity: improvements in computer hardware (GPU throughput and memory), the Transformer architecture that exploits that hardware's parallelism, and the availability of massive amounts of training data.
Transformer Models
The 2017 Google paper "Attention Is All You Need" introduced the Transformer, and it revolutionized how the AI community builds large-scale deep learning models. Mad respect for the authors! The Transformer is a neural network architecture that relies on self-attention mechanisms to process input and generate output. It made it possible to train large, deep models on problems with long-range dependencies without the need for recurrence or convolution, and it allowed for parallelization of both the training and inference stages, making these models much more efficient to run on hardware. Self-attention mechanisms allow the model to weight different parts of the input according to their relevance to the task at hand, e.g., in use cases where the meaning of a word depends on the context of the entire sentence. Transformers have proven to be very scalable, which means their performance and accuracy improve as they are made larger and fed more data. More importantly, transformer models can be trained through unsupervised or self-supervised learning, meaning they require very little human-annotated data, which had been one of the main bottlenecks of existing deep learning methods. The Transformer architecture further inspired the development of other self-attention-based models, such as GPT-3 (Generative Pre-trained Transformer) and LaMDA (Language Model for Dialogue Applications), which are incredible at NLP tasks. A variation of the transformer, the "vision transformer," is also used for visual tasks such as image classification and generating images from text, e.g., DALL-E. I am most impressed by the work of https://stability.ai/ in this space, having open-sourced multimodal models. This Stanford lecture by Ashish Vaswani & Anna Huang is a great talk on Transformers and Self-Attention.
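For intuition, here is a bare-bones sketch of scaled dot-product self-attention (a single head, random toy weights, no masking or multi-head machinery), just to show how each token's output becomes a relevance-weighted mix of the whole sequence.

```python
# Minimal scaled dot-product self-attention: each output token is a weighted
# mix of value vectors, with weights based on query-key similarity.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv             # project tokens into queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])      # how relevant each token is to every other
    weights = softmax(scores, axis=-1)           # attention weights sum to 1 per token
    return weights @ V                           # relevance-weighted mix of the sequence

seq_len, d_model = 5, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))          # 5 toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (5, 8): same length, now contextualized
```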
Generative AI
Generative AI creates new content by utilizing existing text, audio, or images. It detects and learns the underlying patterns of the input and produces similar content. The hype around generative AI is exploding, and I personally think we are still underestimating its true impact. Gartner made some interesting predictions in their Emerging Technologies and Trends Impact Radar for 2022 report.
Below is a simple Generative AI application landscape by Sequoia Capital across various modalities.
Here are some popular techniques used for generative AI: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, and autoregressive transformer-based language models.
OpenAI's ChatGPT
Let's drill deeper into OpenAI's ChatGPT and all the buzz around it.
We will focus on OpenAI's Language Models (GPT-3 and ChatGPT) here, and will discuss other OpenAI models (DALL-E 2, Codex and Whisper) in detail in a different article.
GPT-3 is a transformer-based language model trained on a large corpus of diverse internet text, allowing it to have a wide range of knowledge and generate text that is coherent and diverse. It was introduced by OpenAI in July 2020 in the paper Language Models are Few-Shot Learners. ChatGPT is a variant of GPT-3 fine-tuned for conversational language generation, released in Nov 2022. It is further trained on conversational data, allowing it to generate coherent and human-like responses in a chat or dialogue setting. In short, GPT-3 is a general-purpose text-completion model, while ChatGPT is tuned specifically for dialogue.
The State of AI Report 2022 does a phenomenal job of distilling how GPT-3 has influenced the growth of LLMs and other modalities. It's an annual must-read.
If you want to drill even deeper into the full catalog of Large Language Models, this post does a great job cataloging each one. It's a looooong list.
Generally, I have had good luck playing with the GPT-3 playground to figure out how I want to use the model before we start fine-tuning it. By providing some sample outputs in the prompt itself, you can easily try zero-shot, one-shot, and few-shot prompting to influence the output. Since there is a limit to how many tokens you can provide in the prompt, you really do need to spend some time creating a fine-tuned model and query that instead.
Here are some examples of what zero-, one-, and few-shot prompting looks like. The graph below shows the huge boost in accuracy we get from providing just one or two examples (shots) to GPT-3 models. Here is a good article explaining the nuances.
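As a rough sketch of what that looks like in code, here is one way to compare a zero-shot prompt with a few-shot prompt against the GPT-3 API. The model name, prompts, and labels are illustrative assumptions; this assumes the openai Python package and an API key in your environment.

```python
# Zero-shot vs. few-shot prompting against the GPT-3 completions API.
import openai  # expects OPENAI_API_KEY to be set in the environment

zero_shot = (
    "Classify the sentiment of this review as Positive or Negative.\n"
    "Review: The battery died after a week.\nSentiment:"
)

few_shot = """Classify the sentiment of each review as Positive or Negative.
Review: Absolutely love this phone, the camera is stunning.
Sentiment: Positive
Review: Stopped working after two days, total waste of money.
Sentiment: Negative
Review: The battery died after a week.
Sentiment:"""

for prompt in (zero_shot, few_shot):
    resp = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=5, temperature=0
    )
    print(resp.choices[0].text.strip())
# The few-shot version typically follows the expected label format much more
# reliably, which is the boost from one or two examples described above.
```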
The OpenAI ChatGPT blog does a great job explaining the basics of the training methodology behind ChatGPT, but the InstructGPT paper gets into a lot more detail on the training approach and results. OpenAI took the GPT-3.5 model and fine-tuned it on human-generated conversations using supervised learning and reinforcement learning from human feedback.
Today, OpenAI provides a good range of language models that can suit your enterprise's specific task and price point; the latest are here:
These models are a phenomenal starting point, but a lot of fine-tuning is needed for enterprise use cases in specific domains. There is significant work required for ChatGPT to truly 'understand' the complexity of a domain. Enterprises need to think through the cost of execution, the skills required, and how they will monitor and maintain these models. I will get into more detail on domain-specific training in Article 3. Here is an easy read on training a Q&A bot on a specific domain.
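As one illustrative pattern (not the only one) for grounding answers in your own domain content, here is a minimal retrieval-style sketch: embed a handful of documents, pick the passage most relevant to a question, and include it in the prompt. The documents, model names, and tiny in-memory "index" are all toy assumptions.

```python
# Minimal retrieval-augmented Q&A sketch: embed documents, retrieve the most
# relevant passage, and answer from that context only.
import numpy as np
import openai  # expects OPENAI_API_KEY to be set in the environment

docs = [
    "Our premium plan includes 24/7 phone support and a 99.9% uptime SLA.",
    "Refunds are processed within 5 business days of a cancellation request.",
    "The basic plan is limited to three user seats and email-only support.",
]

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

doc_vectors = [embed(d) for d in docs]

question = "How long do refunds take?"
q_vec = embed(question)
best = docs[int(np.argmax([v @ q_vec for v in doc_vectors]))]  # nearest passage by similarity

prompt = (
    f"Answer using only the context below.\n"
    f"Context: {best}\nQuestion: {question}\nAnswer:"
)
answer = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=50, temperature=0
)
print(answer.choices[0].text.strip())
```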
A few references for further reading
I will attempt to keep these posts updated with new information as I learn more. The next article (2 of 5) in this series will focus on real-world use cases of LLMs.