ChatGPT and Foundation Models for enterprises.. beyond the hype (1 of 5)

[Article 1 in a series of 5 - Tech, Use cases, Training, Big bets, Ethical AI]

Such an exciting time to be alive! 2022 was a pivotal year for AI, where we made giant leaps in Generative AI. We will look back with pride on what we accomplished in such a short span. I have been blown away by the progress we have made in the past year, and am both thrilled and cautious about our path forward. Giving the masses free access to powerful Large Language Models like ChatGPT has created endless posts about all the fun, cool prompts that make ChatGPT do some incredible things - a good mix of jaw-dropping responses, some hilarious, and some very convincing but inaccurate responses. I had the pleasure of meeting Sam Altman, CEO of OpenAI, at a conference early last year. He has an extremely evolved perspective, and I am very optimistic that he will continue to have a meaningful impact on the future of AI and humanity. Lots of respect for an incredible guy!

Over the holidays I got a chance to take a few courses, get hands-on with code and dig a lot deeper into the underlying tech behind foundation models, LLMs etc. The deeper I go, the more I realize how transformative these AI models are. Such a step change from where we were before transformers. Have great appreciation for the collective progress our AI community has made, and yet how much more we still need to do to responsibly leverage this AI at scale.

My kids and I had a little fun asking AI to create our Christmas card this year. I got really excited about the impact AI will have on creative workflows, enhancing brainstorming, co-creation and what-ifs. My kids were very excited to see how they could visualize their ideas right away - What if we add a fireplace? Oh, that would need to be on the left, not the right, else the snowman will melt. I can see the next generation growing up with a very different expectation of the role AI plays in accelerating them.


My team has had the privilege of deploying Large Language Models (LLMs) for some of our clients with impressive results. They still need some more fine-tuning, but show a significant lift over prior techniques. With the recent release of ChatGPT, interest from our client leadership has skyrocketed, and we have been having a lot of use-case brainstorming across all industries. I wanted to broaden the discussion to our larger LinkedIn community. To avoid making it overwhelming, I will split up the discussion into 5 sub-topics, since each warrants a bit of detail - tech, use cases, training, big bets and ethics. As always, these are my personal thoughts, not my employer IBM's. Let's start with a little technical behind-the-scenes to provide context on how we got this far.

We see large-scale applications of Analytical AI, where we expect AI models to analyze data at scale, look for patterns, and make recommendations. Foundation models are the next evolution towards Artificial General Intelligence (AGI): they generate something new rather than surfacing content that already exists. Discriminative algorithms try to classify input data given some set of features, predicting the label to which a certain data example belongs. Generative algorithms do the opposite: instead of predicting a label given some features, they try to predict features given a certain label.


Foundation Models

AI is undergoing a paradigm shift with the rise of models such as LaMDA, BERT, DALL-E, GPT-3, etc., trained on massive amounts of data using self-supervision at scale, that can be adapted to a wide range of downstream tasks. These models are called foundation models to underscore their critically central yet incomplete character. Foundation models are enabled by transfer learning (Thrun 1998) and scale. The idea of transfer learning is to take the “knowledge” learned from one task (e.g., object recognition in images) and apply it to another task (e.g., activity recognition in videos). Within deep learning, pre-training is the dominant approach to transfer learning: a model is trained on a surrogate task and then adapted to the downstream task of interest via fine-tuning. Stanford's paper 'On the Opportunities and Risks of Foundation Models' defined this space and is a must-read.
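
To make the pre-train-then-fine-tune pattern concrete, here is a minimal sketch in PyTorch. The ImageNet-pretrained ResNet-18 backbone and the 10-class downstream task are illustrative assumptions, not from any specific project: the pre-trained representations are frozen and only a small new head is trained for the task of interest.

```python
# Minimal sketch of transfer learning: reuse a pre-trained backbone, fine-tune a new head.
import torch
import torch.nn as nn
from torchvision import models

# 1. Start from a model pre-trained on a broad task (ImageNet object recognition).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# 2. Freeze the pre-trained backbone so its learned representations transfer as-is.
for param in model.parameters():
    param.requires_grad = False

# 3. Replace the final layer with a new head for the downstream task (10 classes here).
model.fc = nn.Linear(model.fc.in_features, 10)

# 4. Fine-tune: only the new head's parameters are updated.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def fine_tune_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch from the downstream dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```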

Transfer learning is what makes foundation models possible, but scale is what makes them powerful. Scale requires three things and only in the past few years have we gotten to this trinity:

  1. Availability of unprecedented compute power
  2. Availability of massive amounts of training data
  3. Development of the Transformer model architecture that leverages the parallelism of the hardware to train much more expressive models than before

Transformer Models

The 2017 Google paper "Attention Is All You Need" introduced Transformers and revolutionized how the AI community builds large-scale deep learning models - mad respect for the authors! The Transformer is a neural network architecture that relies on self-attention mechanisms to process input and generate output. It made it possible to train large deep learning models on problems with long-range dependencies without the need for recurrence or convolution, and it allowed for parallelization of both the training and inference stages of these models, making them much more efficient to run on hardware. Self-attention mechanisms allow the model to weight different parts of the input according to their relevance to the task at hand, e.g., in use cases where the meaning of a word can depend on the context of the entire sentence.

Transformers have proven to be very scalable, which means their performance and accuracy improve as they are made larger and fed more data. But more importantly, transformer models can be trained through unsupervised or self-supervised learning, meaning they require very little human-annotated data, which had been one of the main bottlenecks of existing deep learning methods. The Transformer architecture further inspired the development of other self-attention-based models, such as GPT-3 (Generative Pre-trained Transformer) and LaMDA (Language Model for Dialogue Applications), which are incredible at NLP tasks. A variation of the transformer, the "vision transformer," is also used for visual tasks such as image classification and generating images from text, e.g., DALL-E. I am most impressed by the work of https://stability.ai/ in this space, having open-sourced multimodal models. This Stanford lecture by Ashish Vaswani & Anna Huang is a great talk on Transformers and Self-Attention.
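
For intuition, here is a bare-bones scaled dot-product self-attention layer in PyTorch. It is a minimal sketch of the core mechanism only; a real Transformer adds multiple heads, residual connections, positional encodings and layer normalization. The dimensions are illustrative.

```python
# Scaled dot-product self-attention: every token attends to every other token.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        # Learned projections that turn each token into a query, key, and value.
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        # Pairwise relevance scores between tokens: (batch, seq_len, seq_len).
        scores = q @ k.transpose(-2, -1) / math.sqrt(x.size(-1))
        weights = torch.softmax(scores, dim=-1)  # how much each token attends to each other token
        return weights @ v                       # context-aware representation of each token

# Example: 2 sequences of 5 tokens, each token a 16-dimensional embedding.
attn = SelfAttention(d_model=16)
out = attn(torch.randn(2, 5, 16))  # -> shape (2, 5, 16)
```

Because these matrix multiplications cover all token pairs at once, the whole sequence can be processed in parallel on modern hardware, which is exactly what unlocked the scale discussed above.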

Generative AI

Generative AI creates new content by utilizing existing text, audio files, or images. It detects and learns the underlying pattern of the input and produces similar content. The hype around generative AI is exploding. I personally think we are still underestimating the true impact. Gartner made some interesting predictions in their Emerging Technologies and Trends Impact Radar for 2022 report.

  • By 2025, generative AI will account for 10 percent of all data produced (up from less than 1 percent today) and 20 percent of all test data for consumer-facing use cases. <I think this is understated>
  • By 2025, generative AI will be used by 50 percent of drug discovery and development initiatives. <This one may be a stretch>
  • By 2027, 30 percent of manufacturers will use generative AI to enhance their product development effectiveness. <Given the economic reality of 2023, cost cutting will slow this down>

Sequoia Capital has put together a simple Generative AI application landscape across various modalities.


Here are some popular techniques we use for generative AI:

  1. Transformers, which we discussed above, imitate cognitive attention and differentially measure the significance of different parts of the input.
  2. Generative Pre-trained Transformer (GPT) models: GPT-n models are transformer-based models that have been pre-trained using unsupervised learning on large amounts of data. They are then fine-tuned with human feedback to generate human-like text, conditioning the model with specific inputs, like tasks to generate complete stories or answer questions.
  3. Generative Adversarial Networks (GANs): GANs were introduced by Ian Goodfellow and his colleagues at the University of Montreal in this paper in 2014. GANs consist of two neural networks: a generator and a discriminator. The generator produces new samples, while the discriminator attempts to distinguish the generated samples from real samples. The two are trained together, with the generator trying to produce samples that can fool the discriminator, and the discriminator trying to correctly identify whether a sample is real or generated (a minimal training-loop sketch follows this list).
  4. Variational Autoencoders (VAEs): VAEs are a type of generative model that can learn to represent high-dimensional data in a lower-dimensional space, called the latent space. They consist of two neural networks: an encoder and a decoder. The encoder maps the input data to a latent representation, and the decoder maps the latent representation back to the original data. The VAE is trained to reconstruct the input data from the latent representation, while also encouraging the latent representation to be smooth and continuous, allowing for the generation of new samples.
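
As referenced in item 3, here is a minimal GAN training loop in PyTorch. The tiny MLPs and the 1-D "real data" distribution are placeholders purely for illustration; real GANs for images use convolutional generators and discriminators.

```python
# Minimal GAN training loop: generator vs. discriminator on a toy 1-D distribution.
import torch
import torch.nn as nn

latent_dim = 8
generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, 1))
discriminator = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, 1) * 0.5 + 2.0          # pretend "real" samples ~ N(2, 0.5)
    fake = generator(torch.randn(64, latent_dim))  # generator's attempt at realistic samples

    # Discriminator update: label real samples as 1, generated samples as 0.
    d_loss = (bce(discriminator(real), torch.ones(64, 1)) +
              bce(discriminator(fake.detach()), torch.zeros(64, 1)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Generator update: try to fool the discriminator into outputting 1 on generated samples.
    g_loss = bce(discriminator(fake), torch.ones(64, 1))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
```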

OpenAI's ChatGPT

Let's drill deeper into OpenAI's ChatGPT and all the buzz around it.


We will focus on OpenAI's Language Models (GPT-3 and ChatGPT) here, and will discuss other OpenAI models (DALL-E 2, Codex and Whisper) in detail in a different article.

GPT-3 is a transformer-based language model trained on a large corpus of diverse internet text, allowing it to have a wide range of knowledge and generate text that is coherent and diverse. It was introduced by OpenAI in 2020 in the paper Language Models are Few-Shot Learners. ChatGPT, released in November 2022, is a variant of GPT-3 that is fine-tuned for conversational language generation. It is trained on conversational data, allowing it to generate coherent and human-like responses in a chat or dialogue setting. GPT-3 and ChatGPT differ in a few ways:

  • Model size: ChatGPT is a smaller version of GPT-3 with fewer parameters, which means it requires less computational power to run and is more lightweight.
  • Training data: GPT-3 is trained on a much larger dataset than ChatGPT, which includes a diverse range of internet texts and other forms of data, resulting in a more robust model with greater generalization capabilities.
  • Fine-tuning ability: ChatGPT is generally less flexible when it comes to fine-tuning, as it is designed to perform well in specific conversational tasks. GPT-3, on the other hand, can be fine-tuned to perform well on a wide range of natural language processing tasks, such as language translation, summarization, and question answering.
  • Language generation: ChatGPT is optimized for conversational language generation, meaning it can generate more coherent, human-like responses in a chat or dialogue setting. GPT-3 can also generate coherent and human-like text, but it is also capable of generating text that is not necessarily conversational.
  • API access: Both ChatGPT and GPT-3 can be accessed through the OpenAI API, but GPT-3 also has additional access options such as the GPT-3 Sandbox and GPT-3 Playground, which allow for more flexibility in terms of usage and experimentation. A minimal API call sketch follows this list.
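
As referenced above, here is roughly what a GPT-3 call looks like, assuming the openai Python package as it existed at the time of writing (v0.x) and an API key in the OPENAI_API_KEY environment variable; the model name and prompt are illustrative.

```python
# Sketch of a GPT-3 completion request via the OpenAI API (openai package, v0.x era).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Completion.create(
    model="text-davinci-003",  # pick the model that fits your task and price point
    prompt="Summarize the key benefits of transfer learning in two sentences.",
    max_tokens=100,
    temperature=0.2,           # lower temperature = more deterministic output
)
print(response["choices"][0]["text"].strip())
```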

The State of AI Report 2022 does a phenomenal job of distilling how GPT-3 has influenced the growth of LLMs and other modalities. It's an annual must-read.


If you want to drill even deeper into the full catalog of Large Language Models, this post does a great job cataloging each one. It's a looooong list.


Generally, I have had good luck playing with the GPT-3 Playground to figure out how I want to use a model before we start fine-tuning it. By providing some sample outputs in the prompt itself, you can easily try zero-shot, one-shot and few-shot prompting to influence the output. Since there is a limit to how many tokens you can provide in the prompt, you really do need to spend some time creating a fine-tuned model and query that instead.


Providing even one or two examples (shots) in the prompt gives a huge boost in accuracy with GPT-3 models. Here is a good article explaining the nuances, and below is a quick illustration of what zero-shot, one-shot and few-shot prompts look like.
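
As a quick illustration (the sentiment task and reviews below are made up for the example), the three prompt styles differ only in how many worked examples are included; everything lives in the prompt and no model weights are updated.

```python
# Illustrative zero-, one-, and few-shot prompts for a made-up sentiment task.
zero_shot = ("Classify the sentiment of this review as Positive or Negative.\n"
             "Review: The battery barely lasts an hour.\nSentiment:")

one_shot = ("Classify the sentiment of each review as Positive or Negative.\n"
            "Review: Absolutely love this phone!\nSentiment: Positive\n"
            "Review: The battery barely lasts an hour.\nSentiment:")

few_shot = ("Classify the sentiment of each review as Positive or Negative.\n"
            "Review: Absolutely love this phone!\nSentiment: Positive\n"
            "Review: Screen cracked within a week.\nSentiment: Negative\n"
            "Review: Camera quality exceeded my expectations.\nSentiment: Positive\n"
            "Review: The battery barely lasts an hour.\nSentiment:")
```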


The OpenAI ChatGPT blog post does a great job explaining the basics of the training methodology behind ChatGPT, but the InstructGPT paper gets into a lot more detail on the training setup and results. OpenAI took the GPT-3.5 model and fine-tuned it on human-generated conversations using supervised learning and reinforcement learning:

  • Step 1: Adapting the GPT-3.5 model to conversations using supervised learning
  • Step 2: Training a reward model on human rankings of model responses (a sketch of the pairwise ranking loss follows this list)
  • Step 3: Training the model using reinforcement learning


Today, OpenAI provides a good range of language models that can suit your enterprise's specific task and price point; the latest lineup is documented on their site.


These models are a phenomenal starting point, but there is a lot of fine-tuning needed for enterprise use cases in specific domains. There is significant work required for ChatGPT to truly 'understand' the complexity of a domain. Enterprises need to think through the cost of execution, the skills required, and how they monitor and maintain these models. I will get into more on domain-specific training in Article 3. Here is an easy read on training a Q&A bot on a specific domain, and a common pattern for grounding these models in domain content is sketched below.
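
As a rough sketch of that grounding pattern, the snippet below embeds a handful of domain documents, retrieves the most relevant one for a question, and asks the model to answer from that context. It assumes the openai Python package (v0.x) with the text-embedding-ada-002 and text-davinci-003 models; the document snippets are placeholders for your enterprise's own content.

```python
# Sketch of grounding GPT-3 in domain documents via embeddings + retrieval.
import os
import numpy as np
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

DOCS = [
    "Policy A covers water damage but excludes flooding.",
    "Claims must be filed within 30 days of the incident.",
]

def embed(text: str) -> np.ndarray:
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

doc_vectors = [embed(d) for d in DOCS]

def answer(question: str) -> str:
    q_vec = embed(question)
    # Retrieve the most relevant snippet by cosine similarity.
    sims = [q_vec @ v / (np.linalg.norm(q_vec) * np.linalg.norm(v)) for v in doc_vectors]
    context = DOCS[int(np.argmax(sims))]
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=100)
    return resp["choices"][0]["text"].strip()

print(answer("How long do I have to file a claim?"))
```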

A few references for further reading

I will attempt to keep these posts updated with new information as I learn more. Article 2 in this series will focus on real-world use cases of LLMs.
