Exploration of GPT-3, ChatGPT and the Large Language Models landscape
Before Nov 30th 2022, ‘chatbot’ used to be a bad word. Not anymore! OpenAI’s ChatGPT beta has done more than anything else to raise awareness of, and confidence in, NLP chat solutions.
GPT-3 has been a dominant language model since 2020, but ChatGPT, like all new technologies, is currently going through the hype cycle. ChatGPT is a great articulator and comes across as remarkably human-like. Here it is describing quantum computing in the form of a short poem.
But before we get into the specifics, let’s take a step back and understand what is under the bonnet …
GPT stands for Generative Pre-trained Transformer.
GPT-3 and ChatGPT are strong generative language models, which is to say they can generate new text based on the patterns they learned from their training data. They work by predicting the most likely next word (strictly, the next token) given the text so far, and they do this with high confidence. This is why they come across as so articulate and human-like in their sentence framing and responses.
‘Pre-trained’ indicates that the model has already been trained on a large corpus of data (and can be fine-tuned further on additional data). The larger and more diverse the data, the better. As we understand it, ChatGPT has been trained on a smaller corpus, but GPT-3 was trained on 570GB of text data. This is why we are able to have an open discussion with the models about virtually any topic under the sun.
Transformers are a neural network architecture first introduced in 2017 in a paper from Google (“Attention Is All You Need”). The original Transformer has an encoder-decoder design, while GPT models use only the decoder part. Its key ingredient is the self-attention mechanism, which allows the model to weigh the importance of different parts of the input. This is why the models are able to understand context well and respond in a natural manner.
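To make the next-word idea concrete, here is a deliberately tiny sketch in Python. It is a toy under loud assumptions, not how GPT works internally: GPT-3 uses a large neural network over sub-word tokens, whereas this sketch simply counts which word follows which in a made-up corpus (a bigram model) and greedily picks the most probable next word. The function names and the corpus are illustrative only.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count how often each word is followed by each other word."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word_distribution(model: dict, word: str) -> dict:
    """Turn the follow-up counts for `word` into probabilities."""
    counts = model.get(word.lower(), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def generate(model: dict, start: str, length: int) -> str:
    """Greedy generation: repeatedly append the single most probable next word."""
    out = [start]
    for _ in range(length):
        dist = next_word_distribution(model, out[-1])
        if not dist:
            break  # no known continuation for this word
        out.append(max(dist, key=dist.get))
    return " ".join(out)

corpus = "the cat sat on the mat . the cat ate the fish . the dog sat on the rug ."
model = train_bigram(corpus)
print(next_word_distribution(model, "the"))
print(generate(model, "the", 5))
```

Even this toy produces fluent-looking fragments, because each step picks a plausible continuation, yet it has no notion of whether the result makes sense. Real models condition on far more than the previous word and often sample from the distribution instead of always taking the top choice.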
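The self-attention idea can be sketched in a few lines of plain Python. This is a minimal, single-head illustration under simplifying assumptions: the projection matrices are hand-picked identity matrices rather than learned weights, there is no multi-head splitting, positional encoding or feed-forward layer, and the optional causal mask reflects the decoder-only setup GPT models use. All names and values are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf scores get zero weight."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head scaled dot-product self-attention.

    x is a list of token vectors; w_q, w_k, w_v project them into
    queries, keys and values. With causal=True each token may only
    attend to itself and earlier tokens, as in GPT-style decoders.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    outputs, weights = [], []
    for i, qi in enumerate(q):
        scores = [
            float("-inf") if causal and j > i
            else sum(a * b for a, b in zip(qi, kj)) / scale
            for j, kj in enumerate(k)
        ]
        attn = softmax(scores)  # importance of every token for token i
        weights.append(attn)
        outputs.append([sum(w * vj[c] for w, vj in zip(attn, v)) for c in range(len(v[0]))])
    return outputs, weights

# Three 2-dimensional "token embeddings" and identity projections (illustrative values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, attn = self_attention(x, identity, identity, identity)
for row in attn:
    print([round(w, 3) for w in row])
```

Each row of weights sums to 1 and says how much each earlier token contributes to the current token's new representation; this weighting is what lets the model pick up context.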
2022 — the year of Large Language Models …
While ChatGPT has certainly grabbed the most headlines, 2022 was actually the year of Large Language Models (LLMs). The year kicked off on the back of DeepMind's 280Bn parameter model, Gopher, announced in December 2021. For context, GPT-3 has 175Bn parameters (by the way, there are far larger models as well; for instance, the Megatron-Turing NLG model has 530Bn parameters). In the course of the year we saw BlenderBot 3 from Meta (which sidesteps one drawback of GPT-3 that I will mention later). There was, of course, LaMDA from Google in May, which had the popular media divided on whether it had become sentient. In September, DeepMind announced Sparrow, which detailed the use of RLHF (Reinforcement Learning from Human Feedback) to make the models safer and counter accuracy issues; interestingly, this is what OpenAI has also done with ChatGPT. In November, AlexaTM was released by Amazon, and Meta released Galactica for scientific knowledge and Cicero for negotiation. And then, on the 30th of November, OpenAI announced the ChatGPT beta. It caused a meteor-sized splash and set off the tsunami that we are still surfing. (I am sure there must have been a few other models as well that I have inadvertently missed.)
GPT-3 — now you are impressed… now you are not
GPT-3 has been available since mid-2020 and is the result of a progression from GPT-1 (5GB of data and 117Mn parameters) and GPT-2 (40GB of data and 1.5Bn parameters). GPT-3, of course, was a significant leap from the previous version, both in the size of the corpus it was trained on and in model size. GPT-3 can also be fine-tuned on a smaller dataset than GPT-2.
GPT-3 has achieved state-of-the-art performance on multiple NLP tasks. Here is an example where it makes an inference and solves a logic puzzle:
Impressive, isn’t it? One can be forgiven for thinking that it is smart and starting to rely on it for answers. That would be a mistake, though!
In their drive to generate a response, and with their odd need to please the user (believe me!), GPT-3 (and ChatGPT) have a tendency to confidently present incorrect answers.
And at times they ‘hallucinate’, or make stuff up.
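The size of each leap can be checked with a little arithmetic using only the figures quoted above (training data in GB, parameter counts). The sketch below is purely illustrative; the `MODELS` table simply restates those numbers and the `growth` helper is a hypothetical name.

```python
# Figures as quoted in this post: (name, training data in GB, parameters).
MODELS = [
    ("GPT-1", 5, 117e6),
    ("GPT-2", 40, 1.5e9),
    ("GPT-3", 570, 175e9),
]

def growth(models):
    """Return (step, data growth factor, parameter growth factor) per generation."""
    steps = []
    for (prev_name, prev_gb, prev_params), (name, gb, params) in zip(models, models[1:]):
        steps.append((f"{prev_name} -> {name}", gb / prev_gb, params / prev_params))
    return steps

for step, data_x, params_x in growth(MODELS):
    print(f"{step}: {data_x:.1f}x data, {params_x:.1f}x parameters")
```

GPT-2 to GPT-3 works out to roughly 14x more data but about 117x more parameters, compared with 8x more data and roughly 13x more parameters for the step from GPT-1 to GPT-2.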
Another known limitation of both GPT-3 and ChatGPT (and, to be fair, of most other LLMs) is that they are only as good as the data they are trained on. Training data also has a cut-off date: for instance, GPT-3's training data runs up to 2021, and the same is true for ChatGPT. This is why at times they present an incorrect response (or, to be fair, at times admit that they were not trained on the information). Here is a response from GPT-3:
ChatGPT — why is it … so good
ChatGPT, on the other hand, handles it better:
This is because ChatGPT was fine-tuned using human feedback and the PPO (Proximal Policy Optimization) algorithm. Human labellers rank candidate answers, those rankings are used to train a reward model, and PPO then adjusts the language model to produce answers that the reward model scores highly. This is done to increase the probability of the answers being correct, safe and appropriate for the context. PPO works by approximating what the optimal policy would be (here, the probability distribution over the possible next word) using a clipped surrogate objective, which it then optimizes by gradient ascent while keeping each update close to the previous policy.
This is why OpenAI opened the model up as a beta: using the thumbs up and thumbs down next to each response, we can provide feedback that helps reduce the model’s tendency to hallucinate or give unsafe responses. And it gets it right most of the time.
While GPT-3 endpoints are available for programmatic use and for fine-tuning, the only non-hacky ways I am aware of to use ChatGPT are on the site itself or through the Azure OpenAI Service (which is accepting applications as of now).
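The clipped surrogate objective at the heart of PPO is short enough to write out. The sketch below is a minimal, framework-free illustration of that objective for a batch of sampled tokens, assuming you already have each token's log-probability under the new and old policy plus an advantage estimate; in the RLHF setting those advantages would ultimately derive from the reward model's scores. It omits everything else a real training loop needs (value function, KL penalty against the original model, gradient computation), and the function name and the default clip range `eps=0.2` are illustrative choices, not OpenAI's settings.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective L^CLIP from PPO (higher is better).

    For each sampled token:
      ratio   = pi_new(token) / pi_old(token)
      L_token = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A)
    Taking the minimum removes the incentive to push the ratio outside
    [1 - eps, 1 + eps], which keeps each policy update small.
    """
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1 - eps), 1 + eps)
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Token 1: a good token (A = +1) became 1.5x more likely; the gain is capped at 1.2.
# Token 2: a bad token (A = -1) became 2x less likely; it is scored -0.8, not -0.5.
logp_old = [math.log(0.2), math.log(0.4)]
logp_new = [math.log(0.3), math.log(0.2)]
print(ppo_clip_objective(logp_new, logp_old, advantages=[1.0, -1.0]))
```

In a real setup this value would be maximized with respect to the model's parameters; the clipping is what stops a single batch of feedback from changing the model too drastically.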
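For the GPT-3 side, a programmatic call is an ordinary HTTPS request. The sketch below only builds the request with Python's standard library; it assumes the Completions endpoint (`/v1/completions`) and the `text-davinci-003` model that were current when this post was written, both of which may since have been deprecated or replaced, so check OpenAI's current API reference before relying on them. `YOUR_API_KEY` is a placeholder and the helper names are hypothetical. The second helper shows one line of the prompt/completion JSONL format that GPT-3 fine-tuning used at the time.

```python
import json
import urllib.request

# Assumption: the (now legacy) GPT-3 Completions endpoint as of early 2023.
API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(api_key, prompt, model="text-davinci-003",
                             max_tokens=64, temperature=0.7):
    """Build (but do not send) a POST request asking GPT-3 to complete `prompt`."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,    # upper bound on generated tokens
        "temperature": temperature,  # higher = more varied sampling
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
        method="POST",
    )

def fine_tune_record(prompt, completion):
    """One JSONL line in the legacy prompt/completion fine-tuning format."""
    return json.dumps({"prompt": prompt, "completion": completion})

req = build_completion_request("YOUR_API_KEY",
                               "write a 4 line poem about the future of LLMs")
print(req.get_method(), req.full_url)
print(fine_tune_record("Q: What does GPT stand for?\nA:",
                       " Generative Pre-trained Transformer"))

# Sending it needs a valid key and network access, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```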
Other worthy contenders…
The future … and another poem!
But despite all the stated drawbacks, the popularity of GPT-3, and even more so of ChatGPT, continues to grow. They are being used to generate code, generate training data for models, write books and essays, write user manuals and much more. There is also a popular line of thought that ChatGPT will be the Google search killer, but that is a topic for a separate post (so much to unpack there).
We will also see prompt engineering evolve further and mature as a skill set — overlapping with the world of NFTs perhaps at some point (don’t forget to mail me my royalty cheque).
In the meantime, with murmurs about Google doing a beta release of DeepMind's Sparrow soon, and GPT-4 expected sometime in Q2-Q3, it is safe to say that 2023 will see a similar, if not faster, pace of development and interest in LLMs. So watch this space!
I hope you found the post useful. Just like the models, I too would benefit immensely from your feedback, so it would be great to receive your comments, counter-thoughts and suggestions.
And finally, I leave you with this short poem by ChatGPT about the future of LLMs (prompt: “write a 4 line poem about the future of LLMs”):
The future is bright for language models,
Their power and intelligence, never dulls,
With them, we’ll solve problems, great and small,
And build a better world for one and all.