ChatGPT internals, and its implications for Enterprise AI
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
1. Background
ChatGPT has taken over the internet over the last few days. Most people have been amazed by its responses. If you still haven’t tried ChatGPT, it is available with a free registration on OpenAI’s website: https://chat.openai.com/
If you don’t come from a Natural Language Processing (NLP) background, ChatGPT’s responses may seem amazing. So let’s delve a bit deeper into the technical details.
Transformer architectures
NLP has been disrupted in the last few years by two massive Deep Learning model families based on the transformer architecture: Google’s BERT (Bidirectional Encoder Representations from Transformers) and OpenAI’s GPT (Generative Pre-Training).
The major advantage of GPT models is the sheer scale at which they are trained. For example, GPT-3, the third-generation GPT model, has 175 billion parameters and was trained on hundreds of billions of tokens of text. GPT-3 thus acts as a pre-trained Large Language Model (LLM) that can be fine-tuned with very little data to accomplish novel NLP tasks, such as question answering, translation, and summarization.
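As a concrete illustration of how little task-specific data is needed, the sketch below prompts a GPT-3 style completion model with just a handful of labelled examples (few-shot prompting) to perform sentiment classification. The model name and the legacy openai Python SDK call are assumptions for illustration, not a description of ChatGPT's internals.

```python
import openai  # legacy (pre-1.0) OpenAI Python SDK, assumed for illustration

openai.api_key = "YOUR_API_KEY"  # placeholder

# A handful of labelled examples is often enough to "teach" the task in-context.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup was painless and it just works.
Sentiment:"""

response = openai.Completion.create(
    model="text-davinci-003",   # assumed GPT-3 family completion model
    prompt=few_shot_prompt,
    max_tokens=3,
    temperature=0,              # deterministic output for a classification task
)

print(response["choices"][0]["text"].strip())  # expected: "Positive"
```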
ChatGPT is thus the Chatbot [1] application of the GPT-3 LLM. It is based on InstructGPT, released by OpenAI in January 2022.
The main improvement is that ChatGPT is able to carry out a continuous conversation: it remembers the previous turns of the conversation and can refer back to them when responding to the user. This is still very limited compared to related research, where the historical conversation is used by a Chatbot to drive the conversation towards a goal, e.g., making a sale [2, 3].
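Under the hood, "remembering" previous turns essentially amounts to feeding the conversation history back to the model with every new user message. A minimal, model-agnostic sketch follows; the function names and the `generate` callback are hypothetical placeholders, not ChatGPT's actual implementation.

```python
# Minimal sketch of multi-turn context handling: the full dialogue history is
# re-sent to the language model on every turn. `generate()` is a hypothetical
# stand-in for whatever LLM completion call is actually used.

history = []  # list of (speaker, utterance) tuples

def chat_turn(user_message, generate):
    history.append(("User", user_message))
    # Flatten the whole conversation into a single prompt.
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    prompt += "\nAssistant:"
    reply = generate(prompt)          # call the underlying LLM
    history.append(("Assistant", reply))
    return reply
```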
So ChatGPT builds on a rich history of NLP research; it is not a revolutionary new technique that appeared out of nowhere one fine day.
Capabilities and Limitations
With this technical background, let us try to analyze ChatGPT’s capabilities and limitations.
Generative AI
Any Chatbot [1], at a very high level, consists of the following steps: understanding the user’s query (intent and entity recognition), retrieving or computing the relevant information, and generating a response in natural language.
The reason ChatGPT went viral is because of its generative capabilities.
Similar to the hype around Stable Diffusion’s text-to-image deep learning model, which people were (or still are) using to generate new art ‘influenced’ by the styles of famous artists, most people are still using ChatGPT to generate responses in the style of their favourite theatrical characters.
The generative aspect, be it for text, images, or videos, is often explained in terms of Generative Adversarial Networks (GANs) [4]; note that ChatGPT itself generates text autoregressively with a transformer decoder, but the GAN intuition is a useful starting point. Intuitively, a GAN can be considered as a game between two networks: a Generator network and a second Classifier (Discriminator) network. The Classifier can, e.g., be a Convolutional Neural Network (CNN) based image classification network, distinguishing samples as either coming from the actual data distribution or from the Generator. Every time the Classifier is able to spot a fake, i.e. it notices a difference between the two distributions, the Generator adjusts its parameters accordingly. At the end (in theory), the Classifier will be unable to distinguish the two, implying the Generator is then able to reproduce the original data distribution.
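To make the Generator-vs-Classifier game concrete, here is a heavily simplified PyTorch training-loop sketch for a GAN on 1-D toy data. It illustrates the adversarial setup only; the architecture and data are toy assumptions and not how ChatGPT or Stable Diffusion are actually built.

```python
import torch
import torch.nn as nn

# Toy GAN: the Generator learns to mimic samples from N(4, 1.5),
# the Classifier/Discriminator learns to tell real samples from generated ones.
real_data = lambda n: 4 + 1.5 * torch.randn(n, 1)

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))                # Generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())  # Discriminator

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # --- Train the Discriminator: real -> 1, fake -> 0 ---
    real = real_data(64)
    fake = G(torch.randn(64, 8)).detach()
    loss_D = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # --- Train the Generator: try to make D output 1 on fake samples ---
    fake = G(torch.randn(64, 8))
    loss_G = bce(D(fake), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()

# At convergence (in theory), D cannot distinguish real from generated samples.
```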
Enterprise use-cases of Natural Language Generation (NLG):
As previously mentioned, the application of ChatGPT as a pure-play content generation tool for Marketing, and in general for any internal or external Org Communication, is clear and disruptive. For example, it can come up with a proposed list of planning activities that is most likely better than what 80% of human marketers could generate.
However, other than that, the only place where we have seen some application of NLG in the enterprise is to generate text summaries of figures and reports.
ChatGPT, in the future, could interpret company data (e.g., sales figures, marketing responses) and generate short narrative text for target audiences, enabling quick and easy-to-read reports.
The main challenge of Image-to-Text models today is not the capability to generate textual descriptions of figures, but that the descriptions need to be highly personalized to highlight the specific insights a business user is looking for, taking into account the org / business unit context.
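One way to approach the personalized narrative described above is to put the business context and the figures directly into the prompt. The sketch below is a hedged illustration of that idea; the model name, the legacy SDK call, and the prompt wording are assumptions, not the author's implementation.

```python
import openai  # legacy (pre-1.0) OpenAI Python SDK, assumed for illustration

sales = {"Q1": 1.2, "Q2": 1.1, "Q3": 0.9, "Q4": 0.8}  # revenue in $M (dummy data)

# Business context + raw figures go into the prompt to "personalize" the narrative.
prompt = (
    "You are writing for the EMEA retail business unit leadership team.\n"
    "Quarterly revenue ($M): " + ", ".join(f"{q}={v}" for q, v in sales.items()) + "\n"
    "Write a 3-sentence narrative highlighting the trend and one recommended action."
)

response = openai.Completion.create(
    model="text-davinci-003",  # assumed model
    prompt=prompt,
    max_tokens=120,
    temperature=0.3,
)
print(response["choices"][0]["text"].strip())
```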
A related use-case here is to query databases in natural language.
With text-to-SQL, the main goal has been to replace the SQL-based querying paradigm for databases with one based on Natural Language Queries (NLQs). The field is called Natural Language Interface to Databases (NLIDB), and I have already covered it in detail in a previous article [5], including the impact of LLMs.
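A minimal NLIDB-style sketch: the table schema plus the natural language question are placed in the prompt, and the model is asked to emit SQL. The schema, model name, and SDK call are illustrative assumptions; a production system would also need to validate the generated SQL before executing it.

```python
import openai  # legacy (pre-1.0) OpenAI Python SDK, assumed for illustration

schema = "Table sales(region TEXT, product TEXT, quarter TEXT, revenue REAL)"
question = "Which region had the highest total revenue in Q3?"

# Prompt primes the model with the schema and forces it to continue a SQL query.
prompt = (
    f"{schema}\n"
    f"-- Translate the question into a single SQL query.\n"
    f"-- Question: {question}\n"
    f"SELECT"
)

response = openai.Completion.create(
    model="text-davinci-003",   # assumed model
    prompt=prompt,
    max_tokens=80,
    temperature=0,
    stop=[";"],                 # stop generation at the end of the query
)
sql = "SELECT" + response["choices"][0]["text"] + ";"
print(sql)  # e.g. SELECT region FROM sales WHERE quarter='Q3' GROUP BY region ORDER BY SUM(revenue) DESC LIMIT 1;
```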
User understanding & friendliness
The user friendliness of ChatGPT is another important reason underlying its popularity.
Enterprise Chatbots today are very task / domain focused. Experienced Chatbot creators will know how we are always afraid of the ‘unknown’ — the generic / unexpected questions that the user might end up asking. So we spend significant effort in designing the flow / guiding the user to ask relevant questions.
To the average user, this works as a deterrent to chatbot adoption: they have to think twice (or many times) to ensure that they are asking the right questions, i.e., spend time and effort framing their questions properly.
ChatGPT upended the game here by allowing the user to ask anything - significantly lowering the barrier to entry. This was complemented by ChatGPT responding with appropriate safety guards in place.
OpenAI describes this capability as follows: ChatGPT can “answer follow-up questions, admit its mistakes, challenge incorrect premises and reject inappropriate requests.” As difficult as it is to understand the user intent, it is equally challenging to detect when the user is trying to manipulate / attack the chatbot, especially for external (consumer-facing) bots. And ChatGPT performs this ‘classification task’ very well.
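In practice, an enterprise chatbot can implement this kind of guard as a separate classification step before answering. Below is a sketch using OpenAI's Moderation endpoint via the legacy Python SDK; the surrounding `answer`/`generate` logic is a hypothetical illustration, not how ChatGPT is actually wired internally.

```python
import openai  # legacy (pre-1.0) OpenAI Python SDK, assumed for illustration

def is_safe(user_message: str) -> bool:
    """Return False if the message is flagged as inappropriate."""
    result = openai.Moderation.create(input=user_message)
    return not result["results"][0]["flagged"]

def answer(user_message: str, generate) -> str:
    # Guard step: reject inappropriate requests before generating a response.
    if not is_safe(user_message):
        return "Sorry, I can't help with that request."
    return generate(user_message)   # hypothetical call to the underlying LLM
```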
Let me also take this opportunity to say that ChatGPT is not going to make the enterprise chatbots that we have built on AWS Lex, Microsoft LUIS, Google Dialogflow redundant anytime soon. They have been painstakingly designed (manually curated) for narrow / domain focused use-cases and it will be difficult to fine-tune a generic LLM to achieve that level of accuracy for those specific set of questions — without manual intervention.
Enterprise Search
Focusing on the information extraction aspect, a lot of people have been using ChatGPT to search for information. Here too, ChatGPT excels: it is able not only to retrieve textual responses from documents / web pages, but also to answer complex queries related to programming, maths equations, etc.
This has led to a lot of discussion on whether ChatGPT will disrupt Search and replace Google. Will Bing Search finally overtake Google Search, given Microsoft’s close partnership with (and investment in) OpenAI?
The answer is: No! This is primarily because it is not an apples-to-apples (or even oranges-to-oranges) comparison.
Search is about retrieving existing information, with links to the source. So the most amazing part of ChatGPT, i.e., the generative part, is not really relevant here. And when it comes to the retrieval part, most search engines, e.g., Google, already leverage BERT [6] or similar Transformer-based architectures.
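For the retrieval part, a common pattern is dense (embedding-based) semantic search with a BERT-style encoder. The sketch below uses the sentence-transformers library; the model choice and toy corpus are illustrative assumptions, not a description of any particular search engine.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# BERT-style bi-encoder; the model name is an illustrative assumption.
encoder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Quarterly sales figures are published on the finance portal.",
    "The VPN client must be updated before the end of the month.",
]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

query = "How long do I have to return an item?"
q_vec = encoder.encode([query], normalize_embeddings=True)[0]

scores = np.dot(doc_vecs, q_vec)   # cosine similarity (vectors are normalized)
best = int(np.argmax(scores))
print(docs[best])                  # -> the refund policy document
```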
I guess what frustrates people most is the ‘advertisement’ results that Google pushes to the top, rather than the most relevant results. In short, nothing will change dramatically, other than the size of the underlying LLMs.
The generative part (even if somewhat unpredictable) can still be relevant from a search perspective, as long as ChatGPT can link its responses to the source articles. Unfortunately, this is easier said than done, and is part of the larger problem of Ethical AI / Explainable AI [7] (related to ‘textual entailment’ in the NLP context), where it is unclear which part/item of the underlying training dataset contributed to a given response.
Finally, there is the ‘freshness’ aspect of Search. A large portion of our searches relate to new/recent events and their corresponding articles. This is applicable both to generic search engines like Google and to Enterprise Search [8] scenarios. So the underlying LLMs will need to be continuously re-trained with new information to return up-to-date responses. Here, it is important to understand that while LLMs are often thought of as self-learning models, a significant amount of manual effort, in the form of human feedback, has gone into improving ChatGPT’s accuracy, leveraging Reinforcement Learning [9].
Continuous (but Manual) Improvement?
Reinforcement Learning (RL) is able to achieve complex goals by maximizing a reward function in real-time. The reward function works similarly to incentivizing a child with candy and spankings: the algorithm is penalized when it takes a wrong decision and rewarded when it takes a right one; this is the reinforcement. The reinforcement aspect also allows it to adapt faster to real-time changes in user sentiment.
At the core of this approach [10] is a reward (score) model, which is trained to score chatbot query-response tuples based on (manual) human feedback. The scores predicted by this model are used as rewards for the RL agent. Proximal Policy Optimization (PPO) is then used as a final step to further tune ChatGPT.
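Conceptually, the reward model is trained on human preference comparisons: given two responses to the same prompt, it should score the human-preferred one higher. Below is a simplified PyTorch sketch of that pairwise ranking loss; the tiny encoder and random embeddings are toy stand-ins (the real reward model is itself a large transformer), and the subsequent PPO step is deliberately omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a (prompt, response) embedding to a scalar score."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-4)

# Dummy batch: embeddings of the human-preferred ("chosen") and the
# dispreferred ("rejected") response for the same prompt.
chosen, rejected = torch.randn(32, 128), torch.randn(32, 128)

# Pairwise ranking loss: push score(chosen) above score(rejected).
loss = -F.logsigmoid(rm(chosen) - rm(rejected)).mean()
opt.zero_grad(); loss.backward(); opt.step()

# The trained reward model's scores are then used as the reward signal
# for fine-tuning the chatbot policy with PPO (not shown here).
```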
In short, re-training or adding new information to LLMs is not fully automated. RL helps in performing this in a targeted fashion; however, manual intervention is still needed to achieve this level of accuracy and to protect the models against bias / manipulation.
Conclusion
ChatGPT is an important milestone in the AI evolution journey. Its generative capabilities are amazing: it is able to generate ‘human like’ text in a very convincing fashion, so its capability to generate content for Marketing and Communication scenarios is disruptive. However, this is also the scariest part, and an absolute nightmare from an Intellectual Property (IP) point-of-view for artists and content creators alike. So we can only hope that regulators will act promptly, and that tools / regulations will evolve at the same speed as the core technology to limit its abuse.
It is also important to understand that ChatGPT is an incremental step in NLP research. Its power comes from combining a number of fundamental techniques, e.g., Transformers, GANs, Reinforcement Learning. So it will not lead to any sudden NLP disruption in the enterprise. Benefits will follow in the form of improvements to fundamental NLP tasks, e.g., Q&A, translation, summarization, that will slowly propagate to enterprise use-cases.
Most enterprises today do not have the skills and infrastructure capabilities to leverage the core technology blocks of ChatGPT themselves. So further work will be needed in the form of mature tools, APIs, etc. that ease the process of fine-tuning pre-trained LLMs on enterprise-specific data and domains.
References