Mastering ChatGPT & Large Language Models: Tips and Tricks for Using, Understanding and Engineering Your preferred conversational agent

Hi folks! I guess you’ve all heard about ChatGPT and generative models in the past few weeks.

But first, let me reassure you: this post does not make hyperbolic predictions or statements about generative AI, and it was not written with the help of ChatGPT to showcase its capabilities.

Rather, it is a pragmatic one, sharing some insights and resources on how to best use, understand and eventually customize so-called large language models (LLMs) and next-generation chatbots.

Indeed, I believe this technology will progressively spread through our digital landscape and will become important to master, just like spreadsheets, slide decks, social networks, etc. In this context, it is crucial to make sure that everybody gets basic training to use it correctly. Then, we also need to make sure that in every organization there are enough practitioners who understand the theory behind it. And finally, we will also need at least a small number of LLM experts who know how to customize or fine-tune LLMs for specific use cases.

While the best way to start learning ChatGPT is simply… using it, I also believe that to get better at it, you need a slightly deeper understanding of a few basic concepts, and a few resources about them: prompting, zero-shot learning, one-shot learning, few-shot learning and chain-of-thought:

  • Prompting is simply the process of providing an input or a phrase to the model, which is then used to generate a response or a continuation of the input. In other words, it’s providing the model with the relevant context or direction, so that it can generate a consistent, coherent, and relevant output. It can be a few words or a longer sentence. While the maximum prompt size for ChatGPT is not publicly documented, OpenAI documents that GPT-3.5 (on which ChatGPT is based) has a maximum request size of 4000 tokens (with a token roughly equivalent to 4 characters in English). I definitely recommend reading the OpenAI documentation for designing efficient prompts, the Awesome ChatGPT Prompts GitHub repository that provides very cool and creative prompt examples, and the accompanying free e-book The Art of ChatGPT Prompting: A Guide to Crafting Clear and Effective Prompts.
  • Zero-shot/one-shot/few-shot learning: sometimes, as with a human, giving a task to ChatGPT (or any LLM) without instructions or examples confuses it, and it may not provide relevant answers or outcomes. Zero-shot is when the model predicts the answer without any example, for instance with the following prompt: “Translate English to French: cheese =>”. One-shot is when, in addition to the task description, the model sees a single example of the task: “Translate English to French: sea otter => loutre de mer, cheese =>”. And few-shot is when multiple examples are provided. Of course, LLMs tend to provide better results when you give them some examples as input. It is important to note that, contrary to fine-tuning which will be discussed later, one-shot or few-shot learning involves no model parameter update. You might not realize it, but in the history of AI it is a fantastic breakthrough to have models able to perform multiple tasks without being fine-tuned!
  • Chain-of-Thought prompting (CoT): LLMs, and more specifically ChatGPT, sometimes tend to hallucinate; that is, they simply invent facts or provide wrong answers. Interestingly enough, it has been demonstrated that if you craft your prompt in a way that explicitly asks the model to explain its “reasoning” step by step instead of providing a plain answer, the result will be more accurate. So, instead of simply asking a question, you can put “Let’s think step by step.” as a prefix. For example, asking ChatGPT: “Let's think step by step. Is it possible to have two consecutive prime numbers beyond 2 and 3?” gives a much more convincing demonstration than if you simply ask “Is it possible to have two consecutive prime numbers beyond 2 and 3?”.
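To make these concepts concrete, here is a minimal sketch in plain Python (no API calls; the helper function, prompt strings and the 4-characters-per-token rule of thumb are illustrative, not an official format) showing how zero-shot, one-shot and chain-of-thought prompts can be assembled programmatically:

```python
def build_prompt(task, examples=None, query=None, chain_of_thought=False):
    """Assemble a prompt from a task description, optional worked
    examples (one-shot/few-shot) and the actual query (zero-shot
    when no examples are given)."""
    parts = []
    if chain_of_thought:
        # CoT: ask the model to reason step by step before answering.
        parts.append("Let's think step by step.")
    parts.append(task)
    for source, target in (examples or []):
        # Each example shows the input => output pattern once.
        parts.append(f"{source} => {target}")
    if query is not None:
        parts.append(f"{query} =>")
    return "\n".join(parts)

# Zero-shot: task description only, no examples.
zero_shot = build_prompt("Translate English to French:", query="cheese")

# One-shot: a single worked example before the query.
one_shot = build_prompt(
    "Translate English to French:",
    examples=[("sea otter", "loutre de mer")],
    query="cheese",
)

# Rough size check using the ~4 characters per token rule of thumb.
est_tokens = len(one_shot) // 4
```

Few-shot is simply the same call with more `(source, target)` pairs in `examples`, and `chain_of_thought=True` prepends the CoT prefix to any of these variants.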

By leveraging best practices in prompting, using one-shot or few-shot learning when it makes sense, and applying CoT prompting, you should become much more efficient at using ChatGPT and other LLMs to produce relevant outcomes.

From Using to Understanding large language models (LLMs)

Now, if you are an engineer and want to understand the theory a bit more and how these models work under the covers, you might be interested in the following resources:

Engineering your own LLM?

So far, you should have enough resources to efficiently use language models and understand the theory behind them. Now, you might be interested in deploying or fine-tuning these models for your specific use cases (for example, to incorporate knowledge bases or corpora of documents specific to your organization into a model).

While (as of today) OpenAI does not provide direct access to their models (you can only consume them through APIs and are limited to the interactions mentioned in the first section of this article), EleutherAI, a non-profit AI research lab, released a few open-source models based on the GPT architecture, such as GPT-J, GPT-Neo and GPT-NeoX. These models are available on HuggingFace, and the following GitHub repository explains how GPT-NeoX can be trained and fine-tuned with your specific datasets: EleutherAI/gpt-neox: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library. (github.com). While GPT-NeoX-20B was developed primarily for research purposes, you are allowed to further fine-tune and adapt GPT-NeoX-20B for deployment, as long as your use is in accordance with the Apache 2.0 license.
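As an illustration of what “your specific datasets” can look like in practice, the sketch below converts a list of raw documents into JSONL training records, one JSON object per line. This is a common input shape for LLM fine-tuning pipelines, but the exact field names and format are an assumption here; check the gpt-neox repository's documentation for the format it actually expects.

```python
import json

def to_jsonl(documents):
    """Serialize raw documents into JSONL: one {"text": ...} record
    per line. Empty or whitespace-only documents are dropped.
    The "text" field name is an assumption; adjust it to match your
    training framework's documented format."""
    lines = []
    for doc in documents:
        doc = doc.strip()
        if doc:  # skip empty documents
            lines.append(json.dumps({"text": doc}))
    return "\n".join(lines)

# A tiny hypothetical corpus of organization-specific documents.
corpus = [
    "Internal FAQ: how do I reset my password?",
    "   ",  # empty entries are dropped
    "Release notes for product X, version 1.2",
]
jsonl = to_jsonl(corpus)
```

The resulting file can then be pointed at by the training configuration, which is where dataset paths are declared in most fine-tuning setups.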

The BigScience initiative also introduced BLOOM, a multilingual LLM meant to be completely open source and customizable, published under a specific Responsible AI License (RAIL) that restricts inappropriate use cases. You should probably check whether your use cases fall into the restricted categories before using BLOOM; for example, biomedical, political, legal, and finance domains are considered out of scope. As with EleutherAI, the BLOOM model and its variants are available on HuggingFace: bigscience/bloom · Hugging Face. But, given the size of the model (176 billion parameters), you may not have the hardware or the budget necessary to fine-tune it.
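To put that hardware remark in perspective, here is a quick back-of-the-envelope calculation, assuming 2 bytes per parameter (fp16 precision):

```python
params = 176e9          # BLOOM: 176 billion parameters
bytes_per_param = 2     # fp16 precision: 2 bytes per parameter
weights_gb = params * bytes_per_param / 1e9

# Just storing the fp16 weights takes ~352 GB of memory, far beyond
# any single consumer GPU. Fine-tuning needs several times more, for
# gradients and optimizer states on top of the weights.
```

So even before counting gradients and optimizer states, serving or tuning a model of this size requires a multi-GPU cluster rather than workstation hardware.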


Conclusion

ChatGPT and the overall “LLM family” technologies will be increasingly used and deployed in the coming years. In this context it is of paramount importance to:

  1. Make sure users know how to best prompt and use language models.
  2. Make sure enough data science professionals know about the theory and the internal workings of these language models.
  3. Make sure organizations know how they can deploy and fine-tune language models for their specific needs if necessary.

While this article only scratched the surface of these areas, I hope you found it useful and that it will help you better use, understand and customize this exciting piece of technology!

Sébastien Brasseur