A Conversational Agent with a Single Prompt?
Using Large Language Models for Chatbot Development: Specializing in Prompt Design
In this article, I share my experience in constructing Generative AI prompts to develop Conversational Agents.
First, I will clarify the relevant terms. Then, I will provide a brief overview of how we can utilize Large Language Models (LLMs) as intelligent conversationalists. Finally, I will present some compelling use cases where I have refined prompt engineering best practices to implement chatbots solely from no-code requirement specifications (system prompts).
Conversational Agents
Years ago, during my previous academic career (specifically as an assistant researcher at ITD-CNR), my research leader and other researchers always referred to chatbots as conversational agents. This perplexed me, as I’m particular about terminology in computer science. I always understood an agent to be software that acts as an intermediary for humans, performing some task usually carried out by a human.
My point was that not every chatbot is truly an agent in functional terms.
For example, consider a voice system that acts as an assistant (nowadays we might call it a voice copilot) for a worker, assisting them in accomplishing specific real-life working tasks. Is it correct to define this system as a conversational agent? Maybe not, because it lacks agentive functionality. The term assistant may be more appropriate for augmented-reality scenarios like this (read also my previous article, Voice-cobots in industry. A case study). However, I admit that historically, in the scientific and academic community, conversational agent and chatbot have been used as synonyms.
Nevertheless, things have become more confusing with recent advancements in LLM-based autonomous agents. In this research area, which is broader than conversational applications, agents can autonomously define and execute micro-tasks based on a human-provided description in natural language (the system prompt) of a specific high-level duty or activity. This is a fascinating area of research with potentially disruptive practical applications, and there are many software frameworks available, but that’s a slightly different topic. Let’s now focus instead on the conversational application verticals.
Overall, I use the term Conversational Agent to refer to a specific type of agent that performs conversational tasks on behalf of a human.
Progress with LLM-based conversational agents allows us to build chat systems with a single prompt based on cognitive architectures. By utilizing advanced state-of-the-art LLMs, developers can describe what the chatbot should do without having to program the conversation as a series of fixed dialog states. From the development perspective, this could be a definitive cost-saving alternative to solutions based on intents, slots, states, and hard-coded flow management.
LLMs as Core Layers for Agent Engines
Long story short, GPT-based Large Language Models have revolutionized the field of conversational AI since the release of GPT-3 by OpenAI. These recent LLMs, trained on vast amounts of text data, can generate human-like responses and engage in meaningful dialogues. Their ability to understand and generate language makes them ideal for building chatbots.
Instruction-based Chat Completion Models
A basic foundation model (a large language model trained with sufficient data to 'know' a specific human language) is not sufficient on its own to be a valid engine capable of holding conversations and reasoning.
Simply put, the disruptive improvement in GPT-3 models occurred with GPT-3.5-turbo (the model behind the famous ChatGPT, see my previous article: Reflecting on ChatGPT’s Anniversary). GPT-3.5-turbo is based on the foundation of GPT-3 but enhanced through supervised fine-tuning and Reinforcement Learning from Human Feedback (RLHF), which enable it to converse with people in fluid natural language, in a polite and 'controlled' manner.
More importantly, the models from GPT-3.5 onwards are also instruction-based models, trained to follow directives and additionally trained on programming code (OpenAI coined the term instruct). This last feature enabled some sort of 'reasoning' ability: the LLMs can now handle programmatic 'logic', including programming-language concepts such as sequences, conditionals, and iterations.
Function Calling Feature
Another disruptive feature that nearly all state-of-the-art generative models now possess is the ability to call external functions/APIs (sometimes called tools in LLM-agent jargon). This is achieved through special fine-tuning of the aforementioned models, enabling LLMs to 'call' external functionality, such as programs written in any programming language, to fulfill specific requests, perform actions, or retrieve real-time data. This is a fundamental need in a cognitive architecture, where the LLM is the core 'reasoning' component that autonomously retrieves information from external systems or invokes actuators.
The function-calling feature is crucial for autonomous agents but not essential for building basic conversational agents. However, function-calling becomes a must-have when the conversational system needs to invoke external APIs. For example, a customer care assistant might need to open a ticket in an internal ticketing system or query the system to monitor the ticket status and inform the customer during the conversation.
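To make this concrete, here is a minimal sketch of function calling, assuming the OpenAI Python SDK (openai>=1.0); the tool names open_ticket and get_ticket_status and their parameters are hypothetical examples for the ticketing scenario above, not an actual production schema.

```python
# A minimal sketch of function calling with the OpenAI Python SDK (openai>=1.0).
# The tool names and parameters (open_ticket, get_ticket_status) are hypothetical
# examples for the ticketing scenario described above.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

tools = [
    {
        "type": "function",
        "function": {
            "name": "open_ticket",
            "description": "Open a ticket in the internal ticketing system.",
            "parameters": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string", "description": "Short issue summary."},
                    "category": {"type": "string", "enum": ["hardware", "software", "access"]},
                },
                "required": ["summary"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_ticket_status",
            "description": "Return the status of a previously opened ticket.",
            "parameters": {
                "type": "object",
                "properties": {"ticket_id": {"type": "string"}},
                "required": ["ticket_id"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a customer care assistant for internal employees."},
        {"role": "user", "content": "My laptop won't boot since this morning."},
    ],
    tools=tools,
)

# When the model decides a tool is needed, it returns a structured tool call
# instead of plain text.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```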
Recent generative language models (instruction-based LLMs fine-tuned for chat completion and equipped with function calling) can understand logic and instructions (through directives written in natural language in the prompt) and have an improved capacity to conduct human-like conversations in nearly any natural language. Additionally, these models can interact with external (proprietary) APIs. All in all, today’s models like GPT-4 or equivalents are viable engines for building autonomous agents capable of performing task-oriented conversations typically handled by humans.
In the next paragraphs, I will delve into this with some examples, but first, I will introduce the prompt engineering approach I used.
Prompt Design for Task-oriented Conversations
Prompt engineering is the practice of designing and refining input prompts to effectively guide the behavior and output of language models. By carefully crafting these prompts, users can enhance the model’s ability to understand and respond to complex instructions, ensuring more accurate and contextually appropriate outputs. This technique is crucial for optimizing the performance of state-of-the-art generative models, enabling them to perform specific tasks, generate creative content, and simulate 'human-like' conversations with precision.
The techniques I experimented with involve writing system prompts that instruct the LLM to conduct conversations in specific application domains in order to accomplish particular tasks.
In-Context Learning
In all the use cases I’ll introduce, I used a similar approach: the system prompt begins with an introductory context section where I define:
1. the goal of the conversation (or task);
2. the bot persona: a description of the agent’s characteristics/character, using the usual conversation design metrics;
3. the user persona: a description of the user profile;
4. the contextual data useful for the current conversation session, which is the core part of the context. For example, if the conversation is an interview with a job applicant, this data includes the job description and the candidate’s curriculum.
More generally, the technique is akin to the one made famous by Retrieval Augmented Generation (RAG) applications, where you 'stuff' retrieved data into the prompt (perhaps from an embeddings database or some vertical-specific data retrieval system). A minimal sketch of how such a context section can be assembled follows.
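This sketch is my own illustration of the approach, not a fixed format: the section labels, file names, and the interview example data are all hypothetical.

```python
# A minimal sketch of the context section of a single-prompt conversational agent.
# Section labels, file names, and the interview example data are illustrative only.
GOAL = "Conduct a screening interview for the open job position."
BOT_PERSONA = "You are Alice, a friendly and professional recruiter. Tone: warm, concise."
USER_PERSONA = "The user is a candidate who applied for the position."

# Hypothetical data files 'stuffed' into the prompt, RAG-style.
job_description = open("job_description.txt").read()
candidate_cv = open("candidate_cv.txt").read()

system_prompt_context = f"""\
GOAL
{GOAL}

BOT PERSONA
{BOT_PERSONA}

USER PERSONA
{USER_PERSONA}

CONTEXT DATA
Job description:
{job_description}

Candidate curriculum:
{candidate_cv}
"""
```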
When considering the data needed to accomplish a task-oriented conversation, it can be anything that fits within the prompt context window (4K, 8K, 16K tokens, and so on). In all practical use cases I have experimented with and mention below, a context window of 4–7K tokens has been entirely sufficient for the purpose.
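As a quick sanity check, one can count the tokens of the assembled context before each session; a minimal sketch, assuming the tiktoken library and reusing the system_prompt_context variable from the sketch above:

```python
# A quick check that the assembled context fits the model's window,
# assuming the tiktoken tokenizer.
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")
n_tokens = len(encoding.encode(system_prompt_context))
print(f"Context section size: {n_tokens} tokens")
assert n_tokens < 7000, "Context exceeds the 4-7K budget that proved sufficient in practice"
```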
Directive Instructions on Conducting the Dialog
After the context part of the prompt, the following instruction section details the required steps (actions to be accomplished in a specific order). This is the tricky part, where you instruct the model not only on how to conduct the conversation in terms of social practices and human conventions, but also provide guidelines on the topics to cover, possibly including explicit questions or general behaviors to adopt.
Here, you instruct the LLM on what topics must be covered in the chat, how to conduct the dialogue with more or fewer guidelines, and how to steer the conversation from point A to the desired point B. Finally, the instructions must include criteria for deciding when to end the conversation session; these depend on the specific application and can be a bit tricky to implement.
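Here is a hypothetical instruction section in the same spirit, continuing the sketch above; the numbered steps and the termination criterion are illustrative, not a fixed template:

```python
# A sketch of the instruction section, appended after the context section.
# The numbered steps and the termination criterion are illustrative.
system_prompt_instructions = """\
INSTRUCTIONS
Conduct the conversation following these steps, in order:
1. Greet the user and briefly state the purpose of the conversation.
2. Cover every topic required by CONTEXT DATA, asking one question at a time.
3. If the user digresses, answer briefly, then steer back to the current topic.
4. Never invent information that is not present in CONTEXT DATA.
5. When all topics are covered, summarize what you collected and ask the user
   to confirm.
6. After confirmation, thank the user and end the conversation.
"""

system_prompt = system_prompt_context + "\n" + system_prompt_instructions
```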
Some Application Use Cases
I introduce three dialogue systems I prototyped for entirely different verticals. All these applications have in common that I wrote the conversation program as a single system prompt for an LLM. In chronological order of development:
Case 1: A Virtual Caregiver for Patient Telemedicine Visits
I have been involved in some prototypes in the healthcare vertical, specifically transcribing and extracting data from practitioner-patient visits for Conversation Analysis (CA) using LLMs. As a side project, I developed an emulation of a remote monitoring visit where a virtual assistant (acting as a practitioner or caregiver) contacts a patient every day via an instant messaging app to monitor their health status, particularly considering that the patient is potentially affected by COVID-19. The virtual caregiver asks the patient about their health status, chats with them in a very natural way, delves into symptoms, and engages in small talk if the patient initiates it, while keeping the conversation focused on retrieving certain parameters: health status, temperature, blood oxygenation, and a few other variables.
Once all the requested information is retrieved, the virtual caregiver says goodbye to the patient and closes the conversation, internally returning a data structure (a JSON) containing all the information obtained from the patient. Interestingly, in this case, the end of the conversation is not strictly necessary. After the initial session, the user can re-engage with updates on their symptoms. The virtual caregiver replies to any patient questions or statements about their symptoms and internally emits any data updates via a function call. This example is also interesting for its psychological support aspects, but that’s another story.
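The closing data emission can be implemented as a function call; here is a sketch of what that tool schema might look like, with hypothetical field names based on the parameters listed above:

```python
# A sketch of the tool the virtual caregiver calls once all monitoring
# parameters have been collected; the field names are hypothetical.
report_tool = {
    "type": "function",
    "function": {
        "name": "report_health_status",
        "description": "Emit the data collected during the daily monitoring chat.",
        "parameters": {
            "type": "object",
            "properties": {
                "health_status": {
                    "type": "string",
                    "description": "Overall condition in the patient's own words.",
                },
                "temperature_celsius": {"type": "number"},
                "blood_oxygenation_percent": {"type": "number"},
                "symptoms": {"type": "array", "items": {"type": "string"}},
            },
            "required": [
                "health_status",
                "temperature_celsius",
                "blood_oxygenation_percent",
            ],
        },
    },
}
```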
You could argue that the described conversation is just old-fashioned form-filling that one could implement with a simple hard-coded chatbot. However, the novelty of the LLM-based conversation lies in the naturalness of the interaction: the variation in how the system conducts each new conversation session, allowing user digressions while always returning to the programmed goal of gathering information, is invaluable!
Case 2: A Customer Care Assistant
This is a classic chatbot application that I already mentioned in the article. Imagine a virtual assistant helping an employee of a very large company submit requests or report issues, tracked by opening tickets on a specific backend system. The user must also be able to ask about the status of previously submitted tickets. This is a very common chatbot application, which I had already delivered in production using a standard state-machine flow tool, seamlessly integrated with external REST APIs.
Subsequently, I tried to re-implement the same application using an LLM-based approach. The initial application involved highly constrained workflows, so what is the advantage of using an LLM as a dialog conductor? I even struggled to implement, via prompt, programmatic steps that are simple to implement with a hard-coded flow. So, what are the advantages of implementing all this logic with a 'declarative' approach instead of a standard software program?
There are two interesting advantages. First, the conversation conducted by the LLM feels more 'natural', emulating the behavior of a human being (e.g. a help desk operator): it allows the user to describe an issue in various ways and guides them to explain the problem concisely so that all the necessary data is gathered.
The second advantage is the reduction in development time: with the single-prompt approach, the chatbot developer is no longer a software programmer using a chatbot development tool, but rather a prompt engineer with conversational design skills, who writes the chatbot specification as a special text in a natural language (English, Italian, etc.).
Besides the prompt engineer, we still need a backend developer who knows how to integrate external APIs, but what’s nice is that these two roles are quite distinct, and the software responsibility boundaries are clear.
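In code, that boundary might look like the sketch below: the prompt engineer owns the system prompt, while the backend developer owns the functions behind the tool calls. The stubs and the handle_tool_calls helper are hypothetical illustrations, assuming the OpenAI Python SDK message shapes:

```python
# A sketch of the role boundary: the prompt engineer owns the system prompt,
# the backend developer owns these functions. Both stubs are hypothetical.
import json

def open_ticket(summary: str, category: str = "software") -> dict:
    # A real implementation would POST to the internal ticketing REST API.
    return {"ticket_id": "T-0001", "status": "open"}

def get_ticket_status(ticket_id: str) -> dict:
    # A real implementation would GET the ticket from the backend.
    return {"ticket_id": ticket_id, "status": "in progress"}

DISPATCH = {"open_ticket": open_ticket, "get_ticket_status": get_ticket_status}

def handle_tool_calls(message, messages):
    """Run each tool call requested by the model and append the results,
    so the next completion can report them back to the user."""
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = DISPATCH[call.function.name](**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
```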
Case 3: A Virtual Job Position Interviewer
The most fun and intriguing application I’m experimenting with is in the Human Resources vertical. Using the usual in-context learning prompt-writing approach, I built an emulator of a recruiter conducting an interview with a person who applied for a certain job position.
In the prompt context, I included the job post description and the candidate’s curriculum vitae. In the instruction section, I taught the LLM to act as a perfect recruiter, asking questions to verify all matches and mismatches between the role description and the candidate’s experience. The results are very impressive, and the virtual interviewer’s behavior is smart enough to detect weaknesses and strengths of the candidate by comparing the CV with the required skills. As a test candidate myself, I have been unable to lie in response to such precise investigative questions.
Since I’m not an expert recruiter myself, my approach could surely be improved with input from a domain expert in human resource recruiting. Nevertheless, my current experiments are astonishing. The system conducts a natural (similar to a human-to-human dialog) yet very rational interview, exploring points of weakness and verifying the truth of user statements in a polite and positive manner (as I instructed the bot-persona to do).
Besides the above application, I also created collateral LLM-based tools, such as a 'pre-interview' prompt to decide whether a candidate deserves to be interviewed, and some 'post-interview' tools that analyze the interview dialog and produce a structured report with a final ranking; these are one-shot (single-turn) LLM applications.
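As an illustration, such a 'post-interview' tool can be a single completion over the transcript; a minimal sketch, assuming the OpenAI JSON mode (response_format) and an invented report rubric:

```python
# A sketch of a one-shot 'post-interview' tool: a single completion that turns
# the interview transcript into a structured report. The rubric is invented,
# and JSON mode requires a model that supports response_format.
from openai import OpenAI

client = OpenAI()

def post_interview_report(transcript: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are an HR analyst. Given an interview transcript, return a "
                    "JSON object with keys: strengths, weaknesses, skill_matches, "
                    "ranking (integer 1-10), and recommendation."
                ),
            },
            {"role": "user", "content": transcript},
        ],
        response_format={"type": "json_object"},
        temperature=0,
    )
    return response.choices[0].message.content
```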
Prompt Development Challenges
LLMs are not deterministic. This has certain advantages, such as enabling smooth, fluent, always slightly different conversation variations, but it also presents some drawbacks. For the applications covered here, this randomness can potentially create issues. The main challenge I encountered was not the outcome of the first prompt I designed, but the subsequent editing required to refine it and correct incorrect or unexpected runtime behaviors.
Related to this, LLMs suffer from what I call fragility syndrome: you may have an initially well-functioning prompt, but even a minor, seemingly insignificant modification of a statement or a typo (by the way, typos are absolutely forbidden when writing prompts; please use a spell checker!) can cause different and unexpected runtime behaviors. Fixing this usually requires a lot of time spent on trial and error, where I rethink the prompt and often have to rewrite or reorganize it following a new, more logical approach.
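When re-testing a prompt after such edits, it helps to reduce sampling variance as much as the API allows; a sketch, reusing names from the earlier snippets (note that the seed parameter is best-effort, not a reproducibility guarantee):

```python
# Partial mitigations for non-determinism when re-testing an edited prompt:
# lower the temperature and pin a seed. The seed parameter is best-effort,
# not a hard guarantee of reproducibility. Names reuse the earlier sketches.
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Hello"},
    ],
    temperature=0,  # reduce sampling randomness
    seed=42,        # request reproducible sampling where supported
)
print(response.choices[0].message.content)
```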
For the prototypes I created, I admit I did not use any automated testing tools to validate the LLM outputs. This automatic evaluation is not a trivial task, although there are some emerging tools that can help prompt engineers validate prompts (this is a topic for a future article).
Tentative Conclusions
There is a lot of hype around optimal use cases for LLMs. Since 2023, I have seen hundreds of papers, articles, and videos concerning RAG/LLM applications. While LLM-enabled data-retrieval chatbots are certainly an important use case, for me, also a conversation designer, the perfect use case for state-of-the-art generative models is to exploit the conversational capabilities embedded in LLMs trained and fine-tuned on human conversations.
Has the no-code dream of developing chatbots now become a reality, requiring just prompt engineering skills?
What are your thoughts?
#promptEngineering #LLMs #generativeAI #GenAI #nocode #conversationalAgents #AutonomousAgents #chatbots #ConversationDesign #AI #MachineLearning #NaturalLanguageProcessing #AIChatbots #AIApplications
Conversational LLM-based Applications Specialist
1 week ago: I’ve noticed continued interest in this post and the old article (~6,000 impressions). Thank you for your engagement! For those curious to explore further, I invite you to check out my recent preprint on arXiv, which dives deeper into the concept of building conversational agentic systems. The work is still in progress, and I plan to integrate an evaluation technique called "LLM-as-a-Judge," leveraging large language models to assess the quality of other models' responses as a scalable alternative to human evaluation. Could these systems eventually develop, run, test, and refine themselves, almost autonomously?! :) https://arxiv.org/abs/2501.11613
AI Conversation Specialist | Prompt Engineer & AI Engineer | Designer & Computational Linguist | LLMs & Chatbots | AI Agents
8 months ago: I find this article really interesting and inspirational, Giorgio Robino! For sure, I’m going to use it so that my junior colleagues keep in mind all these relevant insights, which you have outlined superbly, when designing a prompt for conversational AI. Congrats!
Thanks for sharing. We will have to see where the dust settles in one or two years, given the rapid advancements in what models can do while consistency can still be an issue.
Philosopher specializing in Epistemology and Cognitivism, PhD Student in Robotics and Intelligent Machines for Healthcare and Wellness of Persons
8 个月è un’area in espansione. L’ho implementata dentro al mio software riabilitativo. Mi rendo conto che dietro alla logica del prompt servono umanisti. Non lo dico per tirare acqua al mio mulino, ma sinceramente mi sono resa conto aprendo la macchina e rimontandola ai miei scopi, come la conoscenza metodica, profonda della lingua (specialmente lingusti, filosofi, letterati) abbia fatto la differenza per ottenere dal LLM ciò che volevo. Con un solo prompt ovviamente. Non è ingegneria del prompting, bensì facoltà di lettere e filosofia del prompting.
VP Engineering | Generative AI | Investor
8 months ago: Great read, Giorgio Robino! I posted a longer response on LLMs as judges/critics on the horizon: https://www.dhirubhai.net/posts/aleclazarescu_nice-walkthrough-on-where-weve-been-and-activity-7216153456791703552-2MvT?utm_source=share&utm_medium=member_desktop