Creating Conversational AI with a User-Centric Mindset: A Step-by-Step Guide with ChatGPT-4
In this article, we walk through how to create a chatbot that responds the way you want using ChatGPT-4, the latest GPT release by OpenAI: setting the architecture, knowledge retrieval, and prompt design and engineering. You will see how certain API parameters and prompt engineering choices can change the tone, personality, and voice of the response. Let’s begin!
Step 1: Picking the right model (GPT-4)
Note: We initially built the chatbot using GPT-3.5 but later updated it to GPT-4; the following shows how you can go about choosing which model to use:
First things first, it is time to find the right GPT model for the chatbot. Out of the five GPT-3.5 models available at the time of development, we decided on the gpt-3.5-turbo model.
Step 2: Setting the Right Architecture
Now that we have picked the model, it’s time to set the architecture. Let’s take a step back and think about the goal of the chatbot: even though our user is chatting with a non-human to get answers, we want to mimic a human conversation. To create a natural conversation that fulfills this goal, let’s answer three main questions to figure out which variables we are optimizing for:
1. What enables someone to answer a question well?
2. Then, what makes up someone’s response?
3. Outside of that, how can we enable the architecture to work efficiently at scale?
The answers to the three questions above can now be translated into the following architecture, which consists of three main modules:
3. Prompt engineering: this is where we craft the prompt sent to ChatGPT to get the quality of response we want; its components are detailed in Step 3 below.
Now that we have a high-level overview of how the modules are set up, let’s dive deeper into their implementation!
Step 3: Prompt Engineering
Prompt engineering consists of sending a combination of:
1. The engineered prompt (the guideline for how to answer)
2. The relevant knowledge retrieved from the internal documents
3. The previous chat history
4. The user’s query
to ChatGPT via the API.
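To make that concrete, here is a minimal sketch of how these pieces might be combined into a single API call, using the openai Python library (v0.x, the version current at the time of writing). The prompt wording and variable names are illustrative assumptions, not Bill-d’s actual implementation:

```python
import openai

openai.api_key = "sk-..."  # your OpenAI API key

def build_messages(engineered_prompt, knowledge, chat_history, user_query):
    """Combine the engineered prompt, retrieved knowledge, previous chat,
    and the user's new question into the messages payload for the API."""
    messages = [
        # The engineered prompt plus the retrieved knowledge form the guideline
        {"role": "system",
         "content": engineered_prompt + "\n\nKnowledge:\n" + knowledge},
    ]
    # Previous chat: alternating {"role": "user"/"assistant", ...} entries
    messages.extend(chat_history)
    messages.append({"role": "user", "content": user_query})
    return messages

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # swap in "gpt-4" once you have access
    messages=build_messages(
        "Answer only using the knowledge provided below.",
        "...retrieved document passages go here...",
        [],
        "What are some challenges for a new product manager at Bld.ai?",
    ),
)
print(response["choices"][0]["message"]["content"])
```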
The temperature is a parameter of the ChatGPT API that we can easily set. I’m going to divide Step 3 into two main sections:
1) Setting the temperature
Here is the very first version of Bill-d, but its responses don’t sound much different from reading the document itself. This is because the temperature, which determines the amount of randomness and creativity in the model’s responses, was set to zero. This means the model always selects the highest-probability word as the next word, which will always be a word from the knowledge, i.e., the documents.
Raising the temperature leads to more variation, randomness, and creativity. However, a very high value increases the risk of “hallucination”: unpredictable output such as going off-topic or generating repeated words and gibberish that doesn’t make sense grammatically. Here, we see that the response with temperature = 1 is a sentence full of words that do not relate to each other; randomness at its highest.
Trying out different temperature values, we saw the best performance with the temperature set at 0.3, which gives the model a little flexibility to actually respond as a language model rather than just a knowledge-search bot, while keeping enough sanity to respond with proper words and sentence structure.
As you can see, the temperature can make a big difference in the quality of the response a user receives by adding interpretation and elaboration, so I recommend playing around with the temperature value to see what fits your end-users’ needs best.
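As a rough illustration, comparing temperatures only requires varying one parameter on an otherwise identical request (the prompt below is a placeholder; 0.3 is the value we settled on, but your optimum may differ):

```python
import openai

messages = [
    {"role": "system", "content": "Answer only using the provided knowledge."},
    {"role": "user", "content": "What are some challenges for a new PM at Bld.ai?"},
]

for temperature in (0.0, 0.3, 1.0):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,        # identical prompt each time, for a fair comparison
        temperature=temperature,  # 0 = deterministic, higher = more random/creative
    )
    print(f"--- temperature={temperature} ---")
    print(response["choices"][0]["message"]["content"])
```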
2) The 4 Experiments
The purpose of this exercise is to show how different combinations of engineered prompts, knowledge, previous chat, and a specific tone impact the overall voice of the response. Here, we run four experiments:
Experiment #1: Sending all relevant domain knowledge with a baseline engineered prompt and previous chat
Experiment #2: Send only the most relevant knowledge by weighting existing knowledge based on similarity with the user’s prompt
Experiment #3: No guiding engineered prompt, but send all relevant domain knowledge and previous chat
Experiment #4: Same as experiment #1 but talk in the tone of a Pirate
For this set of experiments, we ask each experiment the same question for control:
What are some challenges that I should foresee as I join Bld.ai as a new product manager?
Let’s see what the response looks like for each of the experiments and compare to see what we want for Bill-d!
Experiment #1: Baseline vs Experiment #2: Weighted Retrievals
For both experiments 1 and 2, we keep the engineered prompt the same: we guide ChatGPT to answer only with the knowledge given by the internal documents. What differs between the two experiments is the knowledge we use to answer the user’s questions.
In Experiment 1, we feed all the retrieved information from the internal documents to ChatGPT and let it generate the response with its own interpretation. In Experiment 2, by contrast, we feed only the retrieved information we consider most relevant to the user’s question, weighting passages via semantic search. This means ChatGPT won’t be as broadly informed but will give more focused and specific answers.
Comparing the responses from Experiment 1: Baseline and Experiment 2: Weighted Retrievals, you can see that the response from Experiment 2 is more focused on the context and knowledge of Bld.ai, while Experiment 1 stays more general.
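For readers curious how the weighting in Experiment 2 might look in code, here is a minimal sketch of ranking knowledge chunks by embedding similarity. The chunking scheme, embedding model, and top-k value are assumptions for illustration, not the exact Bill-d settings:

```python
import numpy as np
import openai

def embed(text):
    """Embed a piece of text with the OpenAI embeddings endpoint (v0.x API)."""
    result = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    return np.array(result["data"][0]["embedding"])

def top_k_chunks(question, chunks, k=3):
    """Return the k knowledge chunks most semantically similar to the question."""
    q = embed(question)
    scored = []
    for chunk in chunks:
        c = embed(chunk)
        # Cosine similarity: dot product of the two vectors after normalization
        score = np.dot(q, c) / (np.linalg.norm(q) * np.linalg.norm(c))
        scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

# Only the top-scoring chunks get sent to ChatGPT, instead of everything.
relevant = top_k_chunks(
    "What are some challenges for a new PM at Bld.ai?",
    ["...chunk 1...", "...chunk 2...", "...chunk 3...", "...chunk 4..."],
)
```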
Experiment #3: Unguided
As for Experiment #3: Unguided, we provide knowledge and a query without a proper prompt/guide, which is equivalent to how ChatGPT works if you hand it a document and ask about it: the response will be more general and less focused on the context of Bld.ai.
As you can see, compared to Experiment #1: Baseline and Experiment #2: Weighted Retrievals, the response in Experiment #3: Unguided gives more of a general list of challenges that PMs face, explained far less in the context of Bld.ai and its processes.
Experiment #4: Adding voice!
Lastly, Experiment #4: Pirate shows what it’s like to add a voice to the response. Here, we instruct the prompt to speak like a pirate on top of our Experiment #1: Baseline setup. The response is a lot shorter than Experiment 1’s, and you can see a drastic difference in personality: it is less organized and elaborate, but it still tells you the right steps to take.
Tone and voice are important in conveying information in a conversation. Although this pirate is sassy, imagine asking GPT to speak like a five-year-old for children interacting with a gaming product, or like Elon Musk for Tesla employees asking questions about the company.
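A hedged sketch of how such a voice can be layered onto the baseline; the exact wording of Bill-d’s prompts is not shown in this article, so the strings below are illustrative only:

```python
base_prompt = (
    "You are Bill-d, an assistant for Bld.ai employees. "
    "Answer only using the knowledge provided below."
)

# Experiment 4 simply appends a voice instruction to the baseline prompt;
# swap in "a five-year-old" or any other persona the same way.
pirate_prompt = base_prompt + " Respond in the voice of a pirate."

messages = [
    {"role": "system",
     "content": pirate_prompt + "\n\nKnowledge:\n" + "...retrieved passages..."},
    {"role": "user", "content": "What are some challenges for a new PM at Bld.ai?"},
]
```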
From the four experiments, we see that the tone, personality, depth, and breadth of the responses differ based on the temperature, guideline, knowledge, and voice. Choosing which experiment to deploy can be done based on a hypothesis of what’s most impactful to the user, or by experimenting, gathering feedback from users, and using that data to roll out the winning variant to the general user base.
Step 4: Setting up knowledge retrieval for efficiency and scalability
Now that we have explored what we want for Bill-d’s voice and personality, it’s time to think about making the implementation scalable, so that response speed stays roughly constant as the number of documents grows to expand the knowledge set.
First, it’s crucial to set up semantic search so that we only retrieve the information Bill-d needs to answer the user’s question instead of scanning every document, including irrelevant knowledge. This saves time and cost, especially as the number of documents grows.
Second, when applying the same architecture to large documents or connecting it to a large knowledge base, it is crucial to use a fast vector store such as Milvus (or the managed Zilliz offering) to store the embeddings of those documents, rather than relying on system I/O to read them from CSV files or similar.
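As a sketch of what that looks like in practice, here is a hypothetical query against a Milvus collection using pymilvus; the collection name, field names, and connection details are assumptions about the setup, not Bill-d’s actual configuration:

```python
import openai
from pymilvus import Collection, connections

# Connect to a running Milvus instance (host/port assume a local default setup)
connections.connect(alias="default", host="localhost", port="19530")

# Assumed schema: each row holds an "embedding" vector and the original "text"
docs = Collection("internal_docs")
docs.load()

question = "What are some challenges for a new PM at Bld.ai?"
q_vec = openai.Embedding.create(
    input=[question], model="text-embedding-ada-002"
)["data"][0]["embedding"]

# The vector index performs the similarity search, so lookups stay fast
# even as the number of stored documents grows
results = docs.search(
    data=[q_vec],
    anns_field="embedding",
    param={"metric_type": "IP", "params": {"nprobe": 10}},
    limit=5,                  # top-5 most similar chunks
    output_fields=["text"],
)
for hit in results[0]:
    print(hit.entity.get("text"))
```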
Step 5: Creating an optimal conversation experience with ChatGPT
Lastly, when designing a chatbot with ChatGPT, user experience is an essential consideration. Because ChatGPT is a language model that generates responses based on its input, the chatbot’s responses can sometimes be unpredictable or unrelated to the user’s query. This can be frustrating because it’s hard to understand why a response is the way it is, so we added a feature that lets users see which knowledge from the internal documents was used to derive the response, helping them understand how Bill-d generated it.
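One way to support that transparency feature, sketched under the same assumptions as before, is simply to return the retrieved passages alongside the generated answer so the UI can display them as sources:

```python
import openai

def answer_with_sources(question, retrieved_chunks):
    """Generate an answer and return it together with the passages it was
    grounded in, so users can see where the response came from."""
    knowledge = "\n\n".join(retrieved_chunks)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the knowledge below.\n\n" + knowledge},
            {"role": "user", "content": question},
        ],
        temperature=0.3,
    )
    answer = response["choices"][0]["message"]["content"]
    return {"answer": answer, "sources": retrieved_chunks}
```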
To ensure a positive user experience and drive continuous engagement, we should consider not only the chatbot’s tone and personality but also the user interface and how users interact with the product. By designing an experience that is intuitive, responsive, and user-friendly, businesses can improve customer satisfaction, increase engagement, and build stronger relationships with their customers.
Takeaways
Wrapping it all up, building a chatbot with OpenAI’s ChatGPT-4 was incredibly fast, but to get great responses and outcomes, it is crucial to consider user experience when designing the chatbot.
Designing the user experience with ChatGPT includes not only interface design but also prompt design and engineering, a science of its own. To ensure a positive user experience, builders such as engineers, designers, and product managers should consider the chatbot’s tone and personality, the user interface, and the chatbot’s ability to understand and respond to user queries. Despite the challenges involved in designing an effective chatbot, ChatGPT offers tremendous opportunities for businesses to automate customer service, personalize customer interactions, and improve communication with customers and partners around the world. By leveraging the power of ChatGPT and prioritizing user experience, businesses can build chatbots that improve customer satisfaction, increase engagement, and drive business growth.
Since working on this prototype, OpenAI has released plug-ins, such as the retrieval plug-in, that make it even more accessible and efficient for companies to adopt generative AI in their products. With these advancements, businesses can expect to build chatbots that are even more effective and user-friendly.
So what does this all mean when building a solution with ChatGPT?
In the end, success is defined by the end-user’s satisfaction with the response they get from the product. As builders, we can run a variety of experiments and gather feedback from users to determine, or even personalize, the type of prompt-engineered responses they get. What’s most important is understanding how the engineered prompt, the previous chat, and the knowledge we send shape the responses that come out, and having experts such as engineers and product managers who can make informed decisions about the type of response the end-user needs most.
Some modifications were made with ChatGPT-4.
Credit: Original article published by Jee Soo (Eunice) Choi at https://medium.com/@eunice.choi/designing-a-chatbot-with-chatgpt-79afd818cbff
The bld.ai team behind this: amazing engineers Ahmed Mohamedeen and Mohamed Alaa for building this with a user-centric mindset, and Danny Castonguay, Andres Desantes, Liam Hough, and Kenny Wei for review.