Chat With Your Data: Reinventing Chatbots Through AI

How do you create an AI chatbot that talks to your clients in the most natural, coherent, and accurate way, and deals gracefully with unexpected questions and provocations? Konstantin Grusha, Delivery Manager, shared with us how our team crafted a working PoC in just two weeks and developed a new way to adapt a Large Language Model (LLM) to DataArt’s marketing requirements.

Many of our clients want a chatbot for marketing purposes, sales, internal digital assistants, or help desks – we at DataArt also wanted one for our own website.

But most modern chatbots are simple scripted bots that answer typical questions with pre-recorded responses. Nothing interesting.

We wanted to combine business needs and some special effects. So, we decided to make an AI chatbot that can answer questions about DataArt using the information we feed it. You can chat with our bot and test it at dataart.com.

In this article, we will present our approach, explain the limitations of Large Language Models, and explore strategies to overcome them.

The Goldfish Memory

Large Language Models are the hottest topic now. It is almost like teenagers and dating: everybody wants it, no one has really done it, yet everyone claims to be an expert.

With all the hype surrounding artificial intelligence, we should remember that it is still quite unfair to call LLMs intelligent—at least in the way we might expect. They are not thinking yet–they depend entirely on the context we give them, can only operate with the provided information, and cannot produce new ideas.

At the time of writing, the training cutoff for all the major models is somewhere in 2021, so they know nothing about the last two years’ events. Since DataArt has been around for 26 years, they do have some information about the company. But we needed not only to give our chatbot more relevant, up-to-date information, but also to ensure it could relay it correctly.

Moreover, these models are like goldfish – they have a short operational memory. A standard model’s context is limited to roughly 8,000 tokens (about 20 A4 pages), which in data terms is roughly 30 kilobytes of English text (the amount differs a bit for languages with higher or lower information density). More expensive models, such as GPT-4-32K, let you fit about 150 kilobytes of data into one conversation (roughly a large book). So, if you chat with the model for a long time, at a certain point it will start to forget what happened at the beginning of the conversation.
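As a rough illustration of these budgets, here is a back-of-the-envelope token estimator. The 4-characters-per-token ratio is a common heuristic for English only; exact counts require the model’s own tokenizer (e.g., OpenAI’s tiktoken library), and the function names here are ours, not any official API:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate for English text (~4 characters per token).
    A heuristic only -- real counts come from the model's tokenizer."""
    return max(1, len(text) // 4)

def fits_context(text: str, context_limit: int = 8000,
                 reserved_for_reply: int = 1000) -> bool:
    """Check whether a prompt still leaves room for the model's answer."""
    return estimate_tokens(text) + reserved_for_reply <= context_limit

# ~30 KB of English text is roughly the 8K-token window mentioned above
page = "word " * 6000            # 30,000 characters
print(estimate_tokens(page))     # → 7500
print(fits_context(page))        # → False: no room left for the reply
```

A check like this is handy before every API call: if the prompt does not fit, you must trim or compress the context first.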

We needed to teach our model new material somehow. However, a single question to a GPT-4-32K model can cost up to $5, so we also had to improve the quality-to-cost ratio.

There is more than one way to skin a cat in this game, and vector databases are one of them.

Vector Databases and Long-term Memory

When you talk to ChatGPT or an OpenAI Playground model, the only knowledge it has is what it was pre-trained on. We can now fine-tune GPT-3.5 (this option was released in late July 2023). Still, fine-tuning does not add more knowledge to the model. It allows us to shape the model’s output, but our tests have shown that it is not worth the effort for now.

So, we need to adapt a pre-trained model to solve our problems. This new discipline for adapting models is called Prompt Engineering. The field is less than a year old, and the speed at which this is evolving is cosmic.

So, the technique we used to adapt our model did not exist when we started developing the bot half a year ago – we invented it on the fly. Today, it is called RAG (Retrieval-Augmented Generation): generation augmented with retrieved memory.

And this is where vector databases come into play.

Vector databases do not search the literal text as is – they convert text into abstract concepts and find relevant material by meaning, as if you had your own local search engine.

Any concept in a vector database is a vector in a multidimensional context space. There is an infinite number of possible contexts. The more vectors you have, the better you can narrow down the region where you need to look for an answer.

Arithmetic operations are possible with these vectors – they can be added, subtracted, scaled, and so on. For example, if you take the vector for the concept "king," subtract the vector for "man," and add the vector for "woman," you land close to "queen." But this is a digression.
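Here is a toy illustration of that arithmetic with hand-made three-dimensional vectors. Real embeddings are learned and have hundreds of dimensions, so treat this purely as intuition; the dimensions and values below are invented for the example:

```python
import math

def cosine(a, b):
    """Cosine similarity: 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy 3-d "embeddings"; dimensions: (royalty, male-ness, female-ness)
king  = [1.0, 1.0, 0.0]
man   = [0.0, 1.0, 0.0]
woman = [0.0, 0.0, 1.0]
queen = [1.0, 0.0, 1.0]

# king - man + woman should land on queen
result = [k - m + w for k, m, w in zip(king, man, woman)]
print(cosine(result, queen))  # close to 1.0
```

Similarity search in a vector database works on exactly this principle: the query is embedded as a vector, and the stored vectors closest to it (by cosine or dot product) are returned.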

The main point here is that our model's vector database acts as a long-term memory.

The process of converting your documentation to be stored in a vector database is called embedding.
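To make the mechanics concrete, here is a minimal, self-contained sketch of the embed-and-search idea. The `embed` function and `TinyVectorStore` class are deliberately toy stand-ins (a bag-of-words count over a made-up vocabulary); in the real pipeline, embeddings come from a model such as sentence-transformers/all-mpnet-base-v2 and the store is a proper vector database such as Qdrant:

```python
import math

def embed(text: str) -> list[float]:
    """Toy embedder: bag-of-words counts over a tiny fixed vocabulary.
    A real embedder maps text into a learned semantic space."""
    vocab = ["dataart", "chatbot", "cloud", "design", "holiday"]
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

class TinyVectorStore:
    """Minimal in-memory stand-in for a vector database."""
    def __init__(self):
        self.items: list[tuple[str, list[float]]] = []

    def add(self, doc: str) -> None:
        # "Embedding" step: store the document together with its vector
        self.items.append((doc, embed(doc)))

    def search(self, query: str, top_k: int = 2) -> list[str]:
        qv = embed(query)
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a)) or 1.0
            nb = math.sqrt(sum(x * x for x in b)) or 1.0
            return dot / (na * nb)
        ranked = sorted(self.items, key=lambda it: cos(qv, it[1]), reverse=True)
        return [doc for doc, _ in ranked[:top_k]]

store = TinyVectorStore()
store.add("DataArt builds cloud solutions")
store.add("Our design team crafts interfaces")
print(store.search("Tell me about cloud", top_k=1))
# → ['DataArt builds cloud solutions']
```

The same three operations – embed, store, search by similarity – are what the production stack does, just with learned vectors and an indexed database instead of a linear scan.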

Feeding the Model with Relevant Data

First, we needed to figure out what information we could feed the bot – something reliable, relevant, and clean from the NDA point of view.

The solution was our Slides Library. At DataArt, we keep all the PowerPoint slides used in sales in one place – more than a thousand slides with open information about us, anonymized use cases, etc.

Our concern was that the slides were full of bullet points and incomplete sentences with no verbs. People do not speak this way. So, we were surprised that the bot could digest the given info and give back quite human answers in everyday language.

We used almost 1,200 documents, and processing them took about five minutes.

SharePoint, Confluence, private or public websites, Jira, and various databases can all serve as sources for your vector database. The one thing to remember is that the model only understands text and ignores pictures, tables, graphs, videos, etc. So, for example, slides with logos should be converted into a text list of companies (the paid ChatGPT Plus can analyze images, although not all image types are currently supported).

Testing Different Models

We tested our vector database with different embedding models. Most are free and MIT-licensed. The one that gave us the best relevancy results was sentence-transformers/all-mpnet-base-v2.

There are thousands of other models worth considering, most of them free or inexpensive.

Minding the Privacy

While you can keep the search engine local and embed your documents on your own machine, to actually talk to your data and generate something that makes sense, you must send some data to the OpenAI or Azure API. So, you must always consider the sensitivity of your data.

If you want to analyze open information, you can go with OpenAI models. For anything more sensitive, you are better off using Microsoft Azure.

Safeguards and Rubber Ducks

After the embedding, the model is ready to engage in conversations, using the database as a source of relevant data.

This is how it works: a user asks a question, and the system looks up the five most relevant documents in the vector database and uses them to form a system prompt. Note that the system prompt carries more weight than the conversation itself. Once the system prompt is shaped, the dialogue is sent to OpenAI or Microsoft Azure, which generates the assistant’s reply.
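The prompt-assembly step above can be sketched as follows. Here relevance is faked with simple keyword overlap (in the real system, the vector database does the retrieval), and the function name and prompt wording are illustrative, not our production code:

```python
def build_system_prompt(question: str, documents: list[str],
                        top_k: int = 5) -> str:
    """Pick the top_k most relevant documents and fold them into a
    system prompt. Relevance here = naive keyword overlap; a real RAG
    system would query the vector database instead."""
    q_words = set(question.lower().split())
    def score(doc: str) -> int:
        return len(q_words & set(doc.lower().split()))
    relevant = sorted(documents, key=score, reverse=True)[:top_k]
    context = "\n---\n".join(relevant)
    return (
        "You are a website assistant. Answer only from the context "
        "below; if the answer is not there, say you do not know.\n\n"
        f"Context:\n{context}"
    )

docs = [
    "DataArt was founded in 1997.",
    "Rubber ducks are great for debugging.",
]
prompt = build_system_prompt("When was DataArt founded?", docs, top_k=1)
# The resulting prompt is then sent, together with the user's question,
# to the chat completions endpoint of OpenAI or Azure OpenAI.
```

Because the system prompt is rebuilt for every question, each turn of the conversation automatically gets a fresh slice of the most relevant documents.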

Whenever we ask the model something, we send it a new iteration of the prompt. With every new question, our system message evolves, and we supply fresh, relevant pieces of context. And you keep talking until your curiosity is satisfied.

That is the theory, but users are unpredictable. They can ask unexpected questions or act provocatively. Sometimes, users can try to trick the model into revealing secret information.

This is why we had to fine-tune the system prompt A LOT, putting in safeguards against what users might ask. We also needed to groom our data.

To do it, we continuously ask the bot questions taken from the library of questions we prepared in advance, like: "How many rubber ducks does it take to optimize your programs?" Or "How many people are employed at DataArt?"

We compare the outputs of different models. GPT-3.5 usually answers quickly, but not at the best quality, while GPT-4 usually gives a more natural-sounding answer. Then, we adjust the system prompt to keep the damage minimal.
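This kind of continuous checking can be wired into a small regression harness. The sketch below is ours, not a standard tool: `fake_bot` stands in for the real model call, and the question/keyword pairs are illustrative:

```python
def check_answers(ask, test_cases):
    """Run a library of canned questions through the bot and flag
    answers that miss an expected keyword. `ask` is any callable
    mapping question -> answer (here a stub; in production, the
    function that calls the model)."""
    failures = []
    for question, must_contain in test_cases:
        answer = ask(question)
        if must_contain.lower() not in answer.lower():
            failures.append((question, answer))
    return failures

# Stub "model" so the harness itself can be demonstrated offline
def fake_bot(question: str) -> str:
    if "employed" in question:
        return "DataArt employs several thousand specialists."
    return "I do not know."

cases = [
    ("How many people are employed at DataArt?", "thousand"),
    ("How many rubber ducks does it take to optimize your programs?", "know"),
]
print(check_answers(fake_bot, cases))  # → []: both answers pass
```

Running such a suite after every prompt change catches regressions cheaply, before real users hit them.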

Personality and Rules Written in Blood

We also give the bot a personality–this is what defines it, and this is where you install the safeguards.

There is a saying that security and compliance rules are written in blood. System prompts are written in blood, too. Conversations with the bot must be logged and periodically reviewed so that the system prompt can be adapted and improved to prevent improper answers. So, we gave our bot a personality, which included instructions on how to answer when it does not know the answer, the right tone of voice, our company’s stance on the latest world events, and other crucial information.
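The logging side of that review loop can be sketched in a few lines. The names `logged_reply` and `polite_bot` are hypothetical, and in practice the log would go to persistent storage rather than an in-memory list:

```python
import time

LOG: list[dict] = []  # in production: a database or log pipeline

def logged_reply(ask, question: str) -> str:
    """Wrap the bot so every exchange is recorded for later review,
    which is what lets you spot improper answers and tune the system
    prompt. `ask` is any callable mapping question -> answer."""
    answer = ask(question)
    LOG.append({"ts": time.time(), "question": question, "answer": answer})
    return answer

def polite_bot(question: str) -> str:
    # Stub answer matching the "say so when you don't know" rule
    return "I don't have that information, but you can learn more at dataart.com."

logged_reply(polite_bot, "What is DataArt's secret roadmap?")
print(len(LOG))  # → 1: the exchange is captured for review
```

Reviewing these records periodically is what turns one-off incidents into new safeguard lines in the system prompt.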

Whenever the answers become too repetitive, or some words come up too often, we always come back and refine the prompt as we go.

Raising the Bar: Further Improvement

Here are several more tips and tricks for adapting an LLM for different chatbots:

  • Split data into thematic silos for efficient embedding. Create multiple collections within the database, categorized into different topics. This way, if a user inquires about our design capabilities and Azure, we can retrieve data from these specific sections, making the output more relevant.
  • Retain reduced messages to emulate short-term memory. If you want to hold a very long conversation with your model, you do not have to store all the answers internally. You can ask the model to compress text, keeping the same meaning. That will allow for longer dialogues.
  • Extend the context window to have longer conversations.
  • Enable scenarios that include linking to sources, live reps, ticketing, emails, etc. For a sales bot, you might want, at some point, to give control to a human operator or permit the bot to give links to relevant case studies. For an internal bot, you might want it to be able to provide links to Jira tickets or Confluence articles. It is all possible.
  • Make the bot ask clarifying questions. Another great feature (we have added it to our internal helpdesk bot) is the ability to ask the user questions if the bot lacks information. It combines an old, scripted bot and a new AI bot. When a person asks a question, the bot realizes that to answer, it needs data to fill in specific fields (location, age, number of days on holiday, for example). If some data is missing, it queries about that.
  • Use Elasticsearch instead of Qdrant for larger systems. Qdrant is open source, and you do not have to send your data anywhere; it is all local, which makes it well suited for PoC development. But for a more complicated system, go for Elasticsearch.
  • Alternate between cheap and expensive models for different tasks. GPT-3.5 is about ten times cheaper than GPT-4, and GPT-4-32K is the most expensive model, costing ten times more than GPT-4.
  • Remember the magic word: LangChain. If you really want to be on the edge of AI and know what capabilities to explore when developing something yourself, this is the library to study. It gets updated almost every day, and its maintainers are experimenting on the leading edge of this technology.
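The "retain reduced messages" tip above can be sketched like this. The summarizer here is a trivial stub (it truncates old turns); in practice, you would ask the model itself to compress the old messages while keeping their meaning:

```python
def compress_history(messages, keep_last=4, summarize=None):
    """Emulate short-term memory: keep the newest messages verbatim
    and collapse everything older into one summary message."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    # Stub summarizer; a real one would be an LLM call asking for a
    # compressed rewrite of the old turns
    summarize = summarize or (
        lambda msgs: " / ".join(m["content"][:30] for m in msgs)
    )
    summary = {"role": "system",
               "content": "Earlier conversation: " + summarize(old)}
    return [summary] + recent

history = [{"role": "user", "content": f"message {i}"} for i in range(10)]
shrunk = compress_history(history)
print(len(shrunk))  # → 5: one summary plus the last four messages
```

Each API call then sends `shrunk` instead of the full history, so the conversation can run far past the raw context window.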

Originally published here.
