A RAG Prototype
Mark Gerow
Impactful Application Development | Process Automation | Artificial Intelligence | Agile Project Management | Technology Leadership
I recently built a prototype Retrieval Augmented Generation (RAG) application. My specific goals in creating this application are described below.
Why create a prototype?
Before starting any project, it's helpful (some might say crucial) to have a clear understanding of the "why" behind it all. Over a year ago my development team began working with OpenAI's GPT LLMs to create a chatbot that generates answers based on internal documents and data. While I understood the components and problems to be addressed conceptually, my management duties did not afford the time for direct hands-on development in this area. About a month ago I found myself with the time to dig in deeper, and so set myself this challenge. What I found was an alphabet soup of acronyms and a hodgepodge of technologies. This is not surprising given the explosive growth and evolution of the generative AI field over the past few years. Every example I found seemed to use different tools, data stores, frameworks, and services. Perhaps the most common technology across the examples was Python, an easy-to-use scripting language. While I'm aware that the machine learning and AI communities have widely adopted Python, and that it is a great language for learning about programming, in my opinion it is not suitable for enterprise-grade application development. So, I decided to create my prototype using .NET and C# rather than Python. The choice of programming language does not, however, materially affect any of the concepts discussed below.
Why RAG?
There are many ways to take advantage of generative AI such as that found in OpenAI's ChatGPT, so why focus on RAG? I chose RAG because it allows enterprises to combine their internal documents and data with generative AI to answer questions unique to their business. Using RAG, chatbots can be created for both internal and customer use. Some examples of content that could be surfaced through RAG-enabled chatbots include company policies, product brochures, customer profiles, support FAQs, websites, and much more. Prior to generative AI, full-text search was used to query unstructured content, but this approach assumes the seeker knows which words to use when searching. RAG converts text into numeric vectors (arrays) that represent its meaning rather than its literal words, enabling search even when the exact words are not known. For example, I can search for "exercise" and find information about "fitness". This makes search much more usable for the seeker, and by using generative AI, RAG can also generate natural language replies that summarize the content found, rather than requiring the seeker to read through a large amount of text. In summary, RAG does a better job of both finding and presenting answers based on a body of documents.
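To make "similar meaning" concrete: vector stores typically rank matches by cosine similarity between vectors. Here is a minimal C# sketch using made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions, and the numbers below are purely illustrative):

```csharp
using System;

class CosineDemo
{
    // Cosine similarity: dot(a, b) / (|a| * |b|); values near 1.0 mean
    // the vectors point in nearly the same direction, i.e. similar meaning.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, magA = 0, magB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot  += a[i] * b[i];
            magA += a[i] * a[i];
            magB += b[i] * b[i];
        }
        return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
    }

    static void Main()
    {
        // Toy 3-d "embeddings" -- invented values, not real model output.
        double[] exercise = { 0.9, 0.8, 0.1 };
        double[] fitness  = { 0.8, 0.9, 0.2 };  // similar meaning, similar direction
        double[] finance  = { 0.1, 0.2, 0.9 };  // different meaning, different direction

        Console.WriteLine(Cosine(exercise, fitness) > Cosine(exercise, finance)); // True
    }
}
```

This is why a query for "exercise" can surface a profile that only mentions "fitness": the two questions produce nearby vectors even though they share no words.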
The scenario
For this very simple prototype I had ChatGPT create one-paragraph customer profiles for 10 fictitious companies based on cartoon characters. Here's an example:
Daffy Duck Financial Consultants|Known for their unconventional approach to finance, Daffy Duck Financial Consultants is a small yet dynamic firm led by the charismatic Daffy Duck. Specializing in investment strategies and wealth management, this company prides itself on thinking outside the box to deliver results for their clients. While their methods may sometimes raise eyebrows, their track record speaks for itself, with many clients swearing by Daffy Duck's financial prowess and knack for spotting lucrative opportunities.
The idea was to allow seekers to find companies and ask questions about them using RAG and GPT.
The RAG process flow
The prototype needed to encompass the minimum set of processes for a fully functional RAG implementation. Specifically, it needed to:
1. Convert text documents into vector representations
While there are many libraries and services that can do this, I chose to use the OpenAI embedding service. It was easy to test using Postman, and once a request was working, Postman could convert it into C# code, which provided a convenient starting point. Here's an example of an embedding request in Postman.
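The equivalent call from C# can be sketched as follows. This is a minimal sketch, assuming the API key is in an `OPENAI_API_KEY` environment variable; the model name is illustrative (newer embedding models exist), and the input text is truncated:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class EmbeddingDemo
{
    static async Task Main()
    {
        // Assumes your OpenAI key is stored in an environment variable.
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);

        // Model name is illustrative; input is a (truncated) customer profile.
        var body = "{\"model\":\"text-embedding-ada-002\"," +
                   "\"input\":\"Daffy Duck Financial Consultants|Known for their unconventional approach to finance...\"}";

        var response = await client.PostAsync(
            "https://api.openai.com/v1/embeddings",
            new StringContent(body, Encoding.UTF8, "application/json"));

        // The JSON response contains data[0].embedding: a long array of floats.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```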
2. Add the vector representations to a vector store
Once a customer profile is turned into a vector, i.e. an array of numbers representing its meaning, it needs to be stored in a vector database that can later be searched. There is a large and growing number of vector databases out there. I took a superficial look at three: ChromaDB, DataStax, and MongoDB. I chose these three purely because they were referenced in some video tutorials, and my choice should not be construed as a recommendation, although all three seem like good choices for a variety of scenarios. ChromaDB is open source and very small, and can be run from your desktop, but it seemed more like a learning tool than something that might be used in a production environment. MongoDB is one of the heavyweights in the document (NoSQL) database arena, and so would likely be a strong contender for an enterprise-grade vector store. I settled on DataStax, however, which provides a serverless implementation that is easy to configure, along with a robust API. After running my C# load routine, my data looked like this in the DataStax console:
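A load routine along these lines can be sketched as follows, assuming DataStax Astra's JSON Data API. The endpoint, keyspace, collection name, and token are placeholders, and the embedding array is truncated for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class LoadProfiles
{
    // Placeholder endpoint -- substitute your own Astra DB ID, region,
    // keyspace, and collection name.
    const string Endpoint =
        "https://YOUR-DB-ID-YOUR-REGION.apps.astra.datastax.com/api/json/v1/default_keyspace/customer_profiles";

    static async Task Main()
    {
        var token = Environment.GetEnvironmentVariable("ASTRA_DB_TOKEN");

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Token", token);

        // Each document stores the profile text plus its embedding in the
        // special $vector field, which the Data API indexes for similarity search.
        var body = JsonSerializer.Serialize(new
        {
            insertOne = new
            {
                document = new Dictionary<string, object>
                {
                    ["name"]    = "Daffy Duck Financial Consultants",
                    ["profile"] = "Known for their unconventional approach to finance...",
                    ["$vector"] = new[] { 0.0123f, -0.0456f, 0.0789f } // truncated
                }
            }
        });

        var response = await client.PostAsync(
            Endpoint, new StringContent(body, Encoding.UTF8, "application/json"));
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```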
3. Prompt the seeker to enter a question
With my vector store loaded, the next step was to input a question to be answered, such as:
This was accomplished by writing a simple .NET Core console application.
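The console application itself can be as simple as a read-and-process loop. A minimal sketch (the processing steps are stubbed out with a comment):

```csharp
using System;

class RagConsole
{
    static void Main()
    {
        while (true)
        {
            Console.Write("Ask a question (or press Enter to quit): ");
            var question = Console.ReadLine();
            if (string.IsNullOrWhiteSpace(question)) break;

            // The remaining steps -- embed the question, search the vector
            // store, build the prompt, and call ChatGPT -- would go here.
            Console.WriteLine($"You asked: {question}");
        }
    }
}
```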
4. Convert seeker questions into a vector representation
Arthur C. Clarke wrote: "Any sufficiently advanced technology is indistinguishable from magic". So, here's where I applied another bit of “magic” by converting the question into a vector, using the same OpenAI library used to vectorize the profile documents. Remember that vectors capture the (approximate) meaning of the text they are derived from. Once the question is in vector form, I queried the vector store for the document that most closely matched the question.
5. Search the vector store for the best match to the user's question
The prototype then calls the DataStax service to find the best match in the vector store. The snippet of C# that performs this call looks like this:
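A sketch of that call, assuming DataStax Astra's JSON Data API: the `find` command sorts by vector similarity when the sort clause contains a `$vector` value, and a limit of 1 returns the single best match. The endpoint, keyspace, collection, and token are placeholders:

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

class VectorSearchDemo
{
    // Placeholder endpoint -- substitute your own Astra DB values.
    const string Endpoint =
        "https://YOUR-DB-ID-YOUR-REGION.apps.astra.datastax.com/api/json/v1/default_keyspace/customer_profiles";

    public static async Task<string> FindBestMatch(float[] questionVector, string token)
    {
        using var client = new HttpClient();
        client.DefaultRequestHeaders.Add("Token", token);

        // Sort by similarity to the question's embedding; return the top hit.
        var body = JsonSerializer.Serialize(new
        {
            find = new
            {
                sort    = new Dictionary<string, object> { ["$vector"] = questionVector },
                options = new { limit = 1 }
            }
        });

        var response = await client.PostAsync(
            Endpoint, new StringContent(body, Encoding.UTF8, "application/json"));

        // The matching profile document comes back in the JSON response.
        return await response.Content.ReadAsStringAsync();
    }
}
```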
6. Format a prompt to send to ChatGPT
Now that the right source document had been found, I needed to format a prompt (i.e., instructions) to send to ChatGPT. This was quite simple: just tell ChatGPT to use the found text rather than its own knowledge base. The prompt looked something like:
Note the wording. I instructed ChatGPT to answer the question using only the profile for the matching customer found in the vector store. This is no different than if I were to go to ChatGPT online and enter the prompt:
Answer the following question: "Which customer works in the fitness business?" using only the following data: "Popeye's Muscle Supplements | Founded by the spinach-fueled Popeye himself, Popeye's Muscle Supplements is a rising star in the health and wellness industry. Despite its modest size, this company has made waves with its range of all-natural supplements designed to boost strength and endurance. With endorsements from athletes and fitness enthusiasts alike, Popeye's Muscle Supplements is gaining traction as a trusted name in the competitive world of nutritional supplements."
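In C#, assembling a prompt of this shape is simple string interpolation. A minimal sketch (the helper name is my own, not from the original prototype):

```csharp
using System;

class PromptBuilder
{
    // Combines the seeker's question with the best-matching profile text
    // into the instruction sent to ChatGPT.
    public static string BuildPrompt(string question, string profileText) =>
        $"Answer the following question: \"{question}\" " +
        $"using only the following data: \"{profileText}\"";

    static void Main()
    {
        Console.WriteLine(BuildPrompt(
            "Which customer works in the fitness business?",
            "Popeye's Muscle Supplements | Founded by the spinach-fueled Popeye himself..."));
    }
}
```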
7. Send the prompt to ChatGPT and receive the answer
Next, the prototype called the ChatGPT API programmatically with the above prompt to get its answer.
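A hedged sketch of that call against the chat completions endpoint; the model name is illustrative, the prompt content is truncated, and the API key is again assumed to be in an environment variable:

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class ChatDemo
{
    static async Task Main()
    {
        var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");

        using var client = new HttpClient();
        client.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", apiKey);

        // The prompt built in the previous step goes in as the user message.
        var body = "{\"model\":\"gpt-3.5-turbo\",\"messages\":[" +
                   "{\"role\":\"user\",\"content\":\"Answer the following question: ...\"}]}";

        var response = await client.PostAsync(
            "https://api.openai.com/v1/chat/completions",
            new StringContent(body, Encoding.UTF8, "application/json"));

        // The answer text is in choices[0].message.content of the JSON response.
        Console.WriteLine(await response.Content.ReadAsStringAsync());
    }
}
```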
8. Display the answer
Et voila! We have a nicely worded answer to our question.
Now, you may have noticed that the word "fitness" appears in the customer profile for Popeye's Muscle Supplements, and might be thinking that it was a word match that resulted in this profile being selected. But what if I search using a word that has a similar meaning but does not appear in the profile? For example, the word "exercise" has a similar meaning to fitness, but is not found in the profile:
And what about a more general question, like what are all the services offered by Popeye's business?
This demonstrates the power of combining vector search with GPT. I don't need to program in stock answers; rather, GPT handles producing a nicely worded answer to my question. Just to finish things off, here are a few more examples:
It's worth noting that one of the above questions was not answered at all, and another answer was not optimal, which indicates that more work is needed to fine-tune the documents in the vector store. Remember that vector search works on probabilities and "nearness" between the question and the documents in the vector store. An important step not discussed here is the selection and preparation of documents for inclusion, which has a significant impact on the quality of search results. Regardless, this simple prototype does fairly well, and demonstrates the major elements of a RAG solution.
Conclusion
This prototype is just a starting point, and I plan to extend it in a number of ways. For example, this version does not retain any context from earlier questions in a session, which would allow the seeker to refine a question without having to restate it each time. It also includes only one type of document, the customer profile, whereas a real-world RAG solution would likely include multiple types, such as customer, product, and sales rep profiles. Lastly, this example only includes unstructured data (documents), but many RAG scenarios combine unstructured and structured data (such as that stored in a SQL database or in Excel).
While this prototype is rudimentary, I hope it has been instructive for both technical and non-technical readers. Software developers can see that while there are many new technologies here, none are more difficult to master than those that came before. Non-technical professionals can begin to imagine how scenario-specific RAG solutions might benefit their areas of expertise. And managers can envision an enterprise where employees have much easier ways to access the ever-growing body of information required to do their work.