A RAG Prototype

I recently built a prototype Retrieval Augmented Generation (RAG) application. My specific goals when creating this application were to:

  1. Better understand the RAG process flow
  2. Demonstrate all the necessary steps required of a working RAG solution
  3. Use an enterprise-grade programming language
  4. Serve as a basis for building more sophisticated RAG solutions

Why create a prototype?

Before starting any project, it's helpful (some might say crucial) to have a clear understanding of the "why" behind it all. Over a year ago my development team began working with OpenAI's GPT LLMs to create a chatbot that generates answers based on internal documents and data. While I understood the components and problems to be addressed conceptually, my management duties did not afford the time for direct hands-on development in this area. About a month ago I found myself with the time to dig in deeper, and so set myself this challenge.

What I found was an alphabet soup of acronyms and a hodge-podge of technologies. This is not surprising given the explosive growth and evolution of the generative AI field over the past few years. Every example I found seemed to use different tools, data stores, frameworks, and services. Perhaps the most common technology across all the examples was Python, an easy-to-use scripting language. While I'm aware that the machine learning and AI communities have widely adopted Python, and that it is a great language for learning to program, in my opinion it is not suitable for enterprise-grade application development. So, I decided to create my prototype using .NET and C# rather than Python. The choice of programming language does not, however, materially affect any of the concepts discussed below.

Why RAG?

There are many ways to take advantage of generative AI such as that found in OpenAI's ChatGPT, so why focus on RAG? I chose RAG because it allows enterprises to combine their internal documents and data with generative AI to help answer questions unique to their business. Using RAG, chatbots can be created for both internal and customer use. Some examples of content that could be surfaced through RAG-enabled chatbots include company policies, product brochures, customer profiles, support FAQs, websites, and much more. Prior to generative AI, full-text search was used to query unstructured content, but this approach assumes the seeker knows which words to use when searching. RAG, which converts text into numeric vectors (arrays) that represent its meaning rather than its actual words, enables search where the exact words are not known. For example, I can search for "exercise" and find information about "fitness". This makes search much more usable for the seeker, and using generative AI, RAG can also generate natural language replies that summarize the content found, rather than requiring the seeker to read through a large amount of text. In summary, RAG does a better job of both finding and presenting answers based on a body of documents.

The scenario

For this very simple prototype I had ChatGPT create one-paragraph customer profiles for 10 fictitious companies based on cartoon characters. Here's an example:

Daffy Duck Financial Consultants|Known for their unconventional approach to finance, Daffy Duck Financial Consultants is a small yet dynamic firm led by the charismatic Daffy Duck. Specializing in investment strategies and wealth management, this company prides itself on thinking outside the box to deliver results for their clients. While their methods may sometimes raise eyebrows, their track record speaks for itself, with many clients swearing by Daffy Duck's financial prowess and knack for spotting lucrative opportunities.

The idea was to allow seekers to find companies and ask questions about them using RAG and GPT.

The RAG process flow

The prototype needed to encompass the minimum set of processes for a fully functional RAG implementation. Specifically, it needed to:

  1. Convert text documents into vector representations
  2. Add the vector representations to a vector store
  3. Prompt the seeker to enter a question
  4. Convert the seeker’s question into a vector representation
  5. Search the vector store for the best match to the question
  6. Create instructions (also known as a “prompt”) for ChatGPT to follow, including both the question asked and the document found in the vector store
  7. Send the prompt to ChatGPT and receive the answer
  8. Display the answer

1. Convert text documents into vector representations

While there are many libraries and services that can do this, I chose to use the OpenAI embedding service. It was easy to test using Postman, and once tested, Postman converted the request into C# code, which provided a convenient starting point. An embedding request is simply an HTTP POST containing the text to be vectorized and the embedding model to use.
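A minimal sketch of such a request in C#, assuming OpenAI's /v1/embeddings REST endpoint (the model name, sample text, and response handling are illustrative rather than the prototype's exact code):

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

// Call OpenAI's embeddings endpoint and read back the vector for one profile.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

var requestBody = JsonSerializer.Serialize(new
{
    model = "text-embedding-ada-002",   // embedding model (illustrative choice)
    input = "Daffy Duck Financial Consultants | Known for their unconventional approach to finance..."
});

var response = await http.PostAsync(
    "https://api.openai.com/v1/embeddings",
    new StringContent(requestBody, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();

// The response contains data[0].embedding: an array of floating-point numbers
// (about 1,500 dimensions for this model) that captures the text's meaning.
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
float[] embedding = doc.RootElement.GetProperty("data")[0].GetProperty("embedding")
    .EnumerateArray().Select(e => e.GetSingle()).ToArray();

Console.WriteLine($"Vector length: {embedding.Length}");
```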

2. Add the vector representations to a vector store

Once a customer profile is turned into a vector, i.e. an array of numbers representing its meaning, it needs to be stored in a vector database that can later be searched. There are a large and growing number of vector databases out there. I took a superficial look at three: ChromaDb, DataStax, and MongoDb. I chose these three purely because they were referenced in some video tutorials, and my choice should not be construed as a recommendation, although all three seem like good choices for a variety of scenarios. ChromaDb is open source, very small, and can be run from your desktop, but it seemed more like a learning tool than something that might be used in a production environment. MongoDb is one of the heavyweights in the document (NoSQL) database arena, and so would likely be a strong contender for an enterprise-grade vector store. I settled on DataStax, however, which provides a serverless implementation that is easy to configure, along with a robust API. After running my C# load routine, all ten customer profiles, each with its embedding vector, were visible in the DataStax console.
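To make the load step concrete, here is a minimal sketch of inserting one vectorized profile, assuming the DataStax Astra DB JSON Data API (the endpoint placeholder, collection name, and document fields are illustrative, and the placeholder vector stands in for the embedding produced in step 1):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Illustrative values; substitute your own database ID, region, keyspace, and collection.
var astraEndpoint =
    "https://<db-id>-<region>.apps.astra.datastax.com/api/json/v1/default_keyspace/customer_profiles";
var astraToken = Environment.GetEnvironmentVariable("ASTRA_DB_APPLICATION_TOKEN");

var profileText = "Daffy Duck Financial Consultants | Known for their unconventional approach to finance...";
// Placeholder; in the prototype this vector comes from the embedding call in step 1.
float[] embedding = new float[1536];

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Token", astraToken);

// The Data API stores the embedding in the reserved "$vector" field alongside the document.
var document = new Dictionary<string, object>
{
    ["_id"] = "daffy-duck-financial-consultants",
    ["name"] = "Daffy Duck Financial Consultants",
    ["profile"] = profileText,
    ["$vector"] = embedding
};

var body = JsonSerializer.Serialize(new { insertOne = new { document } });
var response = await http.PostAsync(astraEndpoint,
    new StringContent(body, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();
```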

3. Prompt the seeker to enter a question

With my vector store loaded, the next step was to input a question to be answered, such as: "Which customer works in the fitness business?"

This was accomplished by writing a simple .NET Core console application.
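The console portion is nothing fancy; a rough sketch of the read loop might look like this (the answer-producing steps that follow are represented here by a comment):

```csharp
using System;

// Keep asking for questions until the seeker presses Enter on an empty line.
while (true)
{
    Console.Write("Enter your question: ");
    var question = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(question)) break;

    // Steps 4 through 8: vectorize the question, search the vector store,
    // build the prompt, call ChatGPT, and display the answer.
    Console.WriteLine($"You asked: {question}");
}
```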

4. Convert seeker questions into a vector representation

Arthur C. Clarke wrote: "Any sufficiently advanced technology is indistinguishable from magic". So, here's where I applied another bit of “magic” by converting the question into a vector, using the same OpenAI library used to vectorize the profile documents. Remember that vectors capture the (approximate) meaning of the text they are derived from. Once the question is in vector form, I queried the vector store for the document that most closely matched the question.

5. Search the vector store for the best match to the user's question

The prototype then calls the DataStax service to find the best match in the vector store.
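A minimal sketch of that call, again assuming the Astra DB JSON Data API and the illustrative collection used above (the "find" command sorts documents by their nearness to the question's "$vector"):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Text.Json;

var astraEndpoint =
    "https://<db-id>-<region>.apps.astra.datastax.com/api/json/v1/default_keyspace/customer_profiles";
var astraToken = Environment.GetEnvironmentVariable("ASTRA_DB_APPLICATION_TOKEN");

// Placeholder; in the prototype this is the question's embedding from step 4.
float[] questionVector = new float[1536];

using var http = new HttpClient();
http.DefaultRequestHeaders.Add("Token", astraToken);

// Ask for the single document whose stored "$vector" is closest to the question.
var body = JsonSerializer.Serialize(new
{
    find = new
    {
        sort = new Dictionary<string, object> { ["$vector"] = questionVector },
        options = new { limit = 1 }
    }
});

var response = await http.PostAsync(astraEndpoint,
    new StringContent(body, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();

using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
var bestMatch = doc.RootElement.GetProperty("data").GetProperty("documents")[0]
    .GetProperty("profile").GetString();

Console.WriteLine(bestMatch);
```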

6. Format a prompt to send to ChatGPT

Now that the right source document had been found, I needed to format a prompt (i.e., instructions) to send to ChatGPT. This was quite simple; just tell ChatGPT to use the found text rather than its own knowledge base. The prompt was no different than if I were to go to ChatGPT online and enter:

Answer the following question: "Which customer works in the fitness business?" using only the following data: "Popeye's Muscle Supplements | Founded by the spinach-fueled Popeye himself, Popeye's Muscle Supplements is a rising star in the health and wellness industry. Despite its modest size, this company has made waves with its range of all-natural supplements designed to boost strength and endurance. With endorsements from athletes and fitness enthusiasts alike, Popeye's Muscle Supplements is gaining traction as a trusted name in the competitive world of nutritional supplements."

Note the wording: I instructed ChatGPT to answer the question using only the profile for the matching customer found in the vector store.
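In code, assembling that prompt is just string formatting; a small sketch (BuildPrompt is a hypothetical helper name that mirrors the prompt wording shown above):

```csharp
using System;

// Combine the seeker's question with the best-matching profile from the vector store.
static string BuildPrompt(string question, string matchedProfile) =>
    $"Answer the following question: \"{question}\" using only the following data: \"{matchedProfile}\"";

var prompt = BuildPrompt(
    "Which customer works in the fitness business?",
    "Popeye's Muscle Supplements | Founded by the spinach-fueled Popeye himself...");
Console.WriteLine(prompt);
```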

7. Send the prompt to ChatGPT and receive the answer

Next, the prototype called the ChatGPT API programmatically with the above prompt to get its answer.
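A sketch of that call, assuming OpenAI's chat completions REST endpoint (the model name is an illustrative choice, and the prompt string is the one shown in step 6, abbreviated here):

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Text.Json;

var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY");
using var http = new HttpClient();
http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", apiKey);

// The prompt assembled in step 6 (question plus the matching profile), abbreviated.
var prompt = "Answer the following question: \"Which customer works in the fitness business?\" " +
             "using only the following data: \"Popeye's Muscle Supplements | Founded by the spinach-fueled Popeye himself...\"";

var requestBody = JsonSerializer.Serialize(new
{
    model = "gpt-3.5-turbo",   // illustrative model choice
    messages = new[] { new { role = "user", content = prompt } }
});

var response = await http.PostAsync(
    "https://api.openai.com/v1/chat/completions",
    new StringContent(requestBody, Encoding.UTF8, "application/json"));
response.EnsureSuccessStatusCode();

// The generated answer comes back in choices[0].message.content.
using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
var answer = doc.RootElement.GetProperty("choices")[0]
    .GetProperty("message").GetProperty("content").GetString();

Console.WriteLine(answer);   // step 8: display the answer
```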

8. Display the answer

Et voilà! We have a nicely worded answer to our question.

Now, you may have noticed that the word "fitness" appears in the customer profile for Popeye's Muscle Supplements, and might be thinking that it was a word match that resulted in this profile being selected. But what if I search using a word that has a similar meaning but does not appear in the profile? For example, the word "exercise" has a similar meaning to "fitness" but is not found in the profile, yet asking about "exercise" still surfaces Popeye's Muscle Supplements.

And what about a more general question, like what are all the services offered by Popeye's business?

This demonstrates the power of combining vector search with GPT. I don't need to program in stock answers; rather, GPT handles producing a nicely worded answer to my question. Just to finish things off, I tried a few more examples.

It's worth noting that one of those additional questions was not answered at all, and another answer was not optimal, which indicates that some more work is needed to fine-tune the documents in the vector store. Remember that vector search works on probabilities and "nearness" between the question and the documents in the vector store. An important step that was not discussed here is the selection and preparation of documents for inclusion, which has a significant impact on the quality of search results. Regardless, this simple prototype does fairly well and demonstrates the major elements of a RAG solution.
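For readers curious what "nearness" means in practice, cosine similarity is one common measure; here is a toy sketch (real embeddings have roughly 1,500 dimensions, but the arithmetic is the same):

```csharp
using System;

// Cosine similarity: values near 1.0 mean the vectors point the same way (similar meaning),
// values near 0 mean the texts are essentially unrelated.
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, magA = 0, magB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        magA += a[i] * a[i];
        magB += b[i] * b[i];
    }
    return dot / (Math.Sqrt(magA) * Math.Sqrt(magB));
}

// Tiny made-up vectors purely to show the calculation.
Console.WriteLine(CosineSimilarity(new float[] { 0.9f, 0.1f }, new float[] { 0.8f, 0.2f })); // ~0.99
Console.WriteLine(CosineSimilarity(new float[] { 0.9f, 0.1f }, new float[] { 0.1f, 0.9f })); // ~0.22
```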

Conclusion

This prototype is just a starting point, and I plan to extend it in a number of ways. For example, this version does not retain any context of earlier questions asked in a session, which would allow the seeker to refine their question without having to restate it each time. It also only includes one type of document - the customer profile, but a real-world RAG solution would likely include multiple types of documents such as customer, product, and sales rep profiles. Lastly, this example only includes unstructured data (documents), but many RAG scenarios combine both unstructured and structured data (such as that stored in a SQL database or in Excel).

While this prototype is rudimentary, I hope it has been instructive for both technical and non-technical readers. Software developers can see that while there are many new technologies here, none are more difficult to master than those that came before. Non-technical professionals can begin to imagine how scenario-specific RAG solutions might benefit their areas of expertise. And managers can envision an enterprise where employees have much easier ways to access the ever-growing body of information required to do their work.
