RAG #5 - A recap and progress check-in

In this post I step back to review the prototype now that both unstructured and structured data are incorporated. A short video demonstrates both successful and problematic responses and discusses some ways in which question phrasing can be used to address failures.

Why create a prototype?

It's always important to keep in mind the "why" behind any project. In this case my goal is to strip away the hype and noise around RAG and generative AI and expose both its potential and shortcomings. While this is a personal journey of discovery, I hope that others find it helpful too. The key question is what business problems lend themselves to RAG and how to make it work. The #1 use I see for RAG is to create powerful question/answer systems that seekers can interact with using natural language. Such systems will hide the complexity of the countless sources of data and documents that a typical knowledge worker must sift through to do their work. The promise of RAG is a significant increase in productivity and the quality of decision making.

Where are we?

The RAG prototype has evolved through five iterations so far, each providing successively more functionality, specifically:

  1. The ability to ask questions about customers
  2. The ability to query unstructured data from multiple sources: customers, products, and sales reps
  3. The ability to query structured data from a SQL database
  4. The ability to detect whether a question should be answered from a vector database (unstructured) or a SQL database (structured)
  5. A refinement to help bridge the difference between how search operates in vector and SQL databases
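This recap doesn't show how iteration 4's source detection works. As an illustrative sketch only (in Python rather than the prototype's C#, and with a hypothetical keyword heuristic standing in for the LLM-based classification the prototype would use), routing might look like:

```python
# Illustrative sketch: the prototype uses C# and a GPT model to decide
# where a question should be answered; a simple keyword heuristic
# stands in for that classification call here.

# Phrases that hint the answer lives in the SQL (structured) database.
SQL_HINTS = {"how many", "total", "average", "count", "sum", "top"}

def route_question(question: str) -> str:
    """Return 'sql' for aggregate/numeric questions, 'vector' otherwise."""
    q = question.lower()
    if any(hint in q for hint in SQL_HINTS):
        return "sql"
    return "vector"

print(route_question("How many orders did Acme place last year?"))  # sql
print(route_question("What did the sales rep say about Acme?"))     # vector
```

In practice a heuristic like this is too brittle, which is exactly why the prototype asks the LLM itself to classify the question; the sketch only shows where that decision sits in the flow.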

I chose to use .NET, C#, OpenAI (embedding and GPT3.5), Azure SQL (for structured data), and DataStax (for unstructured data). Other developers might choose other tools. The bottom line is that you will need:

  1. A programming language or comparable tool
  2. A way to turn text into embeddings
  3. A large language model such as GPT3.5, GPT4, or Anthropic's Claude
  4. Source data for RAG - which might be in operational or decision support SQL databases, or in documents that need to be embedded and stored in a vector database
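Items 2 and 4 work together: documents are embedded, and at query time the most similar embeddings are retrieved. A minimal Python sketch of that core retrieval step, with toy hand-made vectors standing in for a real embedding model (which would return vectors of ~1,500 dimensions, not 3):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy 3-dimensional "embeddings"; a real system would call an embedding
# model and store the results in a vector database such as DataStax.
docs = {
    "customer profile": [0.9, 0.1, 0.0],
    "product spec":     [0.1, 0.9, 0.1],
}
query = [0.8, 0.2, 0.0]  # toy embedding of the user's question

# Retrieve the document whose embedding is closest to the query.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # customer profile
```

The retrieved text is then passed to the LLM as context alongside the question; the vector database's job is only this nearest-neighbor lookup at scale.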

Where to go next?

The truth is that with generative AI there is likely no clear end point. At least two factors contribute to my assertion: 1) generative AI is probabilistic, and so there is always room to refine any application to improve the quality of the responses and reduce the chance of hallucinations, and 2) the field is evolving so quickly that any solution relying on generative AI will also need to evolve at a rapid pace. Having said this, there are two refinements to the current prototype that I'd like to explore next:

  1. Expand the types and number of data sources - in an enterprise there will likely be many data sources that need to be incorporated in a generative AI based solution. They might include SQL databases and documents, but also in-house and 3rd-party web services, search engines, and more. Expanding the prototype to include these and other data sources will require reliably detecting and selecting among a much larger set of alternatives.
  2. Retain conversation context so that subsequent questions can utilize information from earlier questions and answers. This should be straightforward, with the caveat that each GPT model has a maximum amount of "history" it can retain in its context window, and it will be interesting to see what tuning may be required to maximize the amount of question/answer history that can be retained.
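The "maximum history" caveat in item 2 can be handled with a sliding window that drops the oldest turns once a token budget is exceeded. A minimal Python sketch, assuming a simple word count stands in for a real tokenizer (real systems count model tokens, not words):

```python
# Illustrative sketch: keep only as much Q/A history as fits a model's
# context budget. Word count approximates token count here.

def trim_history(history, budget_tokens):
    """Drop the oldest (question, answer) pairs until the rest fit."""
    def size(turn):
        question, answer = turn
        return len(question.split()) + len(answer.split())

    kept = list(history)
    while kept and sum(size(t) for t in kept) > budget_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept

history = [
    ("Who is our top customer?", "Acme, by total revenue."),
    ("What did rep Jones note about them?", "They plan to expand next year."),
]
print(len(trim_history(history, budget_tokens=15)))  # 1
```

More sophisticated strategies (summarizing old turns rather than dropping them, for example) trade compute for retained context; which one pays off is exactly the kind of tuning question the item above raises.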

I would love to hear if you find this series helpful, and if so, which aspects. I am also open to suggestions for which areas of RAG this series should explore.

Onward!

