Stick a RAG in it…

Have you ever looked at a tender, or a set of questions from a customer or client, and thought, “These will be easy to answer! I know exactly where that information is…”? Me neither. But with the current prominence of Generative AI, maybe that could be the answer. After a recent deluge of questions and some time on my hands, I decided it deserved some investigation.

Is Generative AI the answer?

We all work in complex roles within data- and information-dense organisations. Navigating internal and external data to find the correct answers for customers, partners, and colleagues is often challenging. I’m sure everyone reading this has at least tried out one of the current crop of excellent Generative AI platforms. One question keeps jumping into my mind: can this generation of Large Language Models be used to make my work more efficient while ensuring good information security and compliance with business policies? At first glance, the answer appears to be no. Most organisations have placed heavy restrictions on (or completely banned) the use of publicly hosted or operated Generative AI by employees, for reasons such as:

  • Data Security and Confidentiality: To prevent sensitive or proprietary information from being exposed or stored outside the company's secure environment, risking data breaches and leaks.
  • Compliance with Regulations: To ensure adherence to laws and regulations like GDPR or HIPAA, which dictate strict controls over how and where data is processed and stored.
  • Intellectual Property and Data Governance: To protect intellectual property rights and maintain strict data governance policies, ensuring that data is handled according to company guidelines.
  • Quality and Reliability of Outputs: To avoid reliance on Large Language Models (LLMs) that may not provide consistently accurate or reliable results and to mitigate the risk of LLMs "hallucinating" where LLMs generate incorrect or nonsensical information presented as fact.
  • Control and Monitoring of Use: To maintain control over company tools and data, ensuring audit trails are in place and that employee usage can be effectively monitored and managed.

Usually, all these points would drive people to put up with the lousy search tools and continue hunting down disparate information. However, I have a history with AI: as a recovered LISP programmer, I know that sometimes you have to accept the 60 sets of nested brackets and keep looking deeper. So, I started my quest for answers – well, for something that could generate answers…

The current generation of Large Language Models (LLMs) excels at communicating with humans in natural language. These models are typically trained on vast public datasets, making them good at providing general information and generating responses even when they lack specific knowledge. However, they do not handle specialised, private, or domain-specific information well. To enable Generative AI to access non-public information, or to deliver answers based on a curated set of specific data, it is necessary to supplement the LLM with this extra information.

The two most common approaches to extending LLMs today are fine-tuning and Retrieval Augmented Generation. Let's have a look at how each might fit.

Fine Tuning

In fine-tuning, we effectively send the LLM back to school, teaching it new subjects and embedding the additional information directly into the Large Language Model. This approach allows the model to be adapted to specific knowledge domains or information sets. The process is one-way (like school): you can’t easily remove information that has been fine-tuned into the model, and the model will need to be re-tuned whenever the domain or information set changes.

Figure 1 - Grossly oversimplified Finetuning approach
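To make the idea a little more concrete, here is a minimal sketch of what fine-tuning a small open model might look like using the Hugging Face transformers and datasets libraries. The model name, the toy training sentences and the hyperparameters are all illustrative placeholders; a real project would need a properly curated dataset, evaluation, and considerably more compute.

```python
# Minimal causal-LM fine-tuning sketch, assuming the Hugging Face
# `transformers` and `datasets` libraries. "gpt2", the toy sentences and
# the hyperparameters are illustrative placeholders, not recommendations.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder: use a model you are allowed to fine-tune
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# A couple of made-up domain statements stand in for a curated training set.
corpus = Dataset.from_dict({"text": [
    "Our standard response time for severity-1 incidents is 30 minutes.",
    "Customer data is hosted in the EU region unless agreed otherwise.",
]})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    # mlm=False gives plain next-token (causal) language modelling labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()                      # the "back to school" step
trainer.save_model("finetuned-model")
```

Even this toy example hints at the costs listed below: every change to the source material means another training run.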

Advantages of Fine-tuning

  • Improved Performance on Specific Tasks: Fine-tuning tailors the model to specific domains or tasks, enhancing its accuracy and relevance in those areas.
  • Customisation for Unique Requirements: It allows customisation for specific industries or user needs, which the base model might not address adequately.
  • Efficiency in Data Handling: Fine-tuned models can more efficiently process and understand domain-specific data.
  • Reduced Generalisation Errors: By training on a specific dataset, fine-tuning reduces errors due to the base model's generalist nature.
  • Language and Cultural Sensitivity: It can improve the model's understanding of regional languages, dialects, and cultural nuances.
  • Better User Experience: A fine-tuned model can provide more relevant and accurate responses for applications like chatbots, enhancing user satisfaction.

Disadvantages

  • Resource Intensive: Fine-tuning requires computational resources and expertise, which can be expensive and time-consuming.
  • Data Privacy and Security: The process requires access to potentially sensitive data, which raises privacy and security concerns.
  • Maintenance and Updating: A fine-tuned model may need regular updates to maintain its performance, especially in rapidly changing fields.
  • Limited Generalisability (Overfit): A model fine-tuned for one task or domain might not suit others, limiting its broader applicability.
  • Bias Amplification: If the training data has biases, fine-tuning can amplify these, leading to ethical concerns.

RAG – Retrieval Augmented Generation

If fine-tuning is like sending the LLM back to school, Retrieval Augmented Generation (RAG) is more like a skilled debater or orator who references a near-instantaneous, well-stocked library to answer questions.

In this analogy, the debater (the language model) is already knowledgeable and articulate, but they turn to a library (the external database) for specific facts, quotes, or references. This enhances their arguments with accurate, detailed, and relevant information beyond their internal knowledge, just as RAG uses external data to improve the language model's responses.

This approach gives Generative AI access to well-curated domain knowledge without the need to directly embed the domain information into the model.

Figure 2 - Vastly oversimplified RAG model
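At query time, the flow in Figure 2 boils down to three steps: retrieve the most relevant pieces of private knowledge, add them to the user's question, and let the LLM answer from that context. The sketch below is a minimal, hypothetical illustration of that prompt-augmentation step; the retrieval and generation callables are stand-ins for whatever knowledge store and model endpoint you actually plug in (one possible retrieval function is sketched further down).

```python
# Illustrative query-time RAG flow. The retrieval and generation functions
# are passed in as placeholders for your own knowledge store and LLM.
from typing import Callable

def answer_with_rag(question: str,
                    retrieve_chunks: Callable[[str, int], list[str]],
                    generate: Callable[[str], str],
                    k: int = 3) -> str:
    # 1. Retrieval: fetch the k chunks most relevant to the question
    #    from the private knowledge store.
    chunks = retrieve_chunks(question, k)

    # 2. Augmentation: put the retrieved context in front of the question
    #    and ask the model to answer only from that context.
    context = "\n\n".join(chunks)
    prompt = ("Answer the question using only the context below. "
              "If the context does not contain the answer, say so.\n\n"
              f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

    # 3. Generation: hand the augmented prompt to the LLM of your choice.
    return generate(prompt)
```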

Eating data chunks

This dramatically simplifies the process of curating, updating and removing information, providing an agile and adaptable approach. It also allows us to integrate information sources of all kinds, whether flat files, documents, existing information stores or data-driven systems. However, to get the best results out of a Retrieval Augmented Generation system, the information for the knowledge database needs to be prepared and processed in a specific way: documents are split into digestible chunks, and from those chunks we create a “semantic embeddings” store.
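As a rough illustration of that preparation step, the sketch below splits a document into fixed-size, overlapping chunks. Character-based splitting like this is deliberately naive, and the document text is a made-up stand-in; real pipelines usually split on sentences, headings or token counts instead.

```python
# Naive fixed-size chunking with overlap (characters, not tokens).
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    assert 0 <= overlap < size
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # overlap preserves context across boundaries
    return chunks

# Made-up stand-in for a real policy or tender document.
document = ("Section 1: Security. All data is encrypted in transit and at rest. "
            "Section 2: Support. Severity-1 incidents get a 30 minute response. ") * 20
chunks = chunk_text(document)
print(f"{len(chunks)} chunks of up to 500 characters")
```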

Large Language Model (LLM) semantic embeddings are a method of converting words or text into numbers that computers can understand and process. Imagine each piece of text as a point in a massive multi-dimensional space: texts with similar meanings sit close together, while texts with different meanings are farther apart. These embeddings are created by a language model trained on large text datasets, enabling it to recognise patterns and context in language. Essentially, embeddings are a detailed map of language, where every word or phrase has unique coordinates representing its meaning, helping computers "understand" language.

Figure 3 - RAG ingestion to private knowledge store
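Putting the pieces together, here is a minimal sketch of the ingestion and retrieval idea behind Figure 3, assuming the open sentence-transformers library: embed each chunk, then find the chunks closest to a question by cosine similarity. The model name is just one commonly used open embedding model, the chunks are made-up examples, and a production system would persist the vectors in a proper vector database rather than an in-memory array.

```python
# Minimal "semantic embeddings" store: embed chunks, then retrieve the
# chunks closest to a question by cosine similarity. Assumes the open
# sentence-transformers library; all-MiniLM-L6-v2 is one common choice.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Made-up chunks standing in for the output of the chunking step above.
chunks = [
    "Backups are encrypted at rest and retained for 35 days.",
    "Support is available 24/7 for severity-1 incidents.",
    "The platform is hosted in ISO 27001 certified data centres.",
]

# Ingestion: each chunk becomes a point in the embedding space.
chunk_vectors = model.encode(chunks, normalize_embeddings=True)

def retrieve_chunks(question: str, k: int = 2) -> list[str]:
    """Return the k chunks whose embeddings sit closest to the question."""
    q = model.encode([question], normalize_embeddings=True)[0]
    scores = chunk_vectors @ q          # cosine similarity (vectors normalised)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

print(retrieve_chunks("How long are backups kept?"))
```

A retrieval function like this is exactly the kind of callable the query-time sketch above expects, which is the appeal of RAG: the knowledge store can be rebuilt or updated without touching the model itself.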

Advantages of RAG with LLMs

  • Enhanced Accuracy and Relevance: RAG models can pull up-to-date information from external sources, leading to more accurate and relevant responses, especially for fact-based queries.
  • Improved Handling of Specific Queries: RAG can provide more detailed and specific answers than a standalone LLM by leveraging external databases or documents.
  • Dynamic Content Integration: RAG models can integrate dynamic content from the web or specific databases, keeping the responses current.
  • Reduction in Training Data Requirements: Since RAG models can retrieve information, they might require less training data for specific knowledge domains.
  • Flexibility and Scalability: RAG allows for the easy integration of various data sources, making the model flexible and scalable for different applications.
  • Customisable Knowledge Sources: The ability to choose and change external sources for retrieval allows for customisation based on the task or domain.

Disadvantages of RAG with LLMs

  • Dependency on External Sources: The quality of responses heavily depends on the reliability and accuracy of the external sources used for retrieval.
  • Complexity in Integration: Integrating and maintaining the retrieval component alongside the LLM can be technically complex.
  • Latency Issues: Retrieving information in real time can introduce latency, impacting the response time of the model.
  • Data Privacy Concerns: Data privacy and security challenges could exist if the retrieval sources contain sensitive information.
  • Risk of Information Overload: The model might retrieve an overwhelming amount of information, making it challenging to generate concise and coherent responses.
  • Potential for Propagating Misinformation: The model might inadvertently propagate misinformation if the external sources contain incorrect or misleading information.
  • Maintenance and Update Requirements: Keeping the retrieval sources updated and relevant requires continuous effort and resources.

There are other approaches to giving LLMs access to additional domain-specific information, but for this application Retrieval Augmented Generation meets all my requirements. Having settled on a method, the next step was to work out how to build a RAG system of my own. So began project HAILE (Hallucinating Artificial Intelligent Language Elucidator)…

In the next post, we'll look at building a RAG system from open-source libraries and tools that can take your data and produce valuable answers…

