Retrieval Augmented Generation
Anyone in the industry who has used ChatGPT for business purposes has likely had the thought, "This is truly impressive! GPT can effectively answer my questions. Now, how can I implement this for my own use? Can I train it on my own data?"
Delve into this and you quickly run into the costs and complexities of training, which raises the question of whether such an endeavor is feasible or even advisable. It seems unlikely that we are prepared to compete directly with OpenAI at this time.
A group of Meta AI researchers introduced a methodology known as Retrieval Augmented Generation (RAG) to tackle tasks that require substantial knowledge. RAG merges an information retrieval component with a text generation model. This allows RAG to be fine-tuned and its internal knowledge to be adjusted efficiently without requiring a complete retraining of the entire model.
RAG operates by taking an input and retrieving a set of pertinent supporting documents from a given source. These documents are then concatenated as context with the original input and fed into the text generation component to produce the final output. This adaptability proves valuable in scenarios where factual information evolves over time, addressing the limitation of language models' static knowledge. RAG's approach lets language models bypass complete retraining, giving them access to up-to-date information for generating accurate outputs via retrieval-based generation.
The process of implementing RAG involves several steps, illustrated in the sketch after this list:
Candidate Selection: The retrieval system identifies a set of text snippets that are potential candidates due to their relevance to the input context or query.
Scoring and Ranking: Each candidate snippet is assigned a score based on factors such as relevance and accuracy. The retrieval system arranges the candidate snippets in order of their scores.
Input Combination: The top-rated candidate snippets are combined with the original input context or query, creating an extended input that encompasses both retrieved text and the original input.
Generation Process: The extended input is fed into the generative model, which utilizes both the retrieved text snippets and the original input to generate the final text output.
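The following is a minimal, self-contained sketch of those four steps in plain Python. The word-overlap score() function and the final print are stand-ins of my own: a real system would use an embedding model for scoring and send the extended prompt to a generative model.

```python
from typing import List

def score(query: str, snippet: str) -> float:
    # Placeholder relevance score: fraction of query words found in the snippet.
    # A real system would compare embeddings instead.
    q, s = set(query.lower().split()), set(snippet.lower().split())
    return len(q & s) / (len(q) or 1)

def retrieve(query: str, corpus: List[str], top_k: int = 2) -> List[str]:
    # Steps 1-2: select candidate snippets and rank them by score.
    ranked = sorted(corpus, key=lambda snippet: score(query, snippet), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, snippets: List[str]) -> str:
    # Step 3: concatenate the retrieved snippets with the original query.
    context = "\n".join(snippets)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "RAG combines an information retrieval component with a text generator.",
    "Faiss is a library for efficient similarity search over dense vectors.",
    "Azure Cognitive Search can serve as a vector store.",
]
query = "What does RAG combine?"
prompt = build_prompt(query, retrieve(query, corpus))
print(prompt)  # Step 4: this extended prompt would be sent to the generative model.
```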
Is it possible to construct such a system?
Leading cloud service providers like Microsoft and Amazon offer RAG solutions.
RAG with Azure Machine Learning:
In Azure Machine Learning, RAG is facilitated through integration with Azure OpenAI Service, making use of large language models and vectorization. This integration supports tools such as Faiss and Azure Cognitive Search as vector stores, along with open-source offerings like LangChain for data chunking. Implementing RAG involves formatting data so it can be searched efficiently before it is sent to the language model, which also keeps token consumption in check. Regularly refreshing the data is crucial for maintaining RAG's effectiveness.
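Here is a hedged sketch of that chunk-then-index preparation step, assuming LangChain's RecursiveCharacterTextSplitter for chunking and Faiss as the vector store. The embed() function is a hypothetical placeholder that returns random vectors just to keep the sketch self-contained, so the retrieved chunks here are illustrative only; a real system would call an embedding model, for example one exposed through Azure OpenAI Service.

```python
import numpy as np
import faiss  # pip install faiss-cpu
from langchain.text_splitter import RecursiveCharacterTextSplitter

DOC = (
    "Azure Machine Learning integrates with Azure OpenAI Service for RAG. "
    "Faiss and Azure Cognitive Search can act as vector stores. "
    "LangChain utilities help with chunking source documents."
)

def embed(texts):
    # Hypothetical embedding function: random 384-dim vectors stand in for a
    # real embedding model's output, purely to make the sketch runnable.
    rng = np.random.default_rng(0)
    return rng.random((len(texts), 384), dtype=np.float32)

# Chunk the document so each piece fits comfortably in the prompt budget.
splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=20)
chunks = splitter.split_text(DOC)

# Index the chunk embeddings in an exact L2 nearest-neighbour Faiss index.
vectors = embed(chunks)
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# At query time, embed the question and look up the closest chunks.
_, ids = index.search(embed(["Which vector stores are supported?"]), 2)
print([chunks[i] for i in ids[0]])
```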
RAG with Amazon SageMaker:
External data that enhances prompts can come from various sources like document repositories, databases, or APIs. The process involves converting documents and user queries into a compatible format for relevance searches. Embedding language models are used to transform the data into numerical representations, allowing comparisons. RAG models leverage these embeddings to combine user queries and relevant context, which is then fed to the foundation model. Knowledge libraries and their embeddings can be updated asynchronously.
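The embedding comparison described above can be sketched with nothing more than NumPy and cosine similarity. Again, embed() is a hypothetical placeholder; a real deployment would call an embedding model endpoint, for instance one hosted on SageMaker.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical: map text to a fixed-size vector. Seeding from the text
    # keeps the sketch deterministic; a real system would call an embedding
    # model endpoint instead.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.random(384)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

docs = ["Order returns are accepted within 30 days.",
        "Shipping is free on orders over $50."]
query = "What is the return window?"

# Rank documents by similarity to the query and keep the best one as context.
q_vec = embed(query)
best = max(docs, key=lambda d: cosine(q_vec, embed(d)))

prompt = f"Context: {best}\n\nQuestion: {query}"
print(prompt)  # This combined prompt would be fed to the foundation model.
```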
The process is similar across platforms like AWS, Azure, and IBM, and open-source tools like Haystack can also achieve similar results.
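As one illustration of the open-source route, here is a minimal retrieval sketch written against a Haystack 1.x-style API; the interface changed substantially in Haystack 2.x, so treat the exact class names and parameters as assumptions to verify against the version you install.

```python
# Assumes Haystack 1.x (pip install farm-haystack); APIs differ in 2.x.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import BM25Retriever

# Store a few documents with BM25 keyword indexing enabled.
document_store = InMemoryDocumentStore(use_bm25=True)
document_store.write_documents([
    {"content": "RAG combines retrieval with text generation."},
    {"content": "Vector stores hold document embeddings for similarity search."},
])

# Retrieve the most relevant documents for a query.
retriever = BM25Retriever(document_store=document_store)
for doc in retriever.retrieve(query="What does RAG combine?", top_k=2):
    print(doc.content)
```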
The era of generative AI has unlocked numerous capabilities for existing systems, and vector databases paired with retrieval augmented generation are one notable advancement. This overview only scratches the surface of the potential, such as building AI agents capable of processing text, images, video, or audio. RAG and vector databases work around the limited context windows of language models, bringing reasoning grounded in historical knowledge to the forefront.