Unleashing the Power of LLMs
Marcos Esteban Soto
MSc Big Data & BI | Sr ML Engineer | Data Scientist | WT & Flow Measurements
With the growing demand for LLM-based applications across many use cases, which typically require up-to-date knowledge or answers grounded in a specific context, how do we decide which approach to use? This document is intended as a summary of the different approaches available, depending on the case at hand.
“The problem of hallucination”
A large language model is a trained machine-learning model that generates text based on the prompt you provide. Training equips the model with knowledge derived from its training data, but it is difficult to tell what knowledge a model has retained and what it has not. In fact, when a model generates text, it cannot tell whether the generation is accurate.
In the context of LLMs, “hallucination” refers to a phenomenon where the model generates text that is incorrect, nonsensical, or not real. Since LLMs are not databases or search engines, they do not cite the sources their responses are based on; they generate text as an extrapolation from the prompt you provide. There are several methods to mitigate hallucination in an LLM, as we'll see next.
Prompt Engineering
Prompt engineering is a relatively new discipline focused on developing and optimizing prompts to use language models efficiently across a wide variety of applications and research topics. Prompt engineering skills also help to better understand the capabilities and limitations of large language models (LLMs). [1]
There are several prompt techniques for getting the desired output from the language model, such as zero-shot prompting, few-shot prompting, and chain-of-thought prompting. [2] A minimal few-shot sketch follows.
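As a rough illustration of the few-shot style, the sketch below prepends labeled examples to the user's input before calling a model. The client setup, the model name, and the sentiment-classification task are assumptions for illustration, not part of the original article.

```python
# A minimal few-shot prompting sketch using the OpenAI Python client (>= 1.0).
# Model name and example task are assumptions chosen for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after two weeks."
Sentiment: Negative

Review: "Setup was painless and support answered within minutes."
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; any chat-capable model works
    messages=[{"role": "user", "content": few_shot_prompt}],
    temperature=0,  # keep the output deterministic for a classification-style task
)
print(response.choices[0].message.content)
```

The in-context examples steer the model toward the desired output format without changing any model weights, which is the essential difference between prompt engineering and the approaches discussed next.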
Fine-tuning
Fine-tuning means taking a pre-trained model and further training at least one internal model parameter (i.e., its weights). In the context of LLMs, this typically transforms a general-purpose base model (e.g., GPT-3) into a specialized model for a particular use case (e.g., ChatGPT). [3]
The key upside of this approach is that models can achieve better performance while requiring (far) fewer manually labeled examples compared to models that solely rely on supervised training.
Fine-tuning improves on few-shot learning by training on many more examples than can fit in the prompt, letting you achieve better results on a wide range of tasks. Once a model has been fine-tuned, you won't need to provide examples in the prompt anymore, which saves costs and enables lower-latency requests.
Common use cases where fine-tuning can improve results include setting a consistent style, tone, or output format, improving reliability at producing a desired output, and handling domain-specific edge cases. A minimal sketch of launching a fine-tuning job is shown below.
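The following is a hedged sketch of what kicking off a fine-tuning run can look like with the OpenAI Python client; the file name, the JSONL contents, and the base model are assumptions for illustration.

```python
# A minimal sketch of launching a fine-tuning job with the OpenAI Python client (>= 1.0).
# File name, base model, and the JSONL contents are assumptions for illustration.
from openai import OpenAI

client = OpenAI()

# 1. Upload a JSONL file where each line holds a chat-formatted training example, e.g.:
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job on a base model that supports fine-tuning.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # assumed base model
)

# 3. Poll the job; once finished, the resulting fine_tuned_model name can be
#    used in chat.completions.create() without in-prompt examples.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```

Unlike prompt engineering, this actually updates model weights (or a hosted copy of them), which is why the fine-tuned model no longer needs examples in every request.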
Prompt Tuning
Until recently, fine-tuning was the standard way to redeploy one of these pre-trained models for specialized tasks: you gathered and labeled examples of the target task and fine-tuned your model rather than training an entirely new one from scratch. But as foundation models grow relentlessly larger, a simpler, more energy-efficient technique has emerged: prompt tuning.
In prompt-tuning, the best cues, or front-end prompts, are fed to your AI model to give it task-specific context. The prompts can be extra words introduced by a human, or AI-generated numbers introduced into the model's embedding layer. Like crossword puzzle clues, both prompt types guide the model toward a desired decision or prediction. Prompt-tuning allows a company with limited data to tailor a massive model to a narrow task. It also eliminates the need to update the model’s billions (or trillions) of weights, or parameters. [4]
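To make the idea of learned soft prompts concrete, here is a minimal sketch using the Hugging Face PEFT library; the base model, the number of virtual tokens, and the initialization text are assumptions for illustration.

```python
# A minimal prompt-tuning sketch with Hugging Face PEFT: only a handful of
# "virtual token" embeddings are trained, while the base model's weights stay frozen.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

base_model = "bigscience/bloomz-560m"  # assumed small base model for illustration
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,          # initialize soft prompts from text
    prompt_tuning_init_text="Classify the sentiment of this review:",
    num_virtual_tokens=8,                              # learned embeddings prepended to every input
    tokenizer_name_or_path=base_model,
)

model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # a few thousand trainable vs. hundreds of millions frozen

# From here the wrapped model is trained with a standard loop or transformers.Trainer;
# only the virtual-token embeddings receive gradient updates.
```

Because only the prompt embeddings are stored per task, a single frozen foundation model can serve many narrow tasks, which is the energy and storage advantage described above.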
Retrieval Augmented Generation (RAG)
For complex and knowledge-intensive tasks, it's possible to build a language model-based system that accesses external knowledge sources to complete tasks. This enables more factual consistency, improves the reliability of the generated responses, and helps to mitigate the problem of "hallucination".
Meta AI researchers introduced a method called Retrieval Augmented Generation (RAG) to address such knowledge-intensive tasks. RAG combines an information retrieval component with a text generator model. RAG can be fine-tuned, and its internal knowledge can be modified efficiently without needing to retrain the entire model.
RAG takes an input and retrieves a set of relevant/supporting documents given a source (e.g., Wikipedia). The documents are concatenated as context with the original input prompt and fed to the text generator, which produces the final output. This makes RAG adaptive to situations where facts evolve over time, which is very useful since an LLM's parametric knowledge is static. RAG lets language models bypass retraining and access the latest information when generating reliable outputs via retrieval-based generation.
Lewis et al. (2021) proposed a general-purpose fine-tuning recipe for RAG. A pre-trained seq2seq model is used as the parametric memory, and a dense vector index of Wikipedia is used as the non-parametric memory (accessed using a neural pre-trained retriever). At a high level, the input is embedded, the most relevant passages are retrieved from the index, and the generator conditions on both the input and the retrieved passages; a minimal sketch of this retrieve-then-generate pattern is shown below.
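The sketch below is not the original FAISS/Wikipedia setup from the paper; the embedding model, the toy document store, and the generator call are assumptions chosen to keep the retrieve-then-generate pattern visible.

```python
# A minimal retrieve-then-generate (RAG) sketch: embed documents, retrieve the
# closest ones to the question, and pass them as context to a text generator.
# The embedding model, toy documents, and generator model are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed small embedding model
client = OpenAI()

documents = [
    "The 2024 maintenance manual recommends recalibrating flow meters every 6 months.",
    "Ultrasonic flow meters measure transit-time differences between sound pulses.",
    "The plant's reporting policy changed in March 2024 to require daily summaries.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents whose embeddings are most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec            # cosine similarity (vectors are normalized)
    top_idx = np.argsort(scores)[::-1][:k]  # indices of the highest-scoring documents
    return [documents[i] for i in top_idx]

question = "How often should flow meters be recalibrated?"
context = "\n".join(retrieve(question))

# Concatenate the retrieved passages with the original question and generate.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed generator model
    messages=[{
        "role": "user",
        "content": f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}",
    }],
)
print(response.choices[0].message.content)
```

Swapping the document list for an up-to-date knowledge base changes the answers without touching the generator's weights, which is exactly the property that makes RAG attractive for evolving facts.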
RAG performs strongly on several benchmarks such as Natural Questions, WebQuestions, and CuratedTrec. RAG generates responses that are more factual, specific, and diverse when tested on MS-MARCO and Jeopardy questions. RAG also improves results on FEVER fact verification.
This shows the potential of RAG as a viable option for enhancing the outputs of language models in knowledge-intensive tasks.
More recently, these retrieval-based approaches have become increasingly common and are combined with popular LLMs like ChatGPT to improve capabilities and factual consistency.
Conclusion
The choice between prompt engineering, fine-tuning, or RAG depends on the specific requirements and constraints of the task at hand. Each approach offers unique advantages, and understanding their nuances is essential for effectively harnessing the potential of LLMs in diverse applications.
References