A Quick Overview of Use Case-Specific Tailoring of LLMs
Large Language Models (LLMs) enable organizations to revolutionize business operations by optimizing processes and workflows – if they identify industry-specific use cases and succeed in customizing LLMs accordingly. While LLMs are indeed large neural networks (NNs), deploying and using them requires new approaches. In contrast to traditional NNs, LLMs have orders of magnitude more parameters, which turns training them into a massive investment.
When I look at my mobile, I see an app named "Naturblick" ("nature view"). It helps me understand whether the plants growing on my balcony will become colorful flowers or annoying weeds. The app probably relies on a neural network. Even smaller organizations can afford to engineer a neural network to identify plants: they choose a network topology (e.g., number of layers, connectivity) and train the network with their specific training data (Figure 1). It is a realistic undertaking. Building large LLMs from scratch, however, is resource-wise impossible for most organizations. Luckily, there is a strategy for building on pre-trained models and tailoring them that the AI community developed before the LLM era: transfer learning.
Transfer learning, or fine-tuning, starts with an existing neural network model trained on a large data set. Engineers then perform additional training with a smaller, use case-specific dataset to adapt the model for a particular task – like identifying flawed wheels on an assembly line (Figure 2). This might mean adding a new final classification layer (e.g., wheel with damaged spokes, defective tire) and additional intermediate layers (green), and it might also involve adapting existing parameter values in earlier layers (yellow).
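To make this concrete, here is a minimal sketch of classic transfer learning in PyTorch. It assumes a hypothetical two-class wheel-defect task: the pre-trained image model is loaded via the standard torchvision API, its earlier layers are frozen, and only a new final classification layer is trained on the small, specific dataset. The dataset and dataloader are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pre-trained on a large, generic dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the earlier layers: their learned features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new one for the
# hypothetical use case: "damaged spokes" vs. "defective tire".
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def fine_tune(dataloader, epochs=3):
    """dataloader is assumed to yield (image_batch, label_batch) pairs."""
    model.train()
    for _ in range(epochs):
        for images, labels in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```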
Fine-tuning LLMs themselves is relatively new because of its cost and complexity. OpenAI, for example, started offering a fine-tuning capability for some of its models. Currently, however, two other approaches dominate the enterprise LLM reality: prompt engineering and Retrieval-Augmented Generation (RAG). Both rely on a fully trained LLM that remains untouched.
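As an illustration, the sketch below shows what starting such a fine-tuning job could look like with the OpenAI Python SDK. The file name, the JSONL training examples, and the model identifier are assumptions for illustration only; check the provider's current documentation for supported models and data formats.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Upload a (hypothetical) JSONL file with chat-formatted training examples.
training_file = client.files.create(
    file=open("wheel_support_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on a base model that supports fine-tuning
# (the model name here is an assumption; consult the provider's docs).
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```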
Prompt engineering is well known among end users who interact directly with an LLM. It means embedding the actual LLM query in a broader context that helps the LLM provide a more relevant response. Assume an LLM powers a customer service chatbot. Prompt engineering can then define the LLM's persona as a friendly customer service representative. To achieve that, the chatbot does not simply forward customer requests to the LLM. Instead, it adds the context "friendly customer service representative" to each request sent to the LLM API.
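A minimal sketch of this pattern with the OpenAI Python SDK could look as follows; the persona text, the model name, and the wrapper function are illustrative assumptions, and any other LLM API would work the same way.

```python
from openai import OpenAI

client = OpenAI()

# The persona is injected as a system message; the end user never sees it.
SYSTEM_PROMPT = (
    "You are a friendly customer service representative. "
    "Answer politely, concisely, and in the customer's language."
)

def answer_customer(request: str) -> str:
    """Forward the customer request to the LLM, wrapped in the persona context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": request},
        ],
    )
    return response.choices[0].message.content

print(answer_customer("My new wheel arrived with a damaged spoke. What now?"))
```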
Prompt engineering has convincing merits, but it cannot address all challenges of LLMs in a real-world corporate context – most notably, it cannot by itself ground answers in precise, company-specific facts.
Given these limitations, organizations often turn to Retrieval-Augmented Generation (RAG), an architecture that combines precise facts from a database with the natural language skills of an LLM (Figure 4). The idea is to first run the query against a vector or document database to identify and retrieve similar documents. So, if the query is about tomorrow's weather in Zürich, documents containing "weather," "tomorrow," and "Zürich" would be ranked highly. The documents that best match the user query are added as context to the query, which then goes to the LLM, and the LLM formulates the final answer. In this architecture, the LLM contributes its natural language processing and reasoning capabilities to generate a response for the user, while the hard facts (for specific queries) come from the vector database. RAG is also cost-effective since it does not require retraining the LLM.
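The following sketch illustrates this retrieve-then-generate flow under a few stated assumptions: it uses the sentence-transformers library for embeddings, a plain in-memory list as a stand-in for the vector database, the same OpenAI chat call as above for generation, and toy documents invented for the example.

```python
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any embedding model works

# A toy "vector database": in practice this would be a dedicated vector store.
documents = [
    "Weather forecast for Zürich tomorrow: sunny, 24°C, light wind.",
    "Opening hours of the Zürich service center: Mon-Fri, 9:00-17:00.",
    "Return policy: defective wheels can be returned within 30 days.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query embedding."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

def rag_answer(query: str) -> str:
    """Add the best-matching documents as context, then let the LLM answer."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption, as above
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rag_answer("What will the weather be like tomorrow in Zürich?"))
```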
To conclude: While LLMs are neural networks, (nearly) all organizations build their solutions on top of out-of-the-box LLMs. This preference reflects the sheer size of LLMs and the effort required to train them. Thus, to enhance and adapt the capabilities of LLMs, companies employ one or more of the following three techniques on top of existing LLMs, as Figure 5 visualizes: prompt engineering, retrieval-augmented generation, and/or fine-tuning.