A Quick Overview of Use Case-Specific Tailoring of LLMs
Large Language Models (LLMs) enable organizations to revolutionize business operations by optimizing processes and workflows – if they identify industry-specific use cases and succeed in customizing LLMs accordingly. While LLMs are indeed large neural networks (NNs), deploying and using them requires new approaches. In contrast to traditional NNs, LLMs have orders of magnitude more parameters, which turns training them into a massive investment.
When I look at my mobile, I see an app named "Naturblick" ("nature view"). It helps me understand whether the plants growing on my balcony will become colorful flowers or annoying weeds. The app probably relies on a neural network. Even smaller organizations can afford to engineer a neural network to identify plants: they choose a network topology (e.g., number of layers, connectivity) and train the network with their specific training data (Figure 1). It is a realistic undertaking. Building large LLMs from scratch, however, is resource-wise impossible for most organizations. Luckily, there is a strategy for building on pre-trained models and tailoring them that the AI community developed before the LLM era: transfer learning.
Transfer learning, or fine-tuning, starts with an existing neural network model trained on a large data set. Engineers then perform additional training with a smaller, use case-specific dataset to adapt the model for a particular task – like identifying flawed wheels on an assembly line (Figure 2). This might mean adding a new final classification layer (e.g., wheel with damaged spokes, defective tire) and additional intermediate layers (green), and it might also involve adapting existing parameter values in earlier layers (yellow).
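To make this concrete, here is a minimal sketch of classic transfer learning in PyTorch. It assumes a hypothetical two-class wheel-defect task: the pre-trained image model is loaded via the standard torchvision API, its earlier layers are frozen, and only a new final classification layer is trained on the small, specific dataset. The dataset and dataloader are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a model pre-trained on a large, generic dataset (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the earlier layers: their learned features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with a new one for the
# hypothetical use case: "damaged spokes" vs. "defective tire".
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new layer's parameters are updated during fine-tuning.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def fine_tune(dataloader, epochs=3):
    """dataloader is assumed to yield (image_batch, label_batch) pairs."""
    model.train()
    for _ in range(epochs):
        for images, labels in dataloader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```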
Fine-tuning LLMs themselves is relatively new because of its cost and complexity. OpenAI, for example, started offering a fine-tuning capability for some of its models. Currently, however, two other approaches dominate the enterprise LLM reality: prompt engineering and Retrieval-Augmented Generation (RAG). Both rely on a fully trained LLM that remains untouched.
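As an illustration, the sketch below shows what starting such a fine-tuning job could look like with the OpenAI Python SDK. The file name, the JSONL training examples, and the model identifier are assumptions for illustration only; check the provider's current documentation for supported models and data formats.

```python
from openai import OpenAI

client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

# Upload a (hypothetical) JSONL file with chat-formatted training examples.
training_file = client.files.create(
    file=open("wheel_support_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# Start the fine-tuning job on a base model that supports fine-tuning
# (the model name here is an assumption; consult the provider's docs).
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print(job.id, job.status)
```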
Prompt engineering is well known among end users who interact directly with an LLM. It means embedding the actual LLM query in a broader context that helps the LLM provide a more relevant response. Assume an LLM powers a customer service chatbot. Prompt engineering can then define the LLM's persona as a friendly customer service representative. To achieve that, the chatbot does not simply forward customer requests to the LLM. Instead, it adds the context "friendly customer service representative" to each request sent to the LLM API.
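A minimal sketch of this pattern with the OpenAI Python SDK could look as follows; the persona text, the model name, and the wrapper function are illustrative assumptions, and any other LLM API would work the same way.

```python
from openai import OpenAI

client = OpenAI()

# The persona is injected as a system message; the end user never sees it.
SYSTEM_PROMPT = (
    "You are a friendly customer service representative. "
    "Answer politely, concisely, and in the customer's language."
)

def answer_customer(request: str) -> str:
    """Forward the customer request to the LLM, wrapped in the persona context."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works here
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": request},
        ],
    )
    return response.choices[0].message.content

print(answer_customer("My new wheel arrived with a damaged spoke. What now?"))
```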
Prompt engineering has convincing merits, but it cannot address all challenges of LLMs in a real-world corporate context – most notably, it cannot by itself ground answers in precise, company-specific facts.
Given these limitations, organizations often turn to Retrieval-Augmented Generation (RAG), an architecture that combines precise facts from a database with the natural language skills of an LLM (Figure 4). The idea is to first run the query against a vector or document database to identify and retrieve similar documents. So, if the query is about tomorrow's weather in Zürich, documents containing "weather," "tomorrow," and "Zürich" would be ranked highly. The documents that best match the user query are added as context to the query, which then goes to the LLM, and the LLM formulates the final answer. In this architecture, the LLM contributes its natural language processing and reasoning capabilities to generate a response for the user, while the hard facts (for specific queries) come from the vector database. RAG is also cost-effective since it does not require retraining the LLM.
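The following sketch illustrates this retrieve-then-generate flow under a few stated assumptions: it uses the sentence-transformers library for embeddings, a plain in-memory list as a stand-in for the vector database, the same OpenAI chat call as above for generation, and toy documents invented for the example.

```python
import numpy as np
from openai import OpenAI
from sentence_transformers import SentenceTransformer

client = OpenAI()
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumption: any embedding model works

# A toy "vector database": in practice this would be a dedicated vector store.
documents = [
    "Weather forecast for Zürich tomorrow: sunny, 24°C, light wind.",
    "Opening hours of the Zürich service center: Mon-Fri, 9:00-17:00.",
    "Return policy: defective wheels can be returned within 30 days.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query embedding."""
    query_vector = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ query_vector
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

def rag_answer(query: str) -> str:
    """Add the best-matching documents as context, then let the LLM answer."""
    context = "\n".join(retrieve(query))
    prompt = f"Answer the question using only this context:\n{context}\n\nQuestion: {query}"
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption, as above
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(rag_answer("What will the weather be like tomorrow in Zürich?"))
```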
To conclude: While LLMs are neural networks, (nearly) all organizations build their solutions on top of out-of-the-box LLMs. This preference reflects the sheer size of LLMs and the effort required to train them. Thus, to enhance and adapt the capabilities of LLMs, companies employ one or more of the following three techniques on top of existing LLMs, as Figure 5 visualizes: prompt engineering, retrieval-augmented generation, and/or fine-tuning.