Issue 23: Optimizing LLMs - Prompt engineering, Fine tuning, RAG and more
Created by DALL-E


I decided to transition from exploring industry use cases to a slightly more technical topic today: the techniques for optimizing Large Language Models.

Introduction

The market for LLMs is expanding rapidly, with projections that by 2025 there will be 750 million apps using LLMs, and that 50% of digital work will be automated through apps built on these language models.

We are already seeing a large number of high-performing LLMs in the market today, competing on various benchmarks. I have an article on benchmarking here.

It is impossible to get an exact count of how many LLMs are available globally, but a good place to look is the Hugging Face hub. It lists over a million models, datasets, and apps across multiple categories, with a significant daily download rate indicating broad and active usage across the machine learning and AI community.

The Models section categorizes models by functionality, such as Text Generation, Text-to-Image, Image-to-Text, and more, with specific counts listed for each category. The Text Generation category alone contains thousands of listings, with download and update counts ranging into the millions, reflecting the dynamic and growing nature of the repository.

However, we are also becoming increasingly aware that the same LLMs perform inconsistently across different scenarios; the benchmarking tables show this quantitative variation quite well. When applied to specific enterprise scenarios, they vary even more, depending on the use case and on whether it requires localized enterprise or domain-specific information.

LLMs also have many other limitations. Let me summarize them briefly before discussing the mitigation strategies.

Limitations of LLMs:

  • Understanding Context: While large language models can generate coherent and contextually relevant responses, their understanding of context is limited by their training data and algorithms. This is where enterprise, use case, or domain context becomes important.
  • Generalization vs. Specialization: While off-the-shelf models are very good at generating general responses across a wide range of topics, they may lack the depth and accuracy of a specialist in a specific field. This can be a limitation for highly technical or niche subjects where expert knowledge is crucial.
  • Lack of Real-Time Knowledge: LLMs are mostly trained on internet data and data available in the public domain. These datasets are not updated often and are certainly not updated in real time. As a result, models lack knowledge of events or developments that occurred after their last training update. You will often see a response saying "as of Jan 2024…", which indicates the last date of the training data.
  • Bias in Training Data: Bias or inaccuracies in the training data are retained by the models, so the output will reflect the same biased, inappropriate, or inaccurate content. There is awareness of and effort toward mitigating these issues, but they are far from fully addressed.
  • Limited Understanding of User Intent: While models can generate responses based on the text input they receive, they might not always accurately grasp the user's intent, especially if the query is ambiguous or complex. This can lead to irrelevant or off-target responses.
  • Limited EQ: Models clearly do not have emotions, consciousness, or subjective experiences. Any expression of emotion or empathy is simulated based on patterns in the training data and is not rooted in actual feelings or experiences.
  • Garbage In, Garbage Out still holds: As in all traditional AI, the quality of the data impacts the quality of the prediction. In Generative AI, the quality and clarity of the input text significantly impact the quality of the model's responses. Vague or poorly structured questions lead to unclear, inaccurate, or irrelevant answers.
  • The issue of 'Synthetic' Creativity: While large language models can generate content that appears creative or novel, they are essentially recombining and manipulating patterns from their training data. Their "creativity" is bound by the scope of what they've been trained on, not an ability to conceive original ideas.
  • Ethical and Privacy Concerns: The use of large language models raises ethical questions around privacy (especially when processing personal or sensitive information), the potential for misuse (e.g., generating disinformation), and the impact on employment in certain sectors.
  • Resource Intensity: Training and running large language models require significant compute power, contributing to high energy consumption and a significant environmental impact.


Technical terms for limitations

  • Hallucination: A common issue where the model generates factually incorrect content for the given context. This is usually because the model generates text based on its generic training and is not familiar with the business vocabulary, unfamiliar question patterns, outdated training knowledge, local knowledge private to the organization, and so on.
  • Knowledge Cut-Off: The model's knowledge remains static, frozen to the time and scope of its training data.
  • Black Box: In the context of LLMs, this refers to the opaque inner workings of these models, which make it challenging to understand how they reach conclusions.
  • Explainability: Related to the above. As the term suggests, the explainability of a model determines how well it can explain how, why, and from where a particular output was generated. This is a very complex issue in Gen AI and is the foundation for trustability and adoption of models. Techniques like counterfactual approximation methods and intervention-based methods are being explored to enhance interpretability and trust in LLMs.
  • Token limits: A token is a unit of text that represents a word or part of a word. Token limits matter because they affect the performance of LLMs. If the token limit is too low, the LLM may not be able to generate the desired output; for example, a 1,000-word document typically encodes to well over 1,000 tokens, so with a 1,000-token limit the model would stop partway through. If the limit is very high, the LLM will be slow and require very high computational power (see the token-counting sketch after this list). Here is a table of some LLMs and their token limits and weaknesses.
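
To make the token-limit point concrete, here is a minimal sketch of checking a prompt against a token budget. It assumes the open-source tiktoken tokenizer library is installed (pip install tiktoken); the encoding name and the 1,000-token budget are illustrative assumptions, not recommendations for any particular model.

```python
# A minimal sketch: counting tokens before sending a prompt.
# Assumes the open-source `tiktoken` library (pip install tiktoken);
# the encoding name and budget below are illustrative, not model advice.
import tiktoken

def fits_token_budget(text: str, budget: int = 1000,
                      encoding_name: str = "cl100k_base") -> bool:
    """Return True if `text` encodes to no more than `budget` tokens."""
    encoding = tiktoken.get_encoding(encoding_name)
    n_tokens = len(encoding.encode(text))
    print(f"{n_tokens} tokens against a budget of {budget}")
    return n_tokens <= budget

# A ~1,000-word text often exceeds 1,000 tokens, which is why word
# counts alone are a poor proxy for token limits.
fits_token_budget("example words go here " * 250)
```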

The Solution

Gen AI applications leveraging LLMs are directly impacted in terms of accuracy of response, cost of implementation, and trustability. There are three prominent high-level techniques for addressing these limitations, along with variations of each.

The prominent techniques are:

(Image: author created)


  • Finetuning: Finetuning is the process of further training a pre-trained model on domain-specific data to improve accuracy on specific tasks. However, adding more data and training is naturally more expensive, which is why variations of the technique exist. Finetuning also still suffers from knowledge cut-off, and even hallucination to some extent. That's where prompt engineering comes in.

  • Prompt Engineering: Prompting is the technique of guiding a response from an LLM by refining the inputs supplied. Prompting methods can vary from simple phrases to detailed instructions based on task requirements and model capability. The practice of designing and optimizing prompts is called prompt engineering (a minimal sketch appears after this list). But prompt engineering alone cannot address the core issue of the model lacking the required business or local enterprise knowledge. That's where the next technique comes in.

  • Retrieval Augmented Generation (RAG): RAG brings in the power of retrieving context from relevant data sources and combines it with the overall prompting strategy to generate contextually accurate responses grounded in facts. Essentially, RAG enables the model to look up external information to improve response generation. This is an extremely powerful capability, as the model can actively refer to proprietary or internal knowledge bases. RAG minimizes hallucinations, stays time-relevant, is transparent about the sourcing of information, and is relatively cost-effective (a minimal retrieval sketch also follows below).
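
To make these two techniques concrete, here are minimal Python sketches. The wording, documents, and helper names are hypothetical placeholders of my own; they illustrate the techniques, not any specific product's API. First, prompt engineering as a reusable template that fixes the role, constraints, and a worked example:

```python
# A minimal prompt-engineering sketch: a reusable few-shot template.
# All wording is illustrative; effective prompts are model- and task-specific.
FEW_SHOT_TEMPLATE = """You are a support assistant for an insurance company.
Answer in at most two sentences. If you are not sure, say so.

Example:
Q: Does my policy cover water damage from a burst pipe?
A: Sudden burst-pipe damage is typically covered; gradual leaks usually are not.

Q: {question}
A:"""

def build_prompt(question: str) -> str:
    """Fill the template with the user's question."""
    return FEW_SHOT_TEMPLATE.format(question=question)

print(build_prompt("Is hail damage to my roof covered?"))
```

And a bare-bones RAG retrieval step, using TF-IDF similarity from scikit-learn as a stand-in for a production embedding model and vector store; the documents are made-up placeholders:

```python
# A minimal RAG sketch: retrieve relevant snippets, then ground the prompt.
# TF-IDF here stands in for a real embedding model + vector database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # hypothetical internal knowledge base
    "Policy POL-123: burst-pipe water damage is covered up to $10,000.",
    "Policy POL-123: hail damage to roofs requires an inspection first.",
    "All claims must be filed within 30 days of the incident.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def grounded_prompt(question: str) -> str:
    """Build a prompt that instructs the model to answer only from context."""
    context = "\n".join(retrieve(question))
    return ("Answer using ONLY the context below, and cite the policy ID.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(grounded_prompt("Is water damage from a burst pipe covered?"))
```

The retrieval step is what gives RAG its traceability: each snippet placed in the context can carry a source reference that the model is instructed to cite.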

Choosing the right technique for you

In summary, the use case(s) will determine which technique is appropriate, along with the importance of factors such as:

  • Importance of date/time sensitivity
  • Importance of domain-specific terminology and vocabulary
  • Importance of traceability, transparency, and reliability
  • Importance of reducing hallucinations or inaccurate answers


Implementation considerations

In addition to these, there are considerations such as:

  • Cost of training, fine tuning, testing
  • Complexity of implementation
  • Penalty for errors
  • Environmental implications
  • Ethical considerations

The cost and complexity of these techniques differ materially.

In many cases, it will be a combination of the three techniques:

  • Finetuning: Bringing in the understanding of the business and local contextual intelligence.
  • Prompt Engineering: Ensuring prompts are optimized to help the model understand the intent and context of the request.
  • RAG: Enhancing responses with factual, time-relevant, and traceable information (a combined sketch follows this list).
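
Below is a hedged sketch of how the three techniques compose in a single request path. FineTunedModel, its complete method, and the retrieve helper are hypothetical stand-ins for your fine-tuned model client and retrieval layer, not a real library API.

```python
# A sketch of combining the three techniques in one request path.
# FineTunedModel and retrieve() are hypothetical stand-ins, not a real API.
from typing import Callable, List

class FineTunedModel:
    """Stand-in for a model fine-tuned on the business domain."""
    def complete(self, prompt: str) -> str:
        return "stub answer for: " + prompt[-60:]

def answer(question: str, model: FineTunedModel,
           retrieve: Callable[[str], List[str]]) -> str:
    context = "\n".join(retrieve(question))      # RAG: fresh, traceable facts
    prompt = (                                   # Prompt engineering: intent + constraints
        "You are a claims assistant. Use only the context and cite sources.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return model.complete(prompt)                # Finetuning: domain-fluent model

print(answer("Is hail damage covered?", FineTunedModel(),
             lambda q: ["Hail damage to roofs requires an inspection first."]))
```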

Here is a visual comparing the techniques to help.

(Image: author created)


References

Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (https://arxiv.org/abs/2005.11401)


Coming up Next

  1. AI-griculture - the role of AI in Agriculture
  2. AI in Cybersecurity | Auth0/Okta
  3. The rise of Small Language Models
  4. Where is Apple? Slow and steady wins the race?
  5. The environmental cost of AI
  6. Decoding ML, DL, LLM, AGI, and the world of AI
  7. Rhythms | A new AI-powered operating system to transform the future of work
  8. The Chip War



