Issue 23: Optimizing LLMs - Prompt engineering, Fine tuning, RAG and more
Created by DALL-E


I decided to transition from exploring industry use cases to a slightly more technical topic today: the techniques for optimizing Large Language Models.

Introduction

The market for LLMs is expanding rapidly, with projections that by 2025 there will be 750 million apps using LLMs, and that 50% of digital work will be automated through apps built on these language models.

We are already seeing a large number of high-performing LLMs in the market today, competing on various benchmarks. I have an article on benchmarking here.

It is impossible to get an exact count of how many LLMs are available globally, but a good place to look is the Hugging Face hub. It lists over a million models, datasets, and apps across multiple categories, with a significant daily download rate indicating broad and active usage across the machine learning and AI community.

The Models section categorizes models by functionality, such as Text Generation, Text-to-Image, Image-to-Text, and more, with specific counts listed for each category. The Text Generation category alone contains thousands of listings, with download and update counts ranging into the millions, reflecting the dynamic and growing nature of the repository.

However, we are also becoming increasingly aware that the same LLMs perform inconsistently across different scenarios; the benchmarking tables show this quantitative variation quite well. When applied to specific enterprise scenarios, they vary even more, depending on the use case and on whether it requires localized enterprise or domain-specific information.

LLMs also have many other limitations. Let me summarize them briefly before discussing the mitigation strategies.

Limitations of LLMs:

  • Understanding Context: While large language models can generate coherent and contextually relevant responses, their understanding of context is limited by their training data and algorithms. This is where enterprise, use case, or domain context becomes important.
  • Generalization vs. Specialization: While off-the-shelf models are very good at generating general responses across a wide range of topics, they may lack the depth and accuracy of a specialist in a specific field. This can be a limitation for highly technical or niche subjects where expert knowledge is crucial.
  • Lack of Real-Time Knowledge: LLMs are mostly trained on internet data and data available in the public domain. These datasets are not updated often and are certainly not updated in real time. As a result, models lack knowledge of events or developments that occurred after their last training update. You will often see a response saying "as of Jan 2024…", which indicates the last date of the training data.
  • Bias in Training Data: Bias or inaccuracies in the training data are retained by the models, so the output will reflect the same biased, inappropriate, or inaccurate content. There is awareness of and effort toward mitigating these issues, but they are far from fully addressed.
  • Limited Understanding of User Intent: While models can generate responses based on the text input they receive, they might not always accurately grasp the user's intent, especially if the query is ambiguous or complex. This can lead to irrelevant or off-target responses.
  • Limited EQ: Models clearly do not have emotions, consciousness, or subjective experiences. Any expression of emotion or empathy is simulated based on patterns in the training data and is not rooted in actual feelings or experiences.
  • Garbage In, Garbage Out still holds: As in all traditional AI, the quality of the data impacts the quality of the prediction. In Generative AI, the quality and clarity of the input text significantly impact the quality of the model's responses. Vague or poorly structured questions lead to unclear, inaccurate, or irrelevant answers.
  • The issue of 'Synthetic' Creativity: While large language models can generate content that appears creative or novel, they are essentially recombining and manipulating patterns from their training data. Their "creativity" is bound by the scope of what they've been trained on, not an ability to conceive original ideas.
  • Ethical and Privacy Concerns: The use of large language models raises ethical questions around privacy (especially when processing personal or sensitive information), the potential for misuse (e.g., generating disinformation), and the impact on employment in certain sectors.
  • Resource Intensity: Training and running large language models require significant compute power, contributing to high energy consumption and a significant environmental impact.


Technical terms for limitations

  • Hallucination: A common issue where the model generates factually incorrect content for the given context. This is usually because the model generates text based on its generic training and is not familiar with the business vocabulary, unfamiliar question patterns, outdated training knowledge, local knowledge private to the organization, and so on.
  • Knowledge Cut-Off: The model's knowledge remains static, frozen to the time and scope of its training data.
  • Black Box: In the context of LLMs, this refers to the opaque inner workings of these models, which make it challenging to understand how they reach conclusions.
  • Explainability: Related to the above. As the term suggests, the explainability of a model determines how well it can explain how, why, and from where a particular output was generated. This is a very complex issue in Gen AI and is the foundation for trustability and adoption of models. Techniques like counterfactual approximation methods and intervention-based methods are being explored to enhance interpretability and trust in LLMs.
  • Token limits: A token is a unit of text that represents a word or part of a word. Token limits matter because they affect the performance of LLMs. If the token limit is too low, the LLM may not be able to generate the desired output; for example, a 1,000-word document typically encodes to well over 1,000 tokens, so with a 1,000-token limit the model would stop partway through. If the limit is very high, the LLM will be slow and require very high computational power (see the token-counting sketch after this list). Here is a table of some LLMs and their token limits and weaknesses.
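
To make the token-limit point concrete, here is a minimal sketch of checking a prompt against a token budget. It assumes the open-source tiktoken tokenizer library is installed (pip install tiktoken); the encoding name and the 1,000-token budget are illustrative assumptions, not recommendations for any particular model.

```python
# A minimal sketch: counting tokens before sending a prompt.
# Assumes the open-source `tiktoken` library (pip install tiktoken);
# the encoding name and budget below are illustrative, not model advice.
import tiktoken

def fits_token_budget(text: str, budget: int = 1000,
                      encoding_name: str = "cl100k_base") -> bool:
    """Return True if `text` encodes to no more than `budget` tokens."""
    encoding = tiktoken.get_encoding(encoding_name)
    n_tokens = len(encoding.encode(text))
    print(f"{n_tokens} tokens against a budget of {budget}")
    return n_tokens <= budget

# A ~1,000-word text often exceeds 1,000 tokens, which is why word
# counts alone are a poor proxy for token limits.
fits_token_budget("example words go here " * 250)
```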

The Solution

Gen AI applications leveraging LLMs are directly impacted in terms of accuracy of response, cost of implementation, and trustability. There are three prominent high-level techniques for addressing these limitations, along with variations of each.

The prominent techniques are:

(Image: author created)


  • Finetuning: Finetuning is the process of further training a pre-trained model on domain-specific data to improve accuracy on specific tasks. However, adding more data and training is naturally more expensive, which is why variations of the technique exist. Finetuning also still suffers from knowledge cut-off, and even hallucination to some extent. That's where prompt engineering comes in.

  • Prompt Engineering: Prompting is the technique of guiding a response from an LLM by refining the inputs supplied. Prompting methods can vary from simple phrases to detailed instructions based on task requirements and model capability. The practice of designing and optimizing prompts is called prompt engineering (a minimal sketch appears after this list). But prompt engineering alone cannot address the core issue of the model lacking the required business or local enterprise knowledge. That's where the next technique comes in.

  • Retrieval Augmented Generation (RAG): RAG brings in the power of retrieving context from relevant data sources and combines it with the overall prompting strategy to generate contextually accurate responses grounded in facts. Essentially, RAG enables the model to look up external information to improve response generation. This is an extremely powerful capability, as the model can actively refer to proprietary or internal knowledge bases. RAG minimizes hallucinations, stays time-relevant, is transparent about the sourcing of information, and is relatively cost-effective (a minimal retrieval sketch also follows below).
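
To make these two techniques concrete, here are minimal Python sketches. The wording, documents, and helper names are hypothetical placeholders of my own; they illustrate the techniques, not any specific product's API. First, prompt engineering as a reusable template that fixes the role, constraints, and a worked example:

```python
# A minimal prompt-engineering sketch: a reusable few-shot template.
# All wording is illustrative; effective prompts are model- and task-specific.
FEW_SHOT_TEMPLATE = """You are a support assistant for an insurance company.
Answer in at most two sentences. If you are not sure, say so.

Example:
Q: Does my policy cover water damage from a burst pipe?
A: Sudden burst-pipe damage is typically covered; gradual leaks usually are not.

Q: {question}
A:"""

def build_prompt(question: str) -> str:
    """Fill the template with the user's question."""
    return FEW_SHOT_TEMPLATE.format(question=question)

print(build_prompt("Is hail damage to my roof covered?"))
```

And a bare-bones RAG retrieval step, using TF-IDF similarity from scikit-learn as a stand-in for a production embedding model and vector store; the documents are made-up placeholders:

```python
# A minimal RAG sketch: retrieve relevant snippets, then ground the prompt.
# TF-IDF here stands in for a real embedding model + vector database.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [  # hypothetical internal knowledge base
    "Policy POL-123: burst-pipe water damage is covered up to $10,000.",
    "Policy POL-123: hail damage to roofs requires an inspection first.",
    "All claims must be filed within 30 days of the incident.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    return [documents[i] for i in scores.argsort()[::-1][:k]]

def grounded_prompt(question: str) -> str:
    """Build a prompt that instructs the model to answer only from context."""
    context = "\n".join(retrieve(question))
    return ("Answer using ONLY the context below, and cite the policy ID.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

print(grounded_prompt("Is water damage from a burst pipe covered?"))
```

The retrieval step is what gives RAG its traceability: each snippet placed in the context can carry a source reference that the model is instructed to cite.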

Choosing the right technique for you

In summary, the use case(s) will determine which technique is appropriate, along with the importance of factors such as:

  • Importance of date/time sensitivity
  • Importance of domain-specific terminology and vocabulary
  • Importance of traceability, transparency, and reliability
  • Importance of reducing hallucinations or inaccurate answers


Implementation considerations

In addition to these, there are considerations such as:

  • Cost of training, fine tuning, testing
  • Complexity of implementation
  • Penalty for errors
  • Environmental implications
  • Ethical considerations

The cost and complexity of these techniques differ materially.

In many cases, it will be a combination of the three techniques:

  • Finetuning: Bringing in the understanding of the business and local contextual intelligence.
  • Prompt Engineering: Ensuring prompts are optimized to help the model understand the intent and context of the request.
  • RAG: Enhancing responses with factual, time-relevant, and traceable information (a combined sketch follows this list).
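
Below is a hedged sketch of how the three techniques compose in a single request path. FineTunedModel, its complete method, and the retrieve helper are hypothetical stand-ins for your fine-tuned model client and retrieval layer, not a real library API.

```python
# A sketch of combining the three techniques in one request path.
# FineTunedModel and retrieve() are hypothetical stand-ins, not a real API.
from typing import Callable, List

class FineTunedModel:
    """Stand-in for a model fine-tuned on the business domain."""
    def complete(self, prompt: str) -> str:
        return "stub answer for: " + prompt[-60:]

def answer(question: str, model: FineTunedModel,
           retrieve: Callable[[str], List[str]]) -> str:
    context = "\n".join(retrieve(question))      # RAG: fresh, traceable facts
    prompt = (                                   # Prompt engineering: intent + constraints
        "You are a claims assistant. Use only the context and cite sources.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return model.complete(prompt)                # Finetuning: domain-fluent model

print(answer("Is hail damage covered?", FineTunedModel(),
             lambda q: ["Hail damage to roofs requires an inspection first."]))
```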

Here is a visual comparing the techniques to help.

(Image: author created)


References

Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (https://arxiv.org/abs/2005.11401)


Coming up Next

  1. AI-griculture - the role of AI in Agriculture
  2. AI in Cybersecurity | Auth0/Okta
  3. The rise of Small Language Models
  4. Where is Apple? Slow and steady wins the race?
  5. The environmental cost of AI
  6. Decoding ML, DL, LLM, AGI, and the world of AI
  7. Rhythms | A new AI-powered operating system to transform the future of work
  8. The Chip War



