A Guide: Choosing The Perfect Language Model For Your Use Case



The landscape of Large Language Models (LLMs) is changing rapidly, with new models emerging all the time. Which model is the right fit for your specific task?

This article will guide you in choosing an LLM for your specific needs using key criteria and public resources.



TL;DR

  • Consider 3 key factors when choosing a model: 1) task performance, 2) computational efficiency, and 3) commercial terms
  • Use publicly available comparisons such as leaderboards and metrics for actionable insights
  • Prototype an auto-coding app on a suitable LLM as a worked example



LLM Usage Types

Among the 5 deployment types defined in the AWS Generative AI Security Scoping Matrix, we will focus on the two most relevant to app development: Scope 1 and Scope 3:

5 Deployment Types of LLM - Ref. AWS Generative AI Security Scoping Matrix



Key Factors to Consider

Once we have defined the task scope, here are 3 key factors to consider when choosing a model.

1. Task Performance

The key areas to consider for successful task completion are:

  • Accuracy: Percentage of correct answers/completions
  • Domain Closeness: Relevance of the training data to our task domain
  • Fluency: Readability and naturalness of generated text
  • Informativeness: How well the response addresses the prompt or question
  • Engagement: How well the chatbot keeps the user interested
  • Robustness: How well the model can handle unexpected inputs
  • Safety & Fairness: Avoiding harmful or offensive outputs


Technical benchmarks allow us to quantify task performance.

Major benchmarks by task:

< Coding Tasks >

  • HumanEval: Measures functional correctness on 164 hand-written coding challenges, reported with the pass@k metric (see the sketch after this list)
  • MBPP (Mostly Basic Python Problems): Measures functional correctness on roughly 1,000 entry-level Python programming problems
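
Both benchmarks report pass@k, the probability that at least one of k sampled completions passes the unit tests. Below is a minimal sketch of the unbiased pass@k estimator described in the HumanEval paper, assuming n completions are sampled per problem and c of them pass:

    import numpy as np

    def pass_at_k(n: int, c: int, k: int) -> float:
        # Unbiased estimator: 1 - C(n-c, k) / C(n, k)
        # n: completions sampled per problem; c: completions passing the tests
        if n - c < k:
            return 1.0  # every size-k sample must contain a passing completion
        return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

    # Example: 200 samples per problem, 32 of which pass
    print(pass_at_k(200, 32, 1))   # 0.16 (equals c/n when k = 1)
    print(pass_at_k(200, 32, 10))  # ~0.83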

< Chatbot Assistance >

  • MT-Bench (Multi-turn Benchmark): Evaluates conversation flow and instruction-following ability over multi-turn questions
  • TruthfulQA: Measures how well the model avoids reproducing common falsehoods, using 817 questions across 38 categories
  • Context Window: The maximum number of tokens the model can take as input when generating responses (a model specification rather than a benchmark)

< Reasoning Tasks >

  • ARC (AI2 Reasoning Challenge): Evaluates deeper knowledge and reasoning using roughly 7.5K grade-school science questions
  • HellaSwag: Tests commonsense inference via multiple-choice sentence completion, where the model must pick the most plausible ending

< Question Answering and Language Understanding >

  • MMLU (Massive Multitask Language Understanding): Broad knowledge evaluation across 57 diverse subjects
  • TriviaQA: Evaluates reading comprehension using over 650K question-answer-evidence triples


2. Computational Efficiency

Assess whether the model can run efficiently within your environment's capabilities to avoid performance bottlenecks. Key considerations are below, followed by a rough memory-sizing sketch.

  • Parameter Size: Larger models with more parameters generally need more resources. Smaller models offer faster processing but may compromise accuracy.
  • Inference Speed: How quickly the model generates output directly affects throughput and user experience.
  • Hardware: The type of hardware (CPU vs. GPU) determines how efficiently the model runs.
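
As a rough illustration of the parameter-size trade-off above, the memory needed just to hold a model's weights can be approximated from the parameter count and numeric precision. This sketch deliberately ignores activation and KV-cache overhead:

    def approx_weight_memory_gb(params_billions: float, bytes_per_param: float = 2.0) -> float:
        # Weights only; real-world usage adds activations and KV cache on top.
        # bytes_per_param: 2 for fp16/bf16, 1 for int8, 0.5 for 4-bit quantization
        return params_billions * bytes_per_param

    # Example: a 13B-parameter model needs ~26 GB in fp16 but only ~6.5 GB at 4-bit
    print(approx_weight_memory_gb(13))       # 26.0
    print(approx_weight_memory_gb(13, 0.5))  # 6.5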


3. Commercial Terms

Choose a model that fits our long-term business goals and technical evolution.

  • Usage Price & Limits: Price per million tokens, usage limits set by the model provider, and the server cost to run the model (a cost sketch follows this list)
  • Ecosystem: Consider ecosystem support and deployment flexibility on platforms such as Hugging Face, Azure ML, or Amazon SageMaker.
  • Commercial Availability: If we plan to generate revenue from the project, ensure the chosen LLM has commercial licensing options.
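
As a back-of-the-envelope sketch for the usage-price point above, monthly spend can be estimated from request volume and the per-token price. The figures here are hypothetical, purely for illustration:

    def monthly_token_cost_usd(requests_per_day: int, tokens_per_request: int,
                               price_per_million_tokens: float) -> float:
        # Simple volume x unit-price estimate; ignores separate input/output pricing
        tokens_per_month = requests_per_day * tokens_per_request * 30
        return tokens_per_month / 1_000_000 * price_per_million_tokens

    # Hypothetical: 10,000 requests/day, 1,500 tokens each, at $0.50 per 1M tokens
    print(monthly_token_cost_usd(10_000, 1_500, 0.50))  # 225.0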



Leveraging Public Resources

Evaluating LLMs from scratch can be time-intensive. Fortunately, valuable public resources are available to streamline your selection:

Leaderboard & Metrics Comparison


Commercial Terms

Credit: Philipp Schmid / Providers covered: (Azure) OpenAI, Anthropic, Google Vertex AI, Amazon Bedrock, Mistral, Anyscale, MosaicML, Together.AI


Broader Search Based on Technical Specs and Task Objectives



Example: Build an Auto-coding App

Let's explore how to choose the right LLM for building apps, using auto-coding applications as a practical example.

Step 1. Define task and performance metrics

  • Task: Text-to-text generation (understand text inputs and return Python source code in text format)
  • Metric: Use HumanEval as the technical benchmark (an illustrative problem format follows)
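
For reference, each HumanEval problem presents a function signature plus docstring, and the score reflects whether generated completions pass hidden unit tests. Here is an illustrative problem in that style (not a verbatim benchmark item):

    # Prompt given to the model: the signature and docstring only.
    def is_palindrome(text: str) -> bool:
        """Return True if text reads the same forwards and backwards.
        >>> is_palindrome("level")
        True
        >>> is_palindrome("hello")
        False
        """
        return text == text[::-1]  # a model completion that would pass the tests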

Step 2. Choose a model using public resources

Refer to the Hugging Face leaderboard, focusing on Python coding tasks, and choose CodeLlama 13B because it:

  • Achieves a high score on the leaderboard (as of March 2024)
  • Promises an efficiency advantage, with the smallest model size among the top 3 contenders
  • Offers serverless deployment options for easy integration
  • Is available for commercial use

Compare LLMs by task completion score (HuggingFace / As of March 2024)


Step 3. Deployment & Result

Use an inference API to deploy an app:
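
Below is a minimal sketch using the Hugging Face serverless Inference API, assuming a valid API token is stored in the HF_TOKEN environment variable; the generation parameters and prompt are illustrative:

    import os
    import requests

    # Serverless Inference API endpoint for the chosen model
    API_URL = "https://api-inference.huggingface.co/models/codellama/CodeLlama-13b-Python-hf"
    HEADERS = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

    def generate_code(prompt: str) -> str:
        # Standard text-generation payload for the Inference API
        payload = {"inputs": prompt,
                   "parameters": {"max_new_tokens": 256, "temperature": 0.1}}
        response = requests.post(API_URL, headers=HEADERS, json=payload)
        response.raise_for_status()
        return response.json()[0]["generated_text"]

    print(generate_code("# Write a Python function that reverses a string\n"))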

Result:




Conclusion

Future of LLMs - Niche Domination or Universal Powerhouse?

This guideline enables us to choose an LLM that excels at our target tasks while aligning with our resource limitations and commercial requirements. By leveraging public resources, we can focus on our actual needs rather than a model's general popularity.

On the other hand, the rapid advancement of AI technology suggests a shift toward dominance by a smaller number of highly competitive models, whether universal or domain-specific. To navigate this dynamic LLM landscape, we need to:

  • Mind the Metrics: Evaluation results can vary depending on the provider's methodology and testing environment. Consider hands-on testing of a few potential LLMs for a more practical understanding.
  • Revisit Initial Choice: As new models and evaluation methods emerge regularly, be prepared to adapt our choice as technology advances.


In the next article, we will explore LLM architecture (encoders and decoders) and deploy a chatbot.



References:

  • Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
  • Magicoder: Source Code Is All You Need
  • Bias Testing and Mitigation in LLM-based Code Generation
  • Hugging Face CodeLlama 13b Python (Model Card)
  • Use Hugging Face with Amazon SageMaker
  • Exploring LLM Platforms and Models: Unpacking OpenAI, Azure, and Hugging Face
