How To Choose The Right LLM?

With the vast number of large language models available, it's not easy to decide which one to use for a particular task. Different models are trained on different data and have varying parameter counts. Picking the wrong model can have severe and unwanted consequences, such as outputs that reflect biases in the training data, or hallucinations that are simply incorrect.

You might think that choosing the largest model is the way to go due to its power, but larger models often come with higher computing costs, complexity, and variability. A better approach is to pick the right-sized model for the specific use case you have.

Here are the factors that you need to consider during the decision-making:

Step 1. Clearly define your use case

What exactly are you planning to use the foundation model for?

Example: I need the AI to write personalized emails for my marketing campaign.

Step 2. Make a simple comparison chart to evaluate your options

List the available model options, including their size, costs, performance, risks, and deployment methods.
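Such a chart can live in a spreadsheet, or be sketched in a few lines of code. The model names, prices, and scores below are illustrative placeholders, not real benchmark data:

```python
# A minimal comparison chart for candidate models.
# All names and figures are illustrative placeholders, not real benchmarks.
candidates = [
    {"model": "model-large", "params_b": 70, "cost_per_1k_tokens": 0.0100,
     "quality_score": 0.90, "deployment": "public cloud"},
    {"model": "model-medium", "params_b": 13, "cost_per_1k_tokens": 0.0020,
     "quality_score": 0.84, "deployment": "public cloud or on-premise"},
    {"model": "model-small", "params_b": 7, "cost_per_1k_tokens": 0.0005,
     "quality_score": 0.80, "deployment": "on-premise"},
]

# Print the chart as an aligned table.
print(f"{'model':<14}{'params (B)':>12}{'$ / 1K tok':>12}{'quality':>10}  deployment")
for c in candidates:
    print(f"{c['model']:<14}{c['params_b']:>12}{c['cost_per_1k_tokens']:>12.4f}"
          f"{c['quality_score']:>10.2f}  {c['deployment']}")
```

Even a rough table like this makes it obvious where the trade-offs are before you run a single test.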

*Model cards can be quite helpful for this step. They help you understand whether the model has been trained on data relevant to your purposes. You have a higher chance of getting the desired results from a model that has been fine-tuned for use cases similar to yours (e.g., sentiment analysis, document summarization, text generation).

When a model's training purpose aligns with what you need, it may process your prompts better and enable you to use zero-shot prompting to get the desired results! This means you can simply ask the model to perform tasks without first providing multiple completed examples. (Sounds perfect!)
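The difference is easiest to see in the prompts themselves. The wording and customer details below are made up for illustration:

```python
# Zero-shot: ask directly, with no completed examples.
zero_shot_prompt = (
    "Write a short, friendly marketing email inviting the customer "
    "to our spring sale. Customer name: Dana. Product: running shoes."
)

# Few-shot: the same request, but preceded by completed examples
# that show the model the exact style and format we expect.
few_shot_prompt = (
    "Example 1:\n"
    "Customer: Sam. Product: headphones.\n"
    "Email: Hi Sam, our spring sale is here ...\n\n"
    "Example 2:\n"
    "Customer: Riya. Product: yoga mats.\n"
    "Email: Hi Riya, don't miss our spring sale ...\n\n"
    "Customer: Dana. Product: running shoes.\n"
    "Email:"
)
```

A model fine-tuned for this kind of task may do well with the zero-shot prompt alone; a general-purpose model often needs the few-shot version to match your style and format.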

Step 3. Evaluate model characteristics for your specific use case to find the model that provides the most value

Run tests and evaluate options based on your previously identified use case and deployment needs.

Let's continue with our example: we are trying to find the best LLM for the marketing campaign.

Three factors that we need to consider carefully are accuracy, reliability, and speed.

Accuracy - how close the generated output is to the desired output; accuracy can be measured objectively and repeatably by choosing evaluation metrics that are relevant to our use case. (e.g., BLEU, the Bilingual Evaluation Understudy benchmark, is a suitable metric for indicating the quality of generated translations.)
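To make the idea concrete, here is a toy sketch of the intuition behind BLEU: modified unigram precision combined with a brevity penalty. Real BLEU also averages 2- to 4-gram precisions, and real evaluations should use an established implementation such as sacreBLEU rather than this simplification:

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str) -> float:
    """Toy BLEU: modified unigram precision times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    # Clip each candidate word's count by its count in the reference,
    # so repeating a correct word doesn't inflate the score.
    ref_counts = Counter(ref)
    overlap = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = overlap / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0; partial overlap or a too-short candidate scores lower.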

Reliability - includes several factors like consistency, explainability, trustworthiness, and avoiding toxicity (like hate speech).

Ultimately, it comes down to trust, and trust is built through transparency and traceability of the training data, accuracy, and reliability of the output.

Speed - how quickly a user gets a response to a submitted prompt.
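Latency is easy to measure yourself. A minimal sketch, where `call_model` is a hypothetical stand-in for your real API client:

```python
import time

def call_model(prompt: str) -> str:
    """Stand-in for a real model API call; replace with your client."""
    time.sleep(0.05)  # simulate network + inference delay
    return "Hi Dana, our spring sale is here ..."

# Measure end-to-end latency over several runs and report the median,
# which is less sensitive to outliers than the mean.
latencies = []
for _ in range(5):
    start = time.perf_counter()
    call_model("Write a short marketing email for Dana.")
    latencies.append(time.perf_counter() - start)

latencies.sort()
median_s = latencies[len(latencies) // 2]
print(f"median latency: {median_s * 1000:.0f} ms")
```

Run the same prompts against each candidate model to get comparable numbers.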

As you can imagine, speed and accuracy are often a trade-off. Larger models are usually slower but deliver more accurate answers; smaller models may be faster with minimal differences in accuracy compared to larger models.

It’s all about finding the sweet spot between performance, speed, and cost.
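One way to make that sweet spot explicit is a simple weighted score. The models, metric values, and weights below are illustrative; set the weights to reflect your own priorities (here, cost and latency are rescaled so higher means cheaper / faster):

```python
# Illustrative metrics for three hypothetical models, each scaled to [0, 1].
models = {
    "model-large":  {"accuracy": 0.90, "speed": 0.40, "cost": 0.30},
    "model-medium": {"accuracy": 0.84, "speed": 0.70, "cost": 0.70},
    "model-small":  {"accuracy": 0.80, "speed": 0.95, "cost": 0.95},
}

# Weights express what matters most for this use case.
weights = {"accuracy": 0.5, "speed": 0.3, "cost": 0.2}

def weighted_score(metrics: dict) -> float:
    return sum(weights[k] * metrics[k] for k in weights)

best = max(models, key=lambda name: weighted_score(models[name]))
for name, m in models.items():
    print(f"{name}: {weighted_score(m):.3f}")
print("best fit:", best)
```

With these (made-up) numbers, the smallest model wins; shift the accuracy weight up and the ranking changes, which is exactly the trade-off decision made visible.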

If you consider the other benefits a model might deliver, like lower latency and greater transparency into its inputs and outputs, you may find a smaller, less expensive model preferable, even if its performance or accuracy metrics are not on par with a more expensive one.

The best way to find out is to simply select the model that’s likely to deliver the desired output and test it. Test the model with your prompts to see if it works, and then assess the model's performance and the quality of the output using metrics.

*Last but not least: consider where and how you want the model and data to be deployed.

Deploying on-premise gives greater control and more security benefits compared to a public cloud environment, but it’s an expensive proposition, especially when factoring in model size, compute power, and the number of GPUs it takes to run a single LLM.

This issue is brought to you in partnership with DeepBrain AI.


DeepBrain AI allows you to generate viral videos with AI Studios' Topic-to-Video.

Simply provide a script, article, or even a website link, and DeepBrain AI will transform it into a professional-looking video complete with AI-powered avatars! Start here.


Prashanth V.

Senior Product Manager @Microsoft | EMBA - ISB | LBS | Wharton

7 months ago

While the future might rely more on SLMs, I think one should also think about interoperability across small models to reduce switching costs and ease of maintenance.


Great work Alex Wang! Now there is direction for all who want to choose the right LLM for their task!

Ken Kondo CSM, CSPO

AI, Software and Data Leader and Innovator

8 months ago

Can't wait for Claude 7!

Laura Rodriguez Salvador

Customer Data Activation Manager | Data & AI for Customer Experience @ Inetum | Data Analysis & Data Science | Generative AI

8 months ago

Everything revolves around the use case. Include accuracy measures performed on the use case itself, not only on standard datasets. Standard benchmarks are useful for comparing the general performance of models on similar scenarios, but not for measuring how much *added value* a particular model delivers for the result being sought. When deploying an AI-powered solution, agreeing on what "success" means for future users is also a tough part of the project, and will determine its success. Happy to receive feedback on this to keep improving how to deliver value using GenAI!

Li Sun

Director | MIT | Blockchain | Artificial Intelligence | Venture Capital

8 months ago

Hi Alex Wang, how do I subscribe to your newsletter? I like how simple yet comprehensive your content on AI is.
