How To Choose The Right LLM?

With the vast number of large language models available, it's not easy to decide which one to use for a particular task. Different models are trained on different data and have varying parameter counts. Picking the wrong model can have severe and unwanted consequences, such as outputs that reflect biases in the training data, or hallucinations that are simply incorrect.

You might think that choosing the largest model is the way to go due to its power, but larger models often come with higher computing costs, complexity, and variability. A better approach is to pick the right-sized model for the specific use case you have.

Here are the factors that you need to consider during the decision-making:

Step 1. Clearly define your use case

What exactly are you planning to use the foundation model for?

Example: I need the AI to write personalized emails for my marketing campaign.

Step 2. Make a simple comparison chart to evaluate your options

List the available model options, including their size, costs, performance, risks, and deployment methods.
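Such a chart can live in a spreadsheet, or be sketched in a few lines of code. The model names, prices, and scores below are illustrative placeholders, not real benchmark data:

```python
# A minimal comparison chart for candidate models.
# All names and figures are illustrative placeholders, not real benchmarks.
candidates = [
    {"model": "model-large", "params_b": 70, "cost_per_1k_tokens": 0.0100,
     "quality_score": 0.90, "deployment": "public cloud"},
    {"model": "model-medium", "params_b": 13, "cost_per_1k_tokens": 0.0020,
     "quality_score": 0.84, "deployment": "public cloud or on-premise"},
    {"model": "model-small", "params_b": 7, "cost_per_1k_tokens": 0.0005,
     "quality_score": 0.80, "deployment": "on-premise"},
]

# Print the chart as an aligned table.
print(f"{'model':<14}{'params (B)':>12}{'$ / 1K tok':>12}{'quality':>10}  deployment")
for c in candidates:
    print(f"{c['model']:<14}{c['params_b']:>12}{c['cost_per_1k_tokens']:>12.4f}"
          f"{c['quality_score']:>10.2f}  {c['deployment']}")
```

Even a rough table like this makes it obvious where the trade-offs are before you run a single test.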

*Model cards can be quite helpful for this step. They help you understand whether the model has been trained on data relevant to your purposes. You have a higher chance of getting the desired results from a model that has been fine-tuned for use cases similar to yours (e.g., sentiment analysis, document summarization, text generation).

When a model's training purpose aligns with what you need, it may process your prompts better and enable you to use zero-shot prompting to get the desired results! This means you can simply ask the model to perform tasks without first providing multiple completed examples. (Sounds perfect!)
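The difference is easiest to see in the prompts themselves. The wording and customer details below are made up for illustration:

```python
# Zero-shot: ask directly, with no completed examples.
zero_shot_prompt = (
    "Write a short, friendly marketing email inviting the customer "
    "to our spring sale. Customer name: Dana. Product: running shoes."
)

# Few-shot: the same request, but preceded by completed examples
# that show the model the exact style and format we expect.
few_shot_prompt = (
    "Example 1:\n"
    "Customer: Sam. Product: headphones.\n"
    "Email: Hi Sam, our spring sale is here ...\n\n"
    "Example 2:\n"
    "Customer: Riya. Product: yoga mats.\n"
    "Email: Hi Riya, don't miss our spring sale ...\n\n"
    "Customer: Dana. Product: running shoes.\n"
    "Email:"
)
```

A model fine-tuned for this kind of task may do well with the zero-shot prompt alone; a general-purpose model often needs the few-shot version to match your style and format.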

Step 3. Evaluate model characteristics for your specific use case to find the model that provides the most value

Run tests and evaluate options based on your previously identified use case and deployment needs.

Let's continue with our example: we are trying to find the best LLM for the marketing campaign.

Three factors that we need to consider carefully are accuracy, reliability, and speed.

Accuracy - how close the generated output is to the desired output; accuracy can be measured objectively and repeatably by choosing evaluation metrics that are relevant to our use case. (e.g., BLEU, the Bilingual Evaluation Understudy benchmark, is a suitable metric for indicating the quality of generated translations.)
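To make the idea concrete, here is a toy sketch of the intuition behind BLEU: modified unigram precision combined with a brevity penalty. Real BLEU also averages 2- to 4-gram precisions, and real evaluations should use an established implementation such as sacreBLEU rather than this simplification:

```python
import math
from collections import Counter

def simple_bleu(candidate: str, reference: str) -> float:
    """Toy BLEU: modified unigram precision times a brevity penalty."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    # Clip each candidate word's count by its count in the reference,
    # so repeating a correct word doesn't inflate the score.
    ref_counts = Counter(ref)
    overlap = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    precision = overlap / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0; partial overlap or a too-short candidate scores lower.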

Reliability - includes several factors like consistency, explainability, trustworthiness, and avoiding toxicity (like hate speech).

Ultimately, it comes down to trust, and trust is built through transparency and traceability of the training data, accuracy, and reliability of the output.

Speed - how quickly a user gets a response to a submitted prompt.
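Latency is easy to measure yourself. A minimal sketch, where `call_model` is a hypothetical stand-in for your real API client:

```python
import time

def call_model(prompt: str) -> str:
    """Stand-in for a real model API call; replace with your client."""
    time.sleep(0.05)  # simulate network + inference delay
    return "Hi Dana, our spring sale is here ..."

# Measure end-to-end latency over several runs and report the median,
# which is less sensitive to outliers than the mean.
latencies = []
for _ in range(5):
    start = time.perf_counter()
    call_model("Write a short marketing email for Dana.")
    latencies.append(time.perf_counter() - start)

latencies.sort()
median_s = latencies[len(latencies) // 2]
print(f"median latency: {median_s * 1000:.0f} ms")
```

Run the same prompts against each candidate model to get comparable numbers.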

As you can imagine, speed and accuracy are often a trade-off. Larger models are usually slower but deliver more accurate answers; smaller models may be faster with minimal differences in accuracy compared to larger models.

It’s all about finding the sweet spot between performance, speed, and cost.
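One way to make that sweet spot explicit is a simple weighted score. The models, metric values, and weights below are illustrative; set the weights to reflect your own priorities (here, cost and latency are rescaled so higher means cheaper / faster):

```python
# Illustrative metrics for three hypothetical models, each scaled to [0, 1].
models = {
    "model-large":  {"accuracy": 0.90, "speed": 0.40, "cost": 0.30},
    "model-medium": {"accuracy": 0.84, "speed": 0.70, "cost": 0.70},
    "model-small":  {"accuracy": 0.80, "speed": 0.95, "cost": 0.95},
}

# Weights express what matters most for this use case.
weights = {"accuracy": 0.5, "speed": 0.3, "cost": 0.2}

def weighted_score(metrics: dict) -> float:
    return sum(weights[k] * metrics[k] for k in weights)

best = max(models, key=lambda name: weighted_score(models[name]))
for name, m in models.items():
    print(f"{name}: {weighted_score(m):.3f}")
print("best fit:", best)
```

With these (made-up) numbers, the smallest model wins; shift the accuracy weight up and the ranking changes, which is exactly the trade-off decision made visible.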

If you consider the other benefits a model might deliver, like lower latency and greater transparency into its inputs and outputs, you may find a smaller, less expensive model preferable, even if its performance or accuracy metrics are not on par with a more expensive one.

The best way to find out is to simply select the model that’s likely to deliver the desired output and test it. Test the model with your prompts to see if it works, and then assess the model's performance and the quality of the output using metrics.

*Last but not least: consider where and how you want the model and data to be deployed.

Deploying on-premise gives greater control and more security benefits compared to a public cloud environment, but it’s an expensive proposition, especially when factoring in model size, compute power, and the number of GPUs it takes to run a single LLM.

This issue is brought to you in partnership with DeepBrain AI.


DeepBrain AI allows you to generate viral videos with AI Studios' Topic-to-Video.

Simply provide a script, article, or even a website link, and DeepBrain AI will transform it into a professional-looking video complete with AI-powered avatars! Start here.


Prashanth V.

Senior Product Manager @Microsoft | EMBA - ISB | LBS | Wharton

7 months ago

While the future might rely more on SLMs, I think one should also think about interoperability across small models to reduce switching costs and ease of maintenance.


Great work Alex Wang! Now there is direction for all who want to choose the right LLM for their task!

Ken Kondo CSM, CSPO

AI, Software and Data Leader and Innovator

8 months ago

Can't wait for Claude 7!

Laura Rodriguez Salvador

Customer Data Activation Manager | Data & AI for Customer Experience @ Inetum | Data Analysis & Data Science | Generative AI

8 months ago

Everything revolves around the use case. Include accuracy measures performed on the use case itself, not only on standard datasets. Standard benchmarks are useful for comparing the general performance of models on similar scenarios, but not for measuring how much *added value* a particular model delivers for the result being sought. When deploying an AI-powered solution, agreeing on what "success" means for future users is also a tough part of the project, and will determine its success. Happy to receive feedback on this to keep improving how to deliver value using GenAI!

Li Sun

Director | MIT | Blockchain | Artificial Intelligence | Venture Capital

8 months ago

Hi Alex Wang, how do I subscribe to your newsletter? I like how simple yet comprehensive your content on AI is.
