Noob looking for guidance on LLM selection

Today, I came across a common question on Reddit and shared my response. I'm posting it here to reach a broader audience, as I believe many beginners have similar questions.

Question on r/LLMDev subreddit:

Hello, I'm making a project where every user has 10k input tokens and 400 output tokens worth of interaction at least 200 times a month. The project is for general use (like general knowledge questions or generating mathematical questions). Basically, it won't be much related to programming, so I know Claude isn't the best option.

[Read on Reddit]

My response:

To approach this systematically, we can evaluate the options across three key dimensions:

Dimension 1: Model Complexity

For your use case (handling general knowledge queries and generating mathematical questions), domain-specific expertise isn't required. Any general-purpose LLM in the 7B-13B parameter range should suffice; hosted models such as GPT-4 (by OpenAI), or similar alternatives from providers such as Cohere, Anthropic (Claude), or Mistral, would also work. In general, larger models (e.g., 27B or 70B) tend to produce higher-quality results but at a higher cost. The question you should be asking is whether you REALLY need the best-performing model (e.g., 70B). Let's dig a bit deeper by thinking about quality.

Dimension 2: Quality

Quality depends on your project's specific needs. If precise and nuanced answers are essential, GPT-4 or Claude might be the better choice, but they cost more. If you can tolerate slightly less sophistication, models like Llama 3 (no offense to Llama fans :-) ) or other open-source models such as Falcon provide good performance at a lower cost, especially when hosted locally or accessed through cost-efficient APIs.

The bottom line is that while smaller models (7B-13B) are cost-effective, larger models tend to produce higher-quality, more nuanced outputs. It's a good idea to experiment with smaller models first to determine whether they meet your quality requirements; they offer lower costs and better latency, making them a practical starting point.

Dimension 3: Cost

Cost plays a pivotal role in API/LLM selection.

Let's estimate the cost of using different LLMs available as a service, based on your requirements (a rough calculation sketch follows the notes below):

Input tokens = 10k

Output tokens = 400

Number of calls = 200

NOTE:

  • Do your own price calculation; I can't vouch for the accuracy of the website I used to generate this comparative pricing.
  • Multiply the cost by the number of users.
  • Don't forget to factor in development/QA costs.
  • Self-hosted LLMs require infrastructure, which, by the way, is not cheap :-)
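
To make the arithmetic concrete, here is a minimal Python sketch of the per-user estimate. The per-million-token prices in it are made-up placeholders for illustration only, not real quotes; substitute the current numbers from each provider's price sheet.

```python
# Rough monthly cost per user, using the numbers from the question.
# The per-million-token prices below are PLACEHOLDERS for illustration only.
INPUT_TOKENS_PER_CALL = 10_000
OUTPUT_TOKENS_PER_CALL = 400
CALLS_PER_MONTH = 200

# {model label: (input price, output price)} in USD per 1M tokens -- illustrative values
ILLUSTRATIVE_PRICES = {
    "large-flagship-model": (5.00, 15.00),
    "mid-size-model": (0.50, 1.50),
    "small-open-model-api": (0.10, 0.30),
}

def monthly_cost_per_user(input_price: float, output_price: float) -> float:
    """Cost in USD of one user's monthly traffic, given prices per 1M tokens."""
    input_cost = INPUT_TOKENS_PER_CALL * CALLS_PER_MONTH / 1_000_000 * input_price
    output_cost = OUTPUT_TOKENS_PER_CALL * CALLS_PER_MONTH / 1_000_000 * output_price
    return input_cost + output_cost

for name, (in_price, out_price) in ILLUSTRATIVE_PRICES.items():
    print(f"{name}: ~${monthly_cost_per_user(in_price, out_price):.2f} per user per month")
```

Multiply the per-user figure by your expected number of users (and add development/QA and any hosting overhead) to get a realistic monthly bill.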


Cost comparison for commercial LLM options

In your particular scenario, the cost is relatively low, so I'd recommend going for the best option.

That said, keep in mind that costs for real-world applications can be significantly higher. To give a complete picture, here are some suggestions for cost optimization.


Cost-Optimization Tips

Here are some strategies to reduce costs without compromising too much on quality:

  1. Fine-tune smaller models: Train a smaller model to specialize in your specific queries.
  2. Hybrid approach: Use larger models only for complex queries while leveraging smaller ones for routine tasks (see the sketch after this list).
  3. Context optimization: Use vector databases (e.g., Pinecone) with LangChain to minimize input token usage by feeding only relevant data to the model.
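
To illustrate the hybrid approach (tip 2), here is a minimal Python sketch that routes each question to either a cheap or a premium model. It assumes an OpenAI-compatible client; the model names and the length-based routing heuristic are placeholders, not recommendations.

```python
# Hybrid routing sketch: cheap model for routine questions, premium model for
# complex ones. Model names and the heuristic are placeholders -- adjust to
# whatever providers and rules fit your project.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

CHEAP_MODEL = "small-model-placeholder"
PREMIUM_MODEL = "large-model-placeholder"

def looks_complex(question: str) -> bool:
    # Naive heuristic for illustration: long or proof-style questions go to the
    # bigger model. In practice you might use keyword rules or a small classifier.
    return len(question) > 500 or "prove" in question.lower()

def ask(question: str) -> str:
    model = PREMIUM_MODEL if looks_complex(question) else CHEAP_MODEL
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
        max_tokens=400,  # matches the ~400 output tokens in the question
    )
    return response.choices[0].message.content

print(ask("Generate three algebra word problems for grade 8."))
```

The same pattern extends to tip 3: before calling the model, retrieve only the relevant chunks from a vector store and send those instead of the full 10k-token context.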

My 2 Cents

If you’re new to the world of LLMs, making these decisions can be daunting. A structured course on LLMs can help you navigate these options more effectively and avoid common pitfalls. If you’re interested, check out my course designed specifically for beginners—it provides actionable guidance and helps you get up to speed quickly.

https://youtu.be/Tl9bxfR-2hk

Govind Moghekar

Founder at SuperAI Labs | Turning AI into Impactful Solutions | Honored to Receive the Prestigious Eleven Labs Grant | Forbes India DGEMS 2024: Select 200 Nominee

3 months ago

Insightful

Arvind Dayal

Hands-on Technology Leader : ML, AI, Serverless, Cloud, Big Data, SaaS

3 months ago

In our recent AI project, we went through a similar exercise and reached the same conclusion. It really boils down to the classic trade-off of cost, quality, and speed, where increasing complexity tends to slow response times and often drives up costs. The key challenge is finding the sweet spot where all these factors align.

Ultimately, assessing the ROI of the solution you're building becomes crucial. You need to determine whether the benefits in quality or speed are worth the additional investment. In our case, we discovered that by optimizing our prompts and refining our processes, we were able to significantly improve speed while keeping costs in check. This allowed us to maintain a reasonable level of quality without compromising too much on performance or budget.

It's a balancing act: enhance the solution where it matters most while being mindful of the overall impact on efficiency and cost. The ability to adjust and iterate, especially with AI, allows for a more flexible approach that maximizes value without sacrificing the end-user experience.
