Noob looking for guidance on LLM selection
Today, I came across a common question on Reddit and shared my response. I'm posting it here to reach a broader audience, as I believe many beginners have similar questions.
Question on r/LLMDev subreddit:
Hello, I'm making a project where every user has 10k input tokens and 400 output tokens worth of interaction at least 200 times a month. The project is for general use (like general-knowledge questions or generating mathematical questions). Basically, it won't be much related to programming, so I know Claude isn't the best option.
[Read on Reddit]
My response:
To approach this systematically, we can evaluate the options across three key dimensions:
Dimension 1: Model Complexity
For your use case (handling general knowledge queries and generating mathematical questions), domain-specific expertise isn't required. Any general-purpose LLM in the 7B-13B parameter range should suffice; larger hosted models such as GPT-4 (from OpenAI) or alternatives from Anthropic (Claude), Cohere, and Mistral would also work. In general, larger models (e.g., 27B or 70B) often provide higher-quality results but come at increased cost. The question you should be asking is whether you REALLY need the best-performing model (e.g., 70B). Let's dig a bit deeper by thinking about quality.
Dimension 2: Quality
Quality depends on your project's specific needs. If precise, nuanced answers are essential, GPT-4 or Claude may be the better choice, but they cost more. If you can tolerate slightly less sophistication, models like Llama 3 (no offense to Llama fans :-) ) or other open-source models such as Falcon deliver good performance at lower cost, especially when hosted locally or accessed through cost-efficient APIs.
The bottom line: while smaller models (7B-13B) are cost-effective, larger models tend to produce higher-quality, more nuanced outputs. It's a good idea to experiment with smaller models first to determine whether they meet your quality requirements. They offer lower costs and better latency, making them a practical starting point.
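If you want to run that experiment yourself, here is a minimal sketch using the `openai` Python client (any OpenAI-compatible endpoint works the same way). The two model IDs are placeholders for whatever small and large candidates you're comparing, not real model names:

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder model IDs -- substitute your own small vs. large candidates.
CANDIDATES = ["small-model-7b", "large-model-70b"]
PROMPT = "Generate a medium-difficulty algebra question and its answer."

for model in CANDIDATES:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=400,  # the 400-output-token budget from the question
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```

Eyeball the outputs side by side on a dozen representative prompts; if the smaller model's answers are good enough, the quality question answers itself.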
Dimension 3: Cost
Cost plays a pivotal role in API/LLM selection.
Let's estimate the cost of using different LLMs offered as a service, based on your requirements (a quick back-of-the-envelope script follows the parameters below):
Input tokens = 10k
Output tokens = 400
Number of calls = 200
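To make the arithmetic concrete, here is a quick back-of-the-envelope script. The per-million-token prices are illustrative placeholders I've made up for a large hosted model and a small 7B-13B-class model; substitute the actual rates from your provider's pricing page:

```python
# Back-of-the-envelope monthly cost per user. The prices below are
# illustrative placeholders (USD per million tokens), not quoted rates.
PRICING = {
    "large-model": {"input": 2.50, "output": 10.00},  # hypothetical GPT-4-class rates
    "small-model": {"input": 0.15, "output": 0.60},   # hypothetical 7B-13B-class rates
}

INPUT_TOKENS_PER_CALL = 10_000
OUTPUT_TOKENS_PER_CALL = 400
CALLS_PER_MONTH = 200

for name, price in PRICING.items():
    input_cost = INPUT_TOKENS_PER_CALL * CALLS_PER_MONTH / 1e6 * price["input"]
    output_cost = OUTPUT_TOKENS_PER_CALL * CALLS_PER_MONTH / 1e6 * price["output"]
    print(f"{name}: ${input_cost + output_cost:.2f} per user per month")
```

With these sample rates, the bill comes to roughly $5.80 per user per month for the large model and about $0.35 for the small one, which is what motivates the note below.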
NOTE:
In your particular scenario, the cost is relatively low, so I'd recommend going with the best option.
That said, keep in mind that costs for real-world applications can be significantly higher, so it's worth knowing how to keep them under control.
Cost-Optimization Tips
There are several strategies to reduce costs without compromising too much on quality; one of them is sketched below.
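One widely used tactic is caching: if many users ask the same general-knowledge questions, answer each one once and pay for the tokens only once. A minimal sketch with an in-memory cache, again assuming the `openai` client and a placeholder model ID; a production system would use a shared store such as Redis:

```python
from functools import lru_cache

from openai import OpenAI

client = OpenAI()

@lru_cache(maxsize=1024)  # in-memory; swap for Redis/memcached in production
def cached_answer(prompt: str, model: str = "small-model-7b") -> str:
    """Return a completion, reusing the stored result for repeated prompts."""
    response = client.chat.completions.create(
        model=model,  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400,
    )
    return response.choices[0].message.content

print(cached_answer("What is the capital of Australia?"))  # hits the API
print(cached_answer("What is the capital of Australia?"))  # served from cache, zero tokens
```

Other common levers include trimming that 10k-token input context, batching requests where possible, and routing easy queries to a cheaper model.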
My 2 Cents
If you’re new to the world of LLMs, making these decisions can be daunting. A structured course on LLMs can help you navigate these options more effectively and avoid common pitfalls. If you’re interested, check out my course designed specifically for beginners—it provides actionable guidance and helps you get up to speed quickly.
A reader (Hands-on Technology Leader: ML, AI, Serverless, Cloud, Big Data, SaaS) commented:

In our recent AI project, we went through a similar exercise and reached the same conclusion. It really boils down to the classic trade-off of cost, quality, and speed: increasing complexity tends to slow response times and often drives up costs. The key challenge is finding that sweet spot where all these factors align.

Ultimately, assessing the ROI of the solution you're building becomes crucial. You need to determine whether the benefits in quality or speed are worth the additional investment. In our case, we discovered that by optimizing our prompts and refining our processes, we were able to significantly improve speed while keeping costs in check. This allowed us to maintain a reasonable level of quality without compromising too much on performance or budget.

It's a balancing act: enhance the solution where it matters most while being mindful of the overall impact on efficiency and cost. The ability to adjust and iterate, especially with AI, allows for a more flexible approach that maximizes value without sacrificing the end-user experience.