LLMs: Is bigger always better? Small LLMs are punching above their weight.
Pawel Sobczak
VP Partnerships | ex-IBM VP EMEA | AI startup strategic advisor | Empowering AI builders to boost productivity | Trustworthy AI for Business | Startups | ISVs
Large Language Models (LLMs) have taken the world by storm, with names like GPT-4 by OpenAI, LLaMA 2 by Meta, Jurassic-1 Jumbo by AI21 Labs, or Gemini (previously known as Bard) by Google DeepMind dominating headlines. But are these behemoths always the best choice? In an enterprise environment there are more factors to consider than versatility. There are also smaller, more focused language models like Granite 13B by IBM, Mistral 7B by Mistral AI, or Flan-T5 3B by Google (available on Hugging Face).
LLMs: Powerhouses with potential pitfalls
Very large models usually deliver better results, but the cost of using them is surprisingly high for organizations that move from pilots and tests to production inference.
Smaller Models: Faster, cheaper, more explainable
Smaller models offer better transparency and trustworthiness. Some vendors, like IBM with its #Granite models, even provide indemnification, reflecting their confidence in the quality of the data used to train the family of foundation models.
Some vendors combine several small LLMs into a single mixture-of-experts model that meets the benchmarks of very large models while maintaining the speed and cost efficiency of smaller ones; Mixtral 8x7B is one example. Verification of which option is best in practice is still to come.
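A related, simpler pattern for combining small and large models is a "small model first" cascade. The sketch below is only an illustration of that idea (it is not how Mixtral's internal token-level expert routing works), and the query functions are hypothetical stand-ins rather than any real vendor API:

```python
# A minimal sketch of a two-tier "small model first" cascade.
# The model-call functions are hypothetical stand-ins, not a real vendor API.

def query_small_model(prompt: str) -> tuple[str, float]:
    """Stand-in for a call to a small (e.g. 7B-class) model.
    Returns (answer, confidence). Replace with a real client call."""
    return "draft answer", 0.62  # dummy values for illustration

def query_large_model(prompt: str) -> str:
    """Stand-in for a call to a large, more expensive model."""
    return "higher-quality answer"

def answer(prompt: str, confidence_threshold: float = 0.8) -> str:
    """Route to the small model first; escalate only when its
    self-reported confidence falls below the threshold."""
    draft, confidence = query_small_model(prompt)
    if confidence >= confidence_threshold:
        return draft                      # cheap path: most traffic stays here
    return query_large_model(prompt)      # expensive path: only the hard queries

if __name__ == "__main__":
    print(answer("Summarise this contract clause in plain English."))
```

The design choice being illustrated: if most requests can be served by the small model, the large model's per-token premium is only paid on the minority of queries that genuinely need it.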
So, which model is right for you?
The answer depends on your specific needs.
Choosing the right LLM is all about understanding your needs and priorities. Don't get caught up in the hype – explore both options and find the model that empowers you to achieve your goals. Pay attention to total cost of ownership (model, infrastructure, skills) and compliance/risks when using Generative AI in business.
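To make the total-cost-of-ownership point concrete, here is a rough back-of-envelope sketch in Python. Every number in it (traffic, token counts, per-token prices, infrastructure and skills overhead) is an assumption for illustration only, not a vendor quote; the point is simply that fixed costs and per-token costs scale very differently once you reach production volumes:

```python
# Back-of-envelope monthly cost comparison for production inference.
# All prices and volumes below are hypothetical assumptions, not vendor quotes.

MONTHLY_REQUESTS = 2_000_000           # assumed production traffic
TOKENS_PER_REQUEST = 1_500             # assumed average prompt + completion length

PRICE_PER_1K_TOKENS = {                # assumed blended $ per 1K tokens
    "large hosted model": 0.03,
    "small self-hosted model": 0.002,  # infrastructure amortised per token
}

FIXED_MONTHLY = {                      # assumed infra + skills overhead, $ per month
    "large hosted model": 0,
    "small self-hosted model": 8_000,  # GPUs, MLOps time, fine-tuning upkeep
}

for model, price in PRICE_PER_1K_TOKENS.items():
    variable_cost = MONTHLY_REQUESTS * TOKENS_PER_REQUEST / 1_000 * price
    total = variable_cost + FIXED_MONTHLY[model]
    print(f"{model}: ${total:,.0f}/month")
```

With these assumed numbers the hosted large model lands around $90,000/month while the small self-hosted one lands around $14,000/month; your own break-even point will depend entirely on your volumes, hardware, and team.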
What are your experiences with LLMs and smaller models? Share your thoughts in the comments.
#LLMs #AI #NLP #MachineLearning #DataScience #Startups #Tech #ibm #mistral #gemini #GenAI #watsonx #governance
Head of Marketing Operations, Mobile Networks
1y · Thank you for sharing - I especially agree with the claim that smaller models may be better for specialized tasks. I can easily imagine areas of business in which I would explicitly want my model to NOT be trained on some data categories that may often be used for the big ones.