10 differences between small language models (SLMs) and large language models (LLMs) for enterprise AI

With all this talk about large language models, you’d be forgiven for thinking that they’re going to solve the world’s problems. All we need is more data and more computing power!

However, when it comes to enterprise AI, bigger isn’t always better. It’s true that large language models have some brilliant capabilities, but do you really need one for your use cases? Perhaps a small language model will do.


What is a small language model (SLM)?

A small language model is an AI model, similar to a large language model, only with less training data and fewer parameters. It fundamentally does the same thing as a large language model: understand and generate language. It’s just smaller and less complex.

How big is a small language model?

Small language models come in a variety of shapes and sizes, and the definition of when a model becomes a large language model differs depending on who you ask. Typically, though, anything below 30 billion parameters is considered a small language model. However, SLMs can be as small as a few hundred million parameters.

What’s the difference between a small language model (SLM) and a large language model (LLM)?

There are 10 primary differences between the two that will help you understand which type of model you might consider for a given use case:

  1. Size. This is obvious. As mentioned above, LLMs are a lot larger than SLMs. Some of the more recent LLMs, such as Claude 3 and Amazon’s Olympus, are reported to have as many as 2 trillion parameters. Compare that with Phi-2 at 2.7 billion.
  2. Training data. LLMs require extensive, varied data sets for broad learning requirements. SLMs use more specialist and focused, smaller data sets.
  3. Training time. Training an LLM can take months. SLMs can be trained in weeks.
  4. Computing power and resources. Because of the large data sets and parameter sizes, LLMs consume a LOT of computing resource to train and run the models. SLMs use far less (still a lot, but less), making them a more sustainable option.
  5. Proficiency. LLMs are typically more proficient at handling complex, sophisticated and general tasks. SLMs are best suited to simpler, more narrowly scoped tasks.
  6. Adaptation. LLMs are harder to adapt to customised tasks and require heavy lifting for things like fine tuning. SLMs are much easier to fine tune and customise for specific needs.
  7. Inference. LLMs require specialised hardware, like GPUs, and cloud services to conduct inference, which means they typically have to be used over the internet. SLMs are so small that they can be run locally on a Raspberry Pi or a phone, meaning they can work without an internet connection (see the sketch after this list).
  8. Latency. If you’ve ever tried building a voice assistant with an LLM, you’ll know that latency is a huge issue. Depending on the task, you can be waiting seconds for an LLM to respond. SLMs, because of their size, are typically much quicker.
  9. Cost. Inevitably, if you’re having to consume a lot of computing resource for inference, and your model size is bigger, it means that the token cost for LLMs is high. For SLMs, it’s a lot lower, meaning they’re cheaper to run.
  10. Control. With LLMs, you’re in the hands of the model builders. If the model changes, you’ll see drift or, worse, catastrophic forgetting. With SLMs, you can run them on your own servers, tune them, then freeze them in time so that they never change.
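
As an illustration of point 7, below is a minimal sketch of running a small model entirely on local hardware using the open-source llama-cpp-python library. The GGUF file name is an assumption: substitute any quantised SLM you’ve downloaded.

```python
# Minimal local-inference sketch using llama-cpp-python
# (pip install llama-cpp-python). Once the model file is on disk,
# no internet connection or specialised GPU is required.
from llama_cpp import Llama

# Assumed path to a quantised small model in GGUF format.
llm = Llama(model_path="./phi-2.Q4_K_M.gguf", n_ctx=2048)

response = llm(
    "Summarise in one sentence: refunds are processed within 5 working days.",
    max_tokens=64,
)
print(response["choices"][0]["text"])
```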

How to decide what sized model to use

To decide what model to use, first start experimenting with large language models. This is to validate that the task you’re trying to accomplish can, in fact, be done. If it can be done at all, an LLM should be able to do it.

Once you’ve proven that the task is doable, you can then work down in model sizes to figure out whether the same task can be done using a smaller model. When you reach a model size where your results start to change, become less accurate or slightly more unpredictable, you’ve found your candidate model size.

That doesn’t necessarily mean you should go back up in model size. It may mean that the model size you’ve reached requires some further tuning or training.
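
As a rough illustration, this ‘work down in size’ process can be framed as a simple evaluation loop. Everything below is an assumption: the model names, the degradation threshold and the run_model stub, which you would replace with your own inference call and test set.

```python
# Hypothetical sketch of working down in model size until quality degrades.
models = ["large-70b", "medium-13b", "small-7b", "tiny-2.7b"]  # largest first

test_cases = [
    ("Classify: 'my credit card was stolen'", "stolen_card"),
    ("Classify: 'where is my parcel?'", "order_status"),
]

def run_model(model_name: str, prompt: str) -> str:
    """Placeholder: call your model provider of choice here."""
    raise NotImplementedError

def evaluate(model_name: str) -> float:
    """Fraction of test cases where the expected label appears in the output."""
    hits = sum(expected in run_model(model_name, prompt)
               for prompt, expected in test_cases)
    return hits / len(test_cases)

baseline = evaluate(models[0])  # prove the task works with the LLM first
for name in models[1:]:
    if evaluate(name) < baseline - 0.05:  # results start to degrade
        print(f"Quality drops at {name}: try tuning it before sizing back up")
        break
```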

Prompt tuning

This tuning can be done, firstly, with prompt tuning. This means providing the model with some in-context learning, i.e. data it can use to accomplish the task, delivered to it in the prompt.
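
As a minimal sketch of in-context learning in the sense used here, the example below packs labelled examples into the prompt itself, so no model weights change. It uses Hugging Face transformers with Phi-2 (mentioned above) purely as an illustration; the intents are invented.

```python
# In-context learning sketch: the task examples travel inside the prompt.
from transformers import pipeline

generator = pipeline("text-generation", model="microsoft/phi-2")

prompt = (
    "Classify each message into an intent.\n"
    "Message: I can't log into my account. Intent: login_issue\n"
    "Message: Where is my parcel? Intent: order_status\n"
    "Message: My credit card was stolen. Intent:"
)

# Greedy decoding; the output echoes the prompt plus the completed label.
result = generator(prompt, max_new_tokens=5, do_sample=False)
print(result[0]["generated_text"])
```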

Retrieval augmented generation

Second, consider retrieval augmented generation (RAG), or indexing. This provides the model with external data that it can use at runtime to pull into its responses. For some use cases, especially those that involve some kind of search, you’ll find this may give you the results you’re looking for, and it’s easier than fine tuning because it doesn’t require tampering with model weights or access to the raw model.
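
Here is a hedged RAG sketch: embed a handful of documents, retrieve the one most relevant to the query, and prepend it to the prompt at runtime. The embedding model and documents are illustrative assumptions.

```python
# Minimal RAG sketch with sentence-transformers
# (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Refunds are processed within 5 working days of receiving the item.",
    "Our support line is open 9am to 5pm, Monday to Friday.",
    "Premium accounts include free next-day delivery.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

query = "How long do refunds take?"
query_embedding = embedder.encode(query, convert_to_tensor=True)

# Pick the document most similar to the query.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best_doc = documents[int(scores.argmax())]

# The retrieved text is injected at runtime: no model weights are touched.
prompt = f"Answer using only this context:\n{best_doc}\n\nQuestion: {query}"
print(prompt)
```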

Fine tuning

Lastly, if the first two options haven’t solved your problem, then fine tuning is the final consideration. This is where you train the model for a specific task based on data related to that specific task. There are a number of different types of fine tuning methods you can use, ranging from fine tuning the output using embeddings all the way through to fine tuning the parameters of the models themselves. To do this, you need access to the raw model, rather than an API, and so you’re typically heading into small language model territory here. Find out more about fine tuning.
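
To make the fine-tuning option concrete, below is a hedged sketch of one popular parameter-efficient method, LoRA, via the peft library. The base model, hyperparameters and target module names are assumptions for illustration; this shows why SLM fine tuning is practical on modest hardware.

```python
# Parameter-efficient fine tuning sketch with LoRA
# (pip install transformers peft).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")

# LoRA trains small adapter matrices instead of all 2.7B parameters.
config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here you would run a standard training loop (e.g. transformers'
# Trainer) over your task-specific dataset, then freeze the result.
```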

Relevance for enterprise teams

Think about the tasks that you have for AI within your organisation. It’s probably something along the lines of:

  • Intent classification
  • Knowledge retrieval
  • Content summarisation
  • Sentiment analysis
  • Conversation management
  • Contextual response generation
  • Translation

And things like that. Of course, there are many more use cases for AI, but these are among the most common.

Now consider whether you need the intense power of a large language model for these tasks.

Classification? Really? You need all the internet’s information to be able to recognise that ‘my credit card was stolen’ means ‘stolen card’?
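
To make that concrete, here’s a hedged sketch of intent classification using a sentence-embedding model that is a tiny fraction of the size of any LLM. The intent labels and example phrases are invented for illustration.

```python
# Intent classification sketch with a small embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # ~22M parameters

intent_examples = {
    "stolen_card": "my credit card was stolen",
    "balance_enquiry": "what is my account balance",
    "payment_dispute": "I don't recognise this transaction",
}

labels = list(intent_examples)
label_embeddings = model.encode(list(intent_examples.values()),
                                convert_to_tensor=True)

message = "someone took my card"
scores = util.cos_sim(model.encode(message, convert_to_tensor=True),
                      label_embeddings)[0]
print(labels[int(scores.argmax())])  # -> stolen_card
```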

The vast majority of enterprise AI requirements are specific to that enterprise. You more than likely don’t need the most powerful AI tools on the planet to do what you want (and you certainly don’t need the cost).

Try working backwards from a large language model and see whether a small language model is more fit for your purpose.


About Kane Simms

Kane Simms is the front door to the world of AI-powered customer experience, helping business leaders and teams understand why AI technologies are revolutionising the way businesses operate.

He's a Harvard Business Review-published thought leader and a LinkedIn 'Top Voice' for both Artificial Intelligence and Customer Experience, who helps executives formulate the future of customer experience and business automation strategies.

His consultancy, VUX World, helps businesses formulate business improvement strategies through designing, building and implementing revolutionary products and services built on AI technologies.

  1. Subscribe to VUX WORLD newsletter
  2. Listen to the VUX World podcast on Apple, Spotify or wherever you get your podcasts
  3. Take our free conversational AI maturity assessment



Ganesan LS

IT ADVISORY & STRATEGIC CONSULTING

3 weeks

Awesome

ANKUR GUPTA

Technical Manager at MediaTek | Ex-Nokia

3 months

Nice article. What is the base architecture for SLMs? Are they, like LLMs, mostly built on the transformer architecture?

Sam Nanji

Digital Transformation & Customer Experience Leader | Founder | NED | Voice Skills | KM Expert

3 months

Nice article. I think we will see a move towards industry-specific, purpose-built and tuned SLMs that can run natively on device without internet access. Imagine an SLM built and tuned for banking and another for automotive, etc. I can also envisage a scenario where the SLM handles determining the intent (with additional parameters), which is then used to look up information in a KMS.
