Large language models vs micro models – the great debate

There’s plenty of hype right now around large language models in AI. But are they always the best answer?

Go large or go home

Among the most hyped developments in tech in recent years is the growing prevalence of AI systems that can understand and generate text, known as language models. It’s understandable why: beyond seeming like something that has leapt from the pages of a sci-fi novel, natural language processing (NLP) promises plenty of benefits and uses across many sectors. By making it easier for computers to interpret human language (and even talk back), NLP opens the door to solving a variety of problems that were once difficult or impossible to tackle with software alone.

Pattr’s Conversational AI, which powers much of what we do, is driven by natural language processing. Our goal is to facilitate healthier, more productive online conversations, and to do that we need to be able to understand all the weird and wonderful ways people talk to each other on social media. Our Conversation Health system relies on language models to rapidly identify harmful and abusive content on Facebook and Instagram, even when it is obscure enough that it wouldn’t be captured by a basic language filter. To do this, it needs to correctly identify intent, not just banned words.
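As a rough illustration (not our actual implementation), the sketch below contrasts a basic banned-word filter with a small intent classifier. The model id is a hypothetical placeholder for any compact text-classification model trained on labelled abusive-language examples.

```python
# A minimal sketch, not Pattr's production system: a naive banned-word filter
# versus a small classifier that scores the intent of a post.
from transformers import pipeline

BANNED_WORDS = {"idiot", "moron"}  # toy word list

def keyword_filter(text: str) -> bool:
    """Flag a post only if it contains an exact banned word."""
    return any(word in text.lower().split() for word in BANNED_WORDS)

# A classifier trained on labelled examples can flag hostile intent even when
# no banned word appears verbatim (misspellings, sarcasm, coded insults).
# "your-org/abuse-micro-model" is a hypothetical model id.
abuse_classifier = pipeline("text-classification", model="your-org/abuse-micro-model")

post = "wow, what a genius take, did you even read the article?"
print(keyword_filter(post))    # False: no banned word, so the filter misses it
print(abuse_classifier(post))  # e.g. [{'label': 'abusive', 'score': 0.93}]
```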

Much of the buzz and hype in the media has been directed at what are called large language models, or LLMs. To put it simply, large language models are trained on vast amounts of text data, often terabytes or even petabytes of raw text, and the models themselves are often tens of gigabytes in size. Many big tech companies, like Meta and Google, have either created or are working on their own LLMs.

Perhaps the most famous example is GPT-3, a deep learning model with 175 billion parameters that was released by OpenAI in 2020 and is now licensed exclusively to Microsoft. It can generate human-like text and even complete computer code from a short prompt. A New York Times review described GPT-3’s abilities as “amazing”, “spooky” and “humbling”.

Thanks to the vast datasets these large language models use for training, they can be pretty good at recognising input they haven’t explicitly been trained on, and can interpret and respond to a wide variety of scenarios. This ability to accept a huge variety of inputs is part of what makes them seem so ‘spooky’. These singular, gigantic models can be used quite effectively for all sorts of different tasks.

Why might LLMs not be the right solution?

But there are downsides for those intending to use large language models as part of their business operations.

For one, their sheer size and complexity make them very computationally expensive. (One source in 2020 estimated that running GPT-3 on a single Amazon Web Services instance would cost at least $87,000 per year.) Having such a massive quantity of data behind a model is undeniably useful, but the tradeoff in storage and processing load can be substantial.

Vu Ha, a technical director at the AI2 Incubator, told TechCrunch that cost makes LLMs impractical for most uses. “Large models are great for prototyping, building novel proof-of-concepts and assessing technical feasibility,” he said. “An application that processes tweets, Slack messages, emails and such on a regular basis would become cost prohibitive if using GPT-3.”

Cost is also a barrier for companies trying to develop their own large language model. The investment required to train an enormous model like GPT-3 is why only the largest and most well-resourced companies in tech are actually attempting to do it.

The generality of these gigantic language models can be a burden as well as a help. If you know which tasks you want a language model to perform, a smaller model fine-tuned specifically for those tasks may be more effective than a large language model, which can produce unusual or unexpected outputs drawn from its vast corpus of training data.

Language models that are trained on smaller, more targeted datasets can actually be more effective in practical terms. A paper published by Google researchers in February claimed that a more fine-tuned language model outperformed GPT-3 on key benchmarks, especially in so-called ‘zero-shot’ scenarios, meaning situations in which the model must handle inputs it hasn’t explicitly seen during training.
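To make that concrete, here’s a hedged sketch of the ‘know your task’ approach: when the job is fixed in advance (say, classifying the sentiment of incoming messages), a small fine-tuned checkpoint can be loaded and run on a CPU in a few lines, with no prompt engineering and no per-request API bill. The checkpoint named below is a public distilled model; any comparable task-specific model would do.

```python
# A hedged sketch: when the task is known up front, a small fine-tuned model
# is often cheaper and more predictable than prompting a general-purpose LLM.
from transformers import pipeline

# A distilled, single-purpose model (a few hundred MB) that runs comfortably
# on a CPU; swap in whichever task-specific checkpoint suits your use case.
sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(sentiment("The checkout keeps timing out and I'm losing patience."))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99}]
```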

Pattr’s hybrid approach

Micro models, which are trained on much simpler datasets and intended for much narrower purposes, can fill the gaps remarkably well when it comes to practical applications.

Pattr’s Conversational AI leverages both large and micro language models when analysing and categorising high volumes of social media content at scale. But it’s the micro models which are generally more effective and efficient in delivering results and solving our current challenges.

As part of our Conversational AI, we actually use multiple micro language models operating in tandem to achieve our results. This not only allows us to operate at scale — processing thousands of social media posts and submissions — but to do so relatively cheaply in terms of computational requirements.
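As a rough sketch of what ‘micro models in tandem’ can look like (our production pipeline is more involved, and the model ids below are hypothetical placeholders), each narrow classifier answers one question about a post and the verdicts are combined:

```python
# A minimal sketch, assuming hypothetical single-purpose micro models: each
# classifier answers one narrow question, and the results are collected per post.
from transformers import pipeline

detectors = {
    "abuse":     pipeline("text-classification", model="your-org/abuse-micro"),
    "spam":      pipeline("text-classification", model="your-org/spam-micro"),
    "sentiment": pipeline("text-classification", model="your-org/sentiment-micro"),
}

def analyse(post: str) -> dict:
    """Run every micro model over one post and keep the top label from each."""
    return {name: model(post)[0] for name, model in detectors.items()}

for post in ["Great game last night!", "CLICK HERE to win $$$ now"]:
    print(post, "->", analyse(post))
```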

There’s no question that LLMs will continue to develop rapidly and impress with their capabilities. But as we’ve found at Pattr, bigger isn’t always better, and by using leaner, more targeted micro language models we’re able to deliver more effective and efficient results.

––

Originally published on www.pattr.io
