Size Isn’t Everything: How Small Language Models Are Changing the AI Landscape
Federico Cesconi
Founder & CEO @sandsiv, the number one CXM solution powered by AI | Author | In love with NLP using transformers
In the realm of AI and machine learning, sometimes inspiration strikes from unexpected sources. For Ronen Eldan, a Microsoft Research machine learning expert, a routine bedtime story session with his daughter sparked a groundbreaking idea. Pondering how effortlessly children learn and connect words, Eldan conceived an innovative approach to AI development: What if an AI model could learn using only vocabulary comprehensible to a 4-year-old?
This insight gave rise to a new breed of small language models (SLMs) at Microsoft. Despite their compact size, these models boast capabilities that rival, and sometimes surpass, those of larger models, making AI more accessible and reducing the computational resources required.
Microsoft has unveiled the Phi-3 family, its latest series of open models. The standout, Phi-3-mini, features 3.8 billion parameters yet outperforms models twice its size. Available now through the Microsoft Azure AI Model Catalog and platforms like Hugging Face and Ollama, Phi-3-mini exemplifies efficiency and capability. NVIDIA also hosts it as a NIM microservice, demonstrating its deployment versatility.
Further expanding this family, Phi-3-small and Phi-3-medium will soon enhance Microsoft's offerings, providing a range of options that balance quality and cost.
These advancements signify a shift from relying solely on large language models (LLMs) to a diversified portfolio where users can choose the optimal model for their needs. Whether running lightweight applications on local devices or integrating AI into areas with limited internet access, SLMs offer a new frontier in AI utility.
Microsoft's approach also emphasizes the importance of high-quality, tailored training data. Taking a cue from tiny children's stories, researchers built the "TinyStories" dataset, enabling these models to generate content with remarkable fluency and grammar (the TinyStories paper is available on arXiv). The subsequent project, "CodeTextbook," used high-quality educational content to refine these models further, demonstrating a strategic focus on content quality over quantity.
As AI evolves, Microsoft's innovative use of small language models enhances computational efficiency and promises to expand AI's range of applications, making the technology accessible even in the most remote locations.
This blend of innovation, accessibility, and practical application underscores a significant shift in AI development—bringing powerful tools to a broader audience without extensive infrastructure.
My small test with the model
I ran some specific tests on Aspect-Based Sentiment Analysis (ABSA) using the smallest of the models (Phi-3-mini 128K), and the results are pretty impressive. Let's review them together.
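Before diving in, here is a minimal sketch of how the model can be loaded and wrapped in a generation pipeline. This setup is an assumption (the exact code isn't shown in this post); it uses the public Hugging Face checkpoint microsoft/Phi-3-mini-128k-instruct and the transformers library.

# LOADING PHI-3-MINI (sketch; assumes transformers, torch and accelerate are installed)
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-128k-instruct"  # public 128K-context checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",       # place layers on GPU/CPU automatically (needs accelerate)
    torch_dtype="auto",      # use the checkpoint's native precision
    trust_remote_code=True,  # Phi-3 ships custom modeling code
)
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)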
I used a simple sentence I incorporated into a prompt:
"Run an aspect based sentiment analisys of the sentence: The phone is good but the battery doesn't last long. Customer service was not good."
Then, I assigned the model its role as a Natural Language Processing algorithm trained to execute Aspect-Based Sentiment Analysis via the system message:
{"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"}
The output is quite good:
{
  "aspects": [
    {
      "aspect": "phone",
      "sentiment": "positive"
    },
    {
      "aspect": "battery",
      "sentiment": "negative"
    },
    {
      "aspect": "customer service",
      "sentiment": "negative"
    }
  ]
}
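Because the system message constrains the reply to JSON only, the answer can be parsed directly. For example (a sketch; reply is the pipeline output from the snippet above):

import json

result = json.loads(reply[0]["generated_text"])
for item in result["aspects"]:
    print(f'{item["aspect"]}: {item["sentiment"]}')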
Now my big question was: does this small model work in multiple languages?
# ABSA GERMAN
messages = [
{"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"},
{"role": "user", "content": "Run an aspect based sentiment analisys of the sentence: Das Telefon ist gut, aber der Akku h?lt nicht lange durch. Ich habe den Kundendienst angerufen und es ist eine Katastrophe!"},
]
To my big surprise, it works quite well (the German sentence reads: "The phone is good, but the battery doesn't last long. I called customer service and it's a disaster!"):
{
  "aspect": "Telefon",
  "sentiment": "positive",
  "aspect": "Akkulaufzeit",
  "sentiment": "negative",
  "aspect": "Kundendienst",
  "sentiment": "negative"
}
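One caveat worth noting: unlike the English reply, this one repeats the "aspect" and "sentiment" keys at the top level, which is not strictly valid JSON. Python's json.loads() accepts it but silently keeps only the last occurrence of each duplicated key:

import json

s = '{"aspect": "Telefon", "sentiment": "positive", "aspect": "Akkulaufzeit", "sentiment": "negative"}'
print(json.loads(s))  # {'aspect': 'Akkulaufzeit', 'sentiment': 'negative'} -- earlier pairs are lost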
What do you think about Italian?
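The prompt follows the same pattern as before. (The exact code isn't shown above, so the Italian input sentence in this sketch is reconstructed from the "evidence" fields in the reply below.)

# ABSA ITALIAN (input sentence reconstructed from the "evidence" fields in the reply)
messages = [
    {"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. Your answer is always a JSON, no extra text only JSON"},
    {"role": "user", "content": "Run an aspect based sentiment analysis of the sentence: Il telefono è buono ma la batteria non dura molto. Ho chiamato il servizio clienti ed è un disastro!"},
]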
{
  "aspects": {
    "telefono": {
      "sentiment": "positive",
      "evidence": "Il telefono è buono"
    },
    "batteria": {
      "sentiment": "negative",
      "evidence": "ma la batteria non dura molto"
    },
    "servizio clienti": {
      "sentiment": "negative",
      "evidence": "Ho chiamato il servizio clienti ed è un disastro!"
    }
  }
}
Wow! In Italian, I also get the evidence segment (the part of the sentence that ties each topic to its sentiment)!
My colleague then asked me to try in Russian to check if the Cyrillic alphabet plays a role:
# ABSA RUSSIAN
messages = [
{"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"},
{"role": "user", "content": "Run an aspect based sentiment analisys of the sentence: Телефон хороший, но батарея держится недолго. Я позвонил в службу поддержки, и это катастрофа!"},
]
Again, a solid result (the Russian input is the same test sentence, translated):
{
  "aspects": {
    "telephone": {
      "sentiment": "positive",
      "evidence": "Телефон хороший"
    },
    "battery": {
      "sentiment": "negative",
      "evidence": "батарея держится недолго"
    },
    "customer service": {
      "sentiment": "negative",
      "evidence": "это катастрофа!"
    }
  }
}
What do you think about stressing the model and passing the sentence in Arabic?
# ABSA ARABIC
messages = [
{"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"},
{"role": "user", "content": "?????? ???? ??? ???????? ?? ???? ??????. ????? ????? ??????? ??? ?????"},
]
The result is less precise than for the previous languages, but the model can still identify two topics and associate a sentiment with each:
{ "aspect": "????", "sentiment": "positive", "aspect": "??????", "sentiment": "negative" }
It looks like, for our specific ABSA use case, the small language model works well and provides valid insights.
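If you want to reproduce the experiment, the per-language calls above collapse naturally into one small helper (a sketch under the same assumptions as the earlier snippets):

# ABSA HELPER (sketch)
SYSTEM_PROMPT = (
    "You are a Natural Language Processing algorithm trained to execute "
    "Aspect Based Sentiment Analysis. Your answer is always a JSON, "
    "no extra text only JSON"
)

def run_absa(sentence: str) -> str:
    """Return the model's raw JSON answer for one input sentence (any language)."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Run an aspect based sentiment analysis of the sentence: {sentence}"},
    ]
    reply = generator(messages, max_new_tokens=256, return_full_text=False)
    return reply[0]["generated_text"]

print(run_absa("The phone is good but the battery doesn't last long."))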
CONCLUSIONS
After reviewing Microsoft's development of small language models (SLMs), several conclusions can be drawn about their potential future and applicability in fields like Voice of the Customer (VoC) analysis:
Small language models have a bright future, especially in specialized applications like Voice of the Customer analysis. Their ability to provide actionable insights while being cost-effective and resource-efficient makes them a valuable tool in the evolving landscape of AI technology.