Size Isn’t Everything: How Small Language Models Are Changing the AI Landscape
AI-generated image by the author



In the realm of AI and machine learning, sometimes inspiration strikes from unexpected sources. For Ronen Eldan, a Microsoft Research machine learning expert, a routine bedtime story session with his daughter sparked a groundbreaking idea. Pondering how effortlessly children learn and connect words, Eldan conceived an innovative approach to AI development: What if an AI model could learn using only vocabulary comprehensible to a 4-year-old?

This insight gave rise to a new breed of small language models (SLMs) at Microsoft. Despite their compact size, these models boast capabilities that rival, and sometimes surpass, those of much larger models, making AI more accessible and reducing the computational resources needed.

Microsoft has unveiled the Phi-3 family, its latest series of open models. The standout Phi-3-mini features 3.8 billion parameters but outperforms models twice its size. Available now through Microsoft Azure AI Model Catalog and platforms like Hugging Face and Ollama, Phi-3-mini exemplifies efficiency and capability. Moreover, NVIDIA hosts it as a NIM microservice, demonstrating deployment versatility.
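To give a sense of how accessible this is, here is a minimal sketch of loading Phi-3-mini from Hugging Face and running a single chat turn. It assumes the transformers library and the "microsoft/Phi-3-mini-4k-instruct" checkpoint; swap in the 128K-context variant if you need the longer window.

# Minimal sketch: load Phi-3-mini from Hugging Face and run one chat turn.
# Assumes a recent transformers release and the 4K-context checkpoint;
# "microsoft/Phi-3-mini-128k-instruct" is the long-context variant.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Why do small language models matter?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))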

Further expanding this family, Phi-3-small and Phi-3-medium will soon enhance Microsoft's offerings, providing a range of options that balance quality and cost.

These advancements signify a shift from relying solely on large language models (LLMs) to a diversified portfolio where users can choose the optimal model for their needs. Whether running lightweight applications on local devices or integrating AI into areas with limited internet access, SLMs offer a new frontier in AI utility.

Microsoft's approach also emphasizes the importance of high-quality, tailored training data. Leveraging a method akin to writing tiny children's stories, researchers developed the "TinyStories" dataset, enabling these models to generate content with remarkable fluency and grammar (the TinyStories paper is available on arXiv). The subsequent project, "CodeTextbook," used high-quality educational content to refine these models further, demonstrating a strategic focus on content quality over quantity.
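For a feel of what that training data looks like, the TinyStories dataset is published on the Hugging Face Hub; below is a minimal sketch of sampling it, assuming the "roneneldan/TinyStories" dataset id and the datasets library.

# Sketch: stream a few TinyStories samples without downloading the full set.
# Assumes the "roneneldan/TinyStories" dataset id on the Hugging Face Hub.
from datasets import load_dataset

stories = load_dataset("roneneldan/TinyStories", split="train", streaming=True)
for story in stories.take(3):
    print(story["text"][:200], "...")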

As AI evolves, Microsoft's use of small language models improves computational efficiency and promises to expand AI's potential applications, making the technology accessible even in the most remote locations.

This blend of innovation, accessibility, and practical application underscores a significant shift in AI development—bringing powerful tools to a broader audience without extensive infrastructure.

My small test with the model

I ran some specific tests related to Aspect-Based Sentiment Analysis (ABSA) using the smallest of the models (Phi-3-mini 128K), and the results are pretty impressive. Let's review them together.

I used a simple sentence I incorporated into a prompt:

"Run an aspect based sentiment analisys of the sentence: The phone is good but the battery doesn't last long. Customer service was not good."        

Then, I gave the model its role via a system message, framing it as a Natural Language Processing algorithm trained to execute Aspect-Based Sentiment Analysis:

{"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"}        

The output is quite good:

{
  "aspects": [
    {
      "aspect": "phone",
      "sentiment": "positive"
    },
    {
      "aspect": "battery",
      "sentiment": "negative"
    },
    {
      "aspect": "customer service",
      "sentiment": "negative"
    }
  ]
}        
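Because the reply is pure JSON, downstream handling is straightforward; a quick sketch, assuming the raw reply text is in the reply variable from the call above:

# Sketch: parse the model reply and tabulate aspect/sentiment pairs.
import json

result = json.loads(reply)
for item in result["aspects"]:
    print(f'{item["aspect"]:20} -> {item["sentiment"]}')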

Now my big question was: does this small model work across languages?

# ABSA GERMAN
messages = [
    {"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. Your answer is always a JSON, no extra text only JSON"},
    {"role": "user", "content": "Run an aspect based sentiment analysis of the sentence: Das Telefon ist gut, aber der Akku hält nicht lange durch. Ich habe den Kundendienst angerufen und es ist eine Katastrophe!"},
]

To my surprise, it works quite well:

{
  "aspect": "Telefon",
  "sentiment": "positive",
  "aspect": "Akkulaufzeit",
  "sentiment": "negative",
  "aspect": "Kundendienst",
  "sentiment": "negative"
}        
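One caveat before celebrating: the German reply repeats the "aspect" and "sentiment" keys inside a single object, which is not strictly valid JSON, and Python's json.loads would silently keep only the last pair. A defensive sketch that recovers all pairs via the standard object_pairs_hook parameter (assuming the German reply is in reply):

# Sketch: recover repeated keys that json.loads would otherwise collapse.
import json

pairs = json.loads(reply, object_pairs_hook=lambda p: p)  # keep raw (key, value) pairs
aspects = [v for k, v in pairs if k == "aspect"]
sentiments = [v for k, v in pairs if k == "sentiment"]
print(list(zip(aspects, sentiments)))
# e.g. [('Telefon', 'positive'), ('Akkulaufzeit', 'negative'), ('Kundendienst', 'negative')]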

What do you think about Italian?

{
  "aspects": {
    "telefono": {
      "sentiment": "positive",
      "evidence": "Il telefono è buono"
    },
    "batteria": {
      "sentiment": "negative",
      "evidence": "ma la batteria non dura molto"
    },
    "servizio clienti": {
      "sentiment": "negative",
      "evidence": "Ho chiamato il servizio clienti ed è un disastro!"
    }
  }
}        

Wow! In Italian, I'm also getting the evidence segment (the part of the sentence that supports each aspect's sentiment).
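Since the model now returns evidence, we can sanity-check that every quoted span really occurs in the input, a cheap guard against hallucinated quotes. A sketch, assuming sentence holds the Italian input and result the parsed reply:

# Sketch: verify each evidence span is a literal substring of the input.
for aspect, info in result["aspects"].items():
    status = "found" if info["evidence"] in sentence else "NOT found"
    print(f'{aspect}: evidence {status} in input')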

My colleague then asked me to try Russian, to check whether the Cyrillic alphabet plays a role:

# ABSA RUSSIAN
messages = [
    {"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. Your answer is always a JSON, no extra text only JSON"},
    {"role": "user", "content": "Run an aspect based sentiment analysis of the sentence: Телефон хороший, но батарея держится недолго. Я позвонил в службу поддержки, и это катастрофа!"},
]

Again, a solid result:

{
  "aspects": {
    "telephone": {
      "sentiment": "positive",
      "evidence": "Телефон хороший"
    },
    "battery": {
      "sentiment": "negative",
      "evidence": "батарея держится недолго"
    },
    "customer service": {
      "sentiment": "negative",
      "evidence": "это катастрофа!"
    }
  }
}        

What do you think about stressing the model and passing the sentence in Arabic?

# ABSA ARABIC
messages = [
    {"role": "system", "content": "You are a Natural Language Processing algorithm trained to execute Aspect Based Sentiment Analysis. You answer is always a JSON, no extra text only JSON"},
    {"role": "user", "content": "?????? ???? ??? ???????? ?? ???? ??????. ????? ????? ??????? ??? ?????"},
]        

The result is less precise than in the previous languages, but the model still identifies two topics and associates a sentiment with each:

 {   "aspect": "????",   "sentiment": "positive",   "aspect": "??????",   "sentiment": "negative" }        

It looks like, for our specific ABSA case, the small language model works well and provides valid insights.


CONCLUSIONS

After reviewing Microsoft's development of small language models (SLMs), several conclusions can be drawn about their potential future and applicability in fields like Voice of the Customer (VoC) analysis:

  1. Feasibility of Small Language Models: SLMs have a promising future due to their smaller size, which makes them less resource-intensive and more accessible for a variety of applications. They are particularly suitable for deployment in environments with limited computational resources or where data privacy concerns necessitate local processing.
  2. Efficacy in Handling Specific Tasks: Despite their smaller size, these models have demonstrated an ability to perform remarkably well on specific tasks, matching or surpassing larger models on particular benchmarks. This indicates that SLMs can be highly effective within their capability range.
  3. Innovation in Training Approaches: Microsoft's innovative training methodologies, such as using high-quality, simplified datasets modeled after children's books, enable these smaller models to achieve high performance. This approach highlights a shift towards quality over quantity in data usage, which can be particularly beneficial for specialized applications like VoC, where precision and context understanding are critical.
  4. Enhanced Data Privacy: Businesses can significantly mitigate data privacy concerns by implementing SLMs locally on devices or within specific network environments. Local processing aligns with regulations such as GDPR by keeping sensitive customer data within the device or regional network, thus avoiding the risks associated with transmitting data to remote servers.
  5. Applicability to Voice of the Customer Analysis: SLMs can provide a solid solution for analyzing VoC data. These models can be fine-tuned to effectively understand customer feedback, sentiment, and preferences, especially when trained on high-quality, domain-specific datasets. Their ability to operate locally (on-device) also allows for real-time processing and insights generation, which is invaluable for immediate customer service responses and enhancing user experience.
  6. Scalability and Integration: SLMs can scale solutions according to specific business needs. Companies can start with small models for fundamental analysis and escalate to larger models as needed, or use an ensemble to handle different aspects of VoC analysis, from initial data parsing to deep insight generation (see the sketch after this list).
  7. Future Developments and Improvements: As AI research continues, SLMs' capabilities are likely to expand, enabling them to handle more complex tasks and provide more profound insights. Developing new models and training datasets will enhance their effectiveness and applicability across various sectors.

Small language models have a bright future, especially in specialized applications like Voice of the Customer analysis. Their ability to provide actionable insights while being cost-effective and resource-efficient makes them a valuable tool in the evolving landscape of AI technology.
