All You Need to Know About Small Language Models

In this issue:

  1. A survey on SLMs
  2. A way towards more brain-like inference
  3. How to better count the r’s in strawberry


MLOps/GenAI World is all about solving real-world problems and sharing genuine experiences with production-grade AI systems.

Join leaders and engineers from Microsoft, Hugging Face, BlackRock, and many more for the following tracks:

  • Real World Case Studies
  • Business & Strategy
  • Technical & Research (levels 1-7)
  • Workshops (levels 1-7)
  • In-person coding sessions

Get Access to 30+ virtual workshops, 60+ in-person talks and 90+ hours of recordings by claiming your personal discount.

Last Chance to Save $75 USD


1. A Survey of Small Language Models

Watching: “Small” Language Models (paper)

What problem does it solve? While Large Language Models (LLMs) have been dominating the headlines, Small Language Models (SLMs) are becoming increasingly important. SLMs are designed to deliver strong performance while requiring minimal computational resources, which makes them well suited for resource-constrained settings such as on-device, mobile, and edge deployments. As demand for language models in these environments grows, a comprehensive understanding of SLMs becomes crucial.

How does it solve the problem? The survey presents a novel taxonomy for categorizing the methods used to optimize SLMs. It covers a range of techniques, including model compression, pruning, and quantization: compression aims to reduce a model's size while maintaining its performance; pruning removes less important weights or connections, reducing the model's complexity; and quantization lowers the numerical precision of the model's parameters, yielding smaller models and faster inference. By systematically organizing these methods, the survey provides a clear overview of the approaches used to build efficient SLMs.
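To make pruning and quantization a bit more tangible, here is a minimal, self-contained sketch (my own illustration, not code from the survey) that applies magnitude-based pruning and dynamic int8 quantization to a toy PyTorch model; the layer sizes and the 30% sparsity level are arbitrary choices.

    # Toy illustration of two SLM compression levers: pruning and quantization.
    # Not from the survey; layer sizes and sparsity level are arbitrary.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    # A stand-in for one feed-forward block of a small language model.
    model = nn.Sequential(
        nn.Linear(512, 2048),
        nn.ReLU(),
        nn.Linear(2048, 512),
    )

    # Pruning: zero out the 30% of weights with the smallest magnitude in each Linear layer.
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=0.3)
            prune.remove(module, "weight")  # bake the sparsity into the weight tensor

    # Dynamic quantization: store Linear weights in int8 and dequantize on the fly at inference.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 512)
    print(quantized(x).shape)  # torch.Size([1, 512])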

What's next? Despite these advances, several open challenges remain. They include further improving the efficiency-performance trade-off, developing more effective compression techniques, and ensuring that SLMs stay robust and generalize across tasks and domains. Standardized benchmark datasets and evaluation metrics tailored to SLMs are also needed to enable fair comparisons and track progress in the field.


2. A prescriptive theory for brain-like inference

Watching: Brain-like inference (paper)

What problem does it solve? The Evidence Lower Bound (ELBO) is a widely used objective function for training deep generative models like Variational Autoencoders (VAEs). While ELBO maximization has been useful in interpreting generative models, including diffusion models, it is often considered too broad to provide specific guidance for designing architectures in neuroscience or machine learning. This work aims to bridge the gap between ELBO maximization and prescriptive theories for NeuroAI.
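For orientation, the ELBO in question is the standard variational lower bound on the log-evidence, with encoder q_phi and decoder p_theta (written here in LaTeX notation); the first term rewards reconstruction, the second regularizes the approximate posterior toward the prior:

    \log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] \;-\; D_{\mathrm{KL}}\!\left(q_\phi(z \mid x)\,\|\,p(z)\right) \;=\; \mathrm{ELBO}(\theta, \phi; x)

The paper's question is, roughly, what architecture falls out of maximizing this bound when the distributions involved are Poisson rather than Gaussian.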

How does it solve the problem? The authors show that maximizing the ELBO under Poisson assumptions for general sequence data leads to a spiking neural network called the iterative Poisson VAE (iP-VAE). This model performs Bayesian posterior inference through its membrane potential dynamics, establishing a closer connection to biological neurons than previous brain-inspired predictive coding models, which rest on Gaussian assumptions. The iP-VAE learns sparser representations and generalizes better to out-of-distribution samples than both amortized and iterative VAEs.
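To give a rough feel for what an ELBO with Poisson assumptions looks like in code, here is a minimal PyTorch sketch under my own simplifying assumptions. It is not the paper's iP-VAE (in particular it ignores the iterative, membrane-potential-style updates): just a one-step VAE whose reconstruction term is a Poisson negative log-likelihood and whose KL term compares latent Poisson rates against a fixed prior rate.

    # Hypothetical sketch of a Poisson-ELBO objective; NOT the iP-VAE from the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyPoissonVAE(nn.Module):
        def __init__(self, x_dim=784, z_dim=64, prior_rate=1.0):
            super().__init__()
            self.encoder = nn.Linear(x_dim, z_dim)
            self.decoder = nn.Linear(z_dim, x_dim)
            self.prior_rate = prior_rate

        def forward(self, x):
            z_rate = F.softplus(self.encoder(x))   # non-negative latent firing rates
            z = torch.poisson(z_rate)              # integer spike counts (sampling is
                                                   # non-differentiable here; real models
                                                   # need surrogate gradients)
            x_rate = F.softplus(self.decoder(z))   # reconstructed Poisson rates
            # Negative ELBO = Poisson reconstruction NLL + closed-form KL between
            # Poisson(z_rate) and Poisson(prior_rate).
            recon = F.poisson_nll_loss(x_rate, x, log_input=False, reduction="sum")
            prior = torch.full_like(z_rate, self.prior_rate)
            kl = (z_rate * torch.log(z_rate / prior) - z_rate + prior).sum()
            return recon + kl

    model = TinyPoissonVAE()
    x = torch.poisson(torch.rand(8, 784))   # toy non-negative count data
    neg_elbo = model(x)                     # minimizing this maximizes the ELBO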

What's next? The findings suggest that optimizing ELBO with Poisson assumptions provides a solid foundation for developing prescriptive theories in NeuroAI. This approach could lead to more biologically plausible models that better capture the dynamics of real neurons while maintaining the benefits of deep generative models. Additionally, the insights gained from this work could inspire new architectures and training strategies in both neuroscience and machine learning.


3. Counting Ability of Large Language Models and Impact of Tokenization

Watching: Tokenization (paper)

What problem does it solve? Transformers, the architecture behind most modern Large Language Models (LLMs), have inherent limitations when it comes to reasoning. Unlike recurrent neural networks, Transformers lack recurrent connections, which bounds their computational depth and places them in the complexity class TC0, making them theoretically incapable of solving tasks whose required reasoning depth grows with input length. Counting, a fundamental component of many reasoning tasks, is one such task: performing it inductively requires reasoning depth that grows linearly with the length of the input.

How does it solve the problem? Recent work has shown that Chain-of-Thought (CoT) reasoning can alleviate some of these architectural limitations in counting tasks. However, the role of tokenization in these models has received little attention. Unlike specialized expert models, which often use character-level tokenization, LLMs typically rely on byte-level byte pair encoding (BPE) tokenizers, which fundamentally changes how reasoning is carried out. This study investigates the impact of tokenization on the counting abilities of LLMs and uncovers substantial performance variations driven purely by differences in input tokenization.
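As a concrete illustration of that gap (my own sketch, not from the paper), the snippet below contrasts the character-level view of a word with the view a byte-level BPE tokenizer gives the model; tiktoken and the cl100k_base vocabulary are just one example choice, and the exact split depends on the vocabulary used.

    # Character-level vs. BPE view of the same word; requires `pip install tiktoken`.
    import tiktoken

    text = "strawberry"

    # Character-level view: counting the r's is a trivial one-pass operation.
    print(list(text), "->", text.count("r"), "r's")

    # BPE view: the model only sees multi-character token IDs, so the count
    # must be inferred from opaque subword units rather than read off directly.
    enc = tiktoken.get_encoding("cl100k_base")
    token_ids = enc.encode(text)
    print([enc.decode([i]) for i in token_ids])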

What's next? The findings highlight the importance of tokenization choices when designing and evaluating LLMs for reasoning tasks. By understanding how tokenization can undermine the computational power that models can achieve in theory, researchers can develop new tokenization methods that strengthen reasoning in LLMs. This work opens up new avenues for improving the reasoning abilities of Transformer-based models and brings us closer to LLMs that handle reasoning tasks reliably.


Papers of the Week:


If you enjoyed this article, give it a like and share it with your peers.

