The Week of Small Language Models
AIM
Explaining AI and its commercial, social and political impact. For brand collaborations, write to [email protected]
This week marks an exciting trend in AI: the rise of small language models from Hugging Face, Mistral AI, NVIDIA, and OpenAI, among others, showcasing the latest advancements in the field. “LLM model size competition is intensifying… backwards!” quipped OpenAI co-founder Andrej Karpathy.
The week started with Paris-based AI startup Mistral AI, in collaboration with NVIDIA, introducing Mistral NeMo, a 12-billion-parameter model with a 128k-token context length. Alongside it, Mistral launched MathΣtral, a specialised 7B model for advanced mathematical reasoning and scientific exploration.
A few days later, Hugging Face released a new series of compact language models called SmolLM, available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to run on local devices such as laptops and phones, eliminating the need for cloud-based resources and significantly reducing energy consumption.
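Running a model of this size locally is a few lines of code. Here is a minimal sketch using the Hugging Face transformers library; the repo ID below is an assumption and worth verifying on the Hub:

```python
# Minimal local-inference sketch for a SmolLM checkpoint using transformers.
# The repo ID is assumed; check the Hugging Face Hub for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-1.7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 1.7B parameters, the weights fit comfortably in the memory of a modern laptop, which is precisely the point of the SmolLM family.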
This was followed by OpenAI releasing GPT-4o mini, a highly cost-efficient model designed to expand AI applications by making intelligence more affordable. Priced at $0.15 per million input tokens and $0.60 per million output tokens, GPT-4o mini is 30x cheaper than GPT-4o and 60% cheaper than GPT-3.5 Turbo.
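To put those rates in perspective, here is a back-of-the-envelope cost calculation at the quoted prices; the token counts in the example are illustrative:

```python
# Estimate the cost of a single GPT-4o mini call at the quoted rates:
# $0.15 per 1M input tokens and $0.60 per 1M output tokens.
def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

# An illustrative 2,000-token prompt with a 500-token reply:
print(f"${request_cost(2_000, 500):.6f}")  # ~$0.000600
```

At these prices, a million such requests would cost roughly $600, which is what makes high-volume applications viable.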
Similarly, H2O.ai introduced H2O-Danube3, a new series of SLMs to bring AI capabilities to mobile devices. The series includes two models: H2O-Danube3-4B, trained on 6 trillion tokens, and H2O-Danube3-500M, trained on 4 trillion tokens.
Apple, too, entered the game, releasing a model called DCLM-BASELINE 7B, along with its weights, training code, and dataset. Trained on 2.5 trillion tokens from open datasets, the model primarily uses English data and features a 2048-token context window.
In China, Alibaba released Qwen2 Base and Instruct models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, trained on data in 27 languages besides English and Chinese.
A few months ago, researchers from IIT Gandhinagar, India, released Ganga-1B, a pre-trained LLM for Hindi, as part of their Unity project. Built from scratch on what the team describes as the largest curated Hindi dataset, the model outperforms all open-source LLMs of up to 7B parameters that support Hindi.
Earlier this year, Microsoft unveiled Phi-3-Mini, a 3.8 billion parameter language model trained on an extensive dataset of 3.3 trillion tokens. Despite its compact size, Phi-3-Mini boasts performance levels that rival larger models such as Mixtral 8x7B and GPT-3.5.
LLMs Get Cheaper
Karpathy recently said that the cost of building an LLM has come down drastically over the past five years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g., the FineWeb-Edu dataset).
He said that it is now possible to train models like GPT-2 for approximately $672 on “one 8XH100 GPU node in 24 hours”.
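The arithmetic behind that figure is straightforward; the node rental rate below is an assumption that makes the numbers line up, not a quoted price:

```python
# Rough reconstruction of the ~$672 GPT-2 training cost:
# 24 hours on one 8xH100 node at an assumed ~$28/hour rental rate.
node_rate_per_hour = 28.0  # assumed cloud rate for an 8xH100 node
hours = 24
print(f"${node_rate_per_hour * hours:.0f}")  # $672
```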
Mayank Singh, the creator of Ganga-1B, told AIM that it was built from scratch for under INR 10 lakh. Meanwhile, Tech Mahindra told us it built Project Indus for well within $5 million (INR 41 crore); that model, too, is built on the GPT-2 architecture, from the tokeniser to the decoder.
Are SLMs the future of generative AI?
Abacus.AI chief Bindu Reddy predicted that in the next five years, smaller models will become more efficient, LLMs will continue to become cheaper to train, and LLM inference will become widespread. “We should expect to see several Sonnet 3.5 class models that are 100x smaller and cheaper in the next one to two years.”
Given the cost benefits, hundreds, if not thousands, of SLMs will soon mushroom in the space. The question is: how will enterprises benefit from them? And are SLMs really the future of generative AI?
“Technology is rapidly evolving how we operate and train. Three months ago, using an SLM isolated in a customer's private database might have seemed like a drawback compared to a private instance of a large public model,” said Kasey Roh of Upstage, which has built Solar, an SLM fine-tuned from Llama 2 and one of the top-ranked models on the Hugging Face Open LLM leaderboard.
Further, Roh said that to address the challenges of SLMs, the company has now entered into a subscription model with its customers, allowing them to continuously train and fine-tune models with the latest data points.
She believes that this modular and standardised approach significantly mitigates the drawbacks of using small private models. “Once we have a complete set of the tools and programmes available, I think that the drawback of having a small model that's private can be largely addressable,” said Roh.
Many experts believe that SLMs or specialised models are going to be the future, alongside generalised large models like GPT-4 or Claude 3.5 Sonnet. “For everyday use, an 8B or maybe 70B LLM will suffice. If you don't want to test a model to the max, you don't need a SOTA model. For everyday questions, which now appear in all training data, smaller models are sufficient,” posted a user on X who goes by the name Kimmonismus, aka Chubby.
“Both specialised and generic models will coexist, one is not a replacement for the other. It’s the wrong dream to believe that we only need one API, such as OpenAI. We will need both mega models aiming for AGI and specialised micro models that can integrate into today’s workflows,” said Pramod Varma, chief architect of Aadhaar, in a recent interaction with AIM.
Check out the full story here.
Top Stories of the Week >>
Google Trying to Mimic Google Pay for Transportation via Namma Yatri
Back in 2017, Google made UPI and cashless transactions household terms with the launch of Google Pay in India. The app has over 67 million users in the country, its largest single market to date. Now, the company is trying to mimic the success of Google Pay to democratise transportation for all via Namma Yatri, the Bengaluru-based open-source rival to ride-hailing services like Ola and Uber.
Read the full story here.
People & Tech >>
In the Era of GenAI, MongoDB and PostgreSQL Will Run Out of Gas; DataStax Won’t
Jihad Dannawi of DataStax, in an exclusive interview with AIM, highlighted the need for scalable databases like Apache Cassandra to support the growing complexity and real-time demands of generative AI applications.
He said that in the era of generative AI, “MongoDB will run out of gas; PostgresDB will run out of gas when it has to handle real-time processing or high volumes of data. To effectively manage large-scale operations, you require a robust infrastructure that can handle substantial demands without running out of capacity.”
Check out the full story here.
BharatGPT’s Ganesh Ramakrishnan’s AI Startup bbsAI Tackles Limited Indic Data Challenge
Apart from working on BharatGPT, Ganesh Ramakrishnan, a professor at IIT Bombay, has been dedicated to developing translation engines. Continuing this effort, he has co-founded bbsAI with Ganesh Arnaal, a venture that has been a decade in the making. Get to know more about bbsAI here.
AIM Videos >>
The Secret Behind Aadhaar Card Photos
In our latest episode of Tech Talks, AIM speaks to Pramod Varma, the CTO of EkStep Foundation and chief architect of Aadhaar & India Stack, about the secret behind our ‘unfortunate’ photos on Aadhaar cards, besides discussing some of the challenges and struggles in executing the Aadhaar project nationwide.
Workshop Wonders
RAG & Fine-Tuning in GenAI with Snowflake
Join Prashant Wate from Snowflake India for an exciting workshop on RAG and fine-tuning in GenAI. Learn how to optimise models and create seamless AI apps.
Date: July 25, 2024
Time: 6 - 7.30 pm