The Week of Small Language Models

This week marks an exciting trend in AI: the rise of small language models from Hugging Face, Mistral AI, NVIDIA, and OpenAI, among others. “LLM model size competition is intensifying… backwards!” quipped OpenAI co-founder Andrej Karpathy (see below).

The week started with Paris-based AI startup Mistral AI, in collaboration with NVIDIA, introducing Mistral NeMo, a 12-billion-parameter model with a 128k-token context length, alongside the launch of MathΣtral, a specialised 7B model for advanced mathematical reasoning and scientific exploration.

A few days later, Hugging Face released a new series of compact language models called SmolLM, available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to run on local devices such as laptops and phones, eliminating the need for cloud-based resources and significantly reducing energy consumption.

This was followed by OpenAI releasing GPT-4o mini, a highly cost-efficient model designed to expand AI applications by making intelligence more affordable. Priced at $0.15 per million input tokens and $0.60 per million output tokens, GPT-4o mini is roughly 30x cheaper than GPT-4o and more than 60% cheaper than GPT-3.5 Turbo.
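
To put those rates in perspective, here is a minimal sketch of the per-request cost arithmetic, with the prices hard-coded from the figures above; the helper function is purely illustrative, not an official OpenAI utility.

```python
# Rough per-request cost at GPT-4o mini's published rates (USD per 1M tokens).
INPUT_PRICE_PER_M = 0.15   # input tokens
OUTPUT_PRICE_PER_M = 0.60  # output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Example: a 2,000-token prompt with a 500-token reply costs about $0.0006.
print(f"${estimate_cost(2_000, 500):.6f}")
```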

Similarly, H2O.ai introduced H2O-Danube3, a new series of SLMs to bring AI capabilities to mobile devices. The series includes two models: H2O-Danube3-4B, trained on 6 trillion tokens, and H2O-Danube3-500M, trained on 4 trillion tokens.

Apple, too, entered the game, releasing a model called DCLM-BASELINE 7B, along with its weights, training code, and dataset. Trained on 2.5 trillion tokens from open datasets, the model primarily uses English data and features a 2048-token context window.

In China, Alibaba released Qwen2 base and instruct models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, all trained on data in 27 languages besides English and Chinese.

A few months ago, researchers from IIT Gandhinagar, India, released Ganga-1B, a pre-trained LLM for Hindi, as part of their Unity project. Built from scratch on the largest curated Hindi dataset, the model outperforms all open-source LLMs supporting Hindi of up to 7B parameters.

Earlier this year, Microsoft unveiled Phi-3-Mini, a 3.8 billion parameter language model trained on an extensive dataset of 3.3 trillion tokens. Despite its compact size, Phi-3-Mini boasts performance levels that rival larger models such as Mixtral 8x7B and GPT-3.5.

LLMs Get Cheaper?

Karpathy recently said that the cost of building an LLM has come down drastically over the past five years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g., the FineWeb-Edu dataset).

He said it is now possible to train models like GPT-2 for approximately $672 on “one 8XH100 GPU node in 24 hours”.
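
The figure checks out with simple back-of-the-envelope arithmetic: 8 GPUs running for 24 hours at roughly $3.50 per GPU-hour comes to about $672. A quick sketch, where the hourly rental rate is our assumption rather than a number from Karpathy's post:

```python
# Back-of-the-envelope check of the "~$672 on one 8xH100 node in 24 hours" figure.
gpus = 8                 # one 8xH100 node
hours = 24               # wall-clock training time
usd_per_gpu_hour = 3.50  # assumed H100 rental rate (our estimate, not Karpathy's)

print(f"~${gpus * hours * usd_per_gpu_hour:.0f}")  # ~$672
```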

Mayank Singh, the creator of Ganga-1B, told AIM that it was built from scratch for under INR 10 lakh. Meanwhile, Tech Mahindra told us that it built Project Indus for well within $5 million (INR 41 crore); that model, again, is built on the GPT-2 architecture, from the tokeniser to the decoder.

Are SLMs the future of generative AI?

Abacus.AI chief Bindu Reddy predicted that in the next five years, smaller models will become more efficient, LLMs will continue to become cheaper to train, and LLM inference will become widespread. “We should expect to see several Sonnet 3.5 class models that are 100x smaller and cheaper in the next one to two years.”

Given the cost benefits, hundreds, if not thousands, of SLMs will soon mushroom in the space. But the question is, how are enterprises going to benefit from them? And are SLMs really the future of generative AI?

“Technology is rapidly evolving how we operate and train. Three months ago, using an SLM isolated in a customer's private database might have seemed like a drawback compared to a private instance of a large public model,” said Kasey Roh from Upstage, which has built an SLM called Solar, one of the top-ranked models on the Hugging Face Open LLM Leaderboard and a fine-tune of Llama 2.

Further, Roh said that to address the challenges of SLMs, the company has now entered into a subscription model with its customers, allowing them to continuously train and fine-tune models with the latest data points.

She believes that this modular and standardised approach significantly mitigates the drawbacks of using small private models. “Once we have a complete set of the tools and programmes available, I think that the drawback of having a small model that's private can be largely addressable,” said Roh.

Many experts believe that SLMs or specialised models are going to be the future, alongside generalised large models like GPT-4 or Claude 3.5 Sonnet. “For everyday use, an 8B or maybe 70B LLM will suffice. If you don't want to test a model to the max, you don't need a SOTA model. For everyday questions, which now appear in all training data, smaller models are sufficient,” posted a user on X who goes by the name Kimmonismus, aka Chubby.

“Both specialised and generic models will coexist, one is not a replacement for the other. It’s the wrong dream to believe that we only need one API, such as OpenAI. We will need both mega models aiming for AGI and specialised micro models that can integrate into today’s workflows,” said Pramod Varma, chief architect of Aadhaar, in a recent interaction with AIM.

Check out the full story here.


Top Stories of the Week >>

Google Trying to Mimic Google Pay for Transportation via Namma Yatri

Back in 2017, Google made UPI and cashless transactions household terms with the launch of Google Pay in India. The app has over 67 million users in the country, its largest single market to date. Now, the company is trying to mimic the success of Google Pay to democratise transportation for all via Namma Yatri, the Bengaluru-based open-source rival to ride-hailing services like Ola and Uber.

Read the full story here.


People & Tech >>

In the Era of Gen AI, MongoDB, PostgreSQL Will Run Out of Gas, DataStax Won't

Jihad Dannawi of DataStax, in an exclusive interview with AIM, highlighted the need for scalable databases like Apache Cassandra to support the growing complexity and real-time demands of generative AI applications.

He said that in the era of generative AI, “MongoDB will run out of gas; PostgresDB will run out of gas when it has to handle real-time processing or high volumes of data. To effectively manage large-scale operations, you require a robust infrastructure that can handle substantial demands without running out of capacity.”

Check out the full story here.

BharatGPT’s Ganesh Ramakrishnan’s AI Startup bbsAI Tackles Limited Indic Data Challenge

Apart from working on BharatGPT, Ganesh Ramakrishnan, a professor at IIT Bombay, has been dedicated to developing translation engines. In this pursuit, he has co-founded bbsAI with Ganesh Arnaal, a venture that has been a decade in the making. Get to know more about bbsAI here.


AIM Videos >>

The Secret Behind Aadhaar Card Photos

In our latest episode of Tech Talks, AIM speaks to Pramod Varma, the CTO of EkStep Foundation and chief architect of Aadhaar & India Stack, about the secret behind our ‘unfortunate’ photos on Aadhaar cards, besides discussing some of the challenges and struggles in executing the Aadhaar project nationwide.


Workshop Wonders

RAG & Fine Tuning in GenAI with Snowflake

Join Prashant Wate from Snowflake India for an exciting workshop on RAG & Fine Tuning in GenAI. Learn how to optimise models and create seamless AI apps effortlessly.

Date: July 25, 2024

Time: 6:00-7:30 pm

Register now!


AIM Shots >>

  • A faulty software update from cybersecurity firm CrowdStrike caused one of the largest tech outages ever, resulting in the widespread “blue screen of death” on Microsoft Windows hosts, disrupting flights, financial institutions, and news broadcasts worldwide, with industries expected to feel the aftermath for weeks.
  • NVIDIA recently invested $30 million in San Jose and Bengaluru-based Arrcus to enhance its data traffic management platform, which serves major clients like SoftBank and Target, alongside scaling its global operations and technology infrastructure.
  • Salesforce recently unveiled its fully autonomous AI agent, Einstein Service Agent, poised to revolutionise customer service by augmenting human agents, making traditional chatbots obsolete.
  • Infosys, in its FY25 Q1 results, emphasised its dedication to generative AI through its Topaz platform, highlighting strong client traction but refrained from disclosing specific revenue figures, while TCS reported doubling its AI pipeline to $1.5 billion.
  • Bengaluru-based AI-powered personalisation platform Fibr recently raised $1.8 million in a funding round led by Accel, with participation from 2am VC and angel investors, including Cred founder Kunal Shah, aiming to enhance its platform, expand its customer base, and hire new talent.
  • Pintel.ai, a Bengaluru-based platform designed for sales development, raised $1 million in a seed funding round led by IvyCap Ventures, with participation from notable investors, to enhance its AI-driven solution for automating prospect research and improving sales engagement.
