The Week of Small Language Models
AIM
Explaining AI and its commercial, social and political impact. For brand collaborations, write to [email protected]
This week marks an exciting trend in AI: the rise of small language models from Hugging Face, Mistral AI, NVIDIA, and OpenAI, among others, showcasing the latest advancements in the field. “LLM model size competition is intensifying… backwards!” quipped OpenAI co-founder Andrej Karpathy.
The week started with Paris-based AI startup Mistral AI, in collaboration with NVIDIA, introducing Mistral NeMo, a 12-billion-parameter model with a 128k-token context length. Alongside it, Mistral launched MathΣtral, a specialised 7B model for advanced mathematical reasoning and scientific exploration.
A few days later, Hugging Face released a new series of compact language models called SmolLM, available in three sizes: 135M, 360M, and 1.7B parameters. These models are designed to run on local devices such as laptops and phones, eliminating the need for cloud-based resources and significantly reducing energy consumption.
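Running a model of this size locally is a few lines of code. Here is a minimal sketch using the Hugging Face transformers library; the repo ID below is an assumption and worth verifying on the Hub:

```python
# Minimal local-inference sketch for a SmolLM checkpoint using transformers.
# The repo ID is assumed; check the Hugging Face Hub for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-1.7B"  # assumed repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Small language models are useful because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 1.7B parameters, the weights fit comfortably in the memory of a modern laptop, which is precisely the point of the SmolLM family.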
This was followed by OpenAI releasing GPT-4o mini, a highly cost-efficient model designed to expand AI applications by making intelligence more affordable. Priced at $0.15 per million input tokens and $0.60 per million output tokens, GPT-4o mini is 30x cheaper than GPT-4o and 60% cheaper than GPT-3.5 Turbo.
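To put those rates in perspective, here is a back-of-the-envelope cost calculation at the quoted prices; the token counts in the example are illustrative:

```python
# Estimate the cost of a single GPT-4o mini call at the quoted rates:
# $0.15 per 1M input tokens and $0.60 per 1M output tokens.
def request_cost(input_tokens: int, output_tokens: int) -> float:
    return input_tokens / 1e6 * 0.15 + output_tokens / 1e6 * 0.60

# An illustrative 2,000-token prompt with a 500-token reply:
print(f"${request_cost(2_000, 500):.6f}")  # ~$0.000600
```

At these prices, a million such requests would cost roughly $600, which is what makes high-volume applications viable.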
Similarly, H2O.ai introduced H2O-Danube3, a new series of SLMs to bring AI capabilities to mobile devices. The series includes two models: H2O-Danube3-4B, trained on 6 trillion tokens, and H2O-Danube3-500M, trained on 4 trillion tokens.
Apple, too, entered the game, releasing a model called DCLM-BASELINE 7B, along with its weights, training code, and dataset. Trained on 2.5 trillion tokens from open datasets, the model primarily uses English data and features a 2048-token context window.
In China, Alibaba released Qwen2 Base and Instruct models in five sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B, trained on data in 27 languages besides English and Chinese.
A few months ago, researchers from IIT Gandhinagar, India, released Ganga-1B, a pre-trained LLM for Hindi, as part of their Unity project. Built from scratch on what the team describes as the largest curated Hindi dataset, the model outperforms all open-source LLMs of up to 7B parameters that support Hindi.
Earlier this year, Microsoft unveiled Phi-3-Mini, a 3.8 billion parameter language model trained on an extensive dataset of 3.3 trillion tokens. Despite its compact size, Phi-3-Mini boasts performance levels that rival larger models such as Mixtral 8x7B and GPT-3.5.
LLMs Get Cheaper
Karpathy recently said that the cost of building an LLM has come down drastically over the past five years due to improvements in compute hardware (H100 GPUs), software (CUDA, cuBLAS, cuDNN, FlashAttention) and data quality (e.g., the FineWeb-Edu dataset).
He said that it is now possible to train models like GPT-2 for approximately $672 on “one 8XH100 GPU node in 24 hours”.
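The arithmetic behind that figure is straightforward; the node rental rate below is an assumption that makes the numbers line up, not a quoted price:

```python
# Rough reconstruction of the ~$672 GPT-2 training cost:
# 24 hours on one 8xH100 node at an assumed ~$28/hour rental rate.
node_rate_per_hour = 28.0  # assumed cloud rate for an 8xH100 node
hours = 24
print(f"${node_rate_per_hour * hours:.0f}")  # $672
```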
Mayank Singh, the creator of Ganga-1B, told AIM that it was built from scratch for under INR 10 lakh. Meanwhile, Tech Mahindra told us it built Project Indus for well within $5 million (INR 41 crore); that model, too, is built on the GPT-2 architecture, from the tokeniser to the decoder.
Are SLMs the future of generative AI?
Abacus.AI chief Bindu Reddy predicted that in the next five years, smaller models will become more efficient, LLMs will continue to become cheaper to train, and LLM inference will become widespread. “We should expect to see several Sonnet 3.5 class models that are 100x smaller and cheaper in the next one to two years.”
Given the cost benefits, hundreds, if not thousands, of SLMs will soon mushroom in the space. The question is: how will enterprises benefit from them? And are SLMs really the future of generative AI?
“Technology is rapidly evolving how we operate and train. Three months ago, using an SLM isolated in a customer's private database might have seemed like a drawback compared to a private instance of a large public model,” said Kasey Roh of Upstage, which has built Solar, an SLM fine-tuned from Llama 2 and one of the top-ranked models on the Hugging Face Open LLM leaderboard.
Further, Roh said that to address the challenges of SLMs, the company has now entered into a subscription model with its customers, allowing them to continuously train and fine-tune models with the latest data points.
She believes that this modular and standardised approach significantly mitigates the drawbacks of using small private models. “Once we have a complete set of the tools and programmes available, I think that the drawback of having a small model that's private can be largely addressable,” said Roh.
Many experts believe that SLMs or specialised models are going to be the future, alongside generalised large models like GPT-4 or Claude 3.5 Sonnet. “For everyday use, an 8B or maybe 70B LLM will suffice. If you don't want to test a model to the max, you don't need a SOTA model. For everyday questions, which now appear in all training data, smaller models are sufficient,” posted a user on X who goes by the name Kimmonismus, aka Chubby.
“Both specialised and generic models will coexist, one is not a replacement for the other. It’s the wrong dream to believe that we only need one API, such as OpenAI. We will need both mega models aiming for AGI and specialised micro models that can integrate into today’s workflows,” said Pramod Varma, chief architect of Aadhaar, in a recent interaction with AIM.
Check out the full story here.
Top Stories of the Week >>
Google Trying to Mimic Google Pay for Transportation via Namma Yatri
Back in 2017, Google made UPI and cashless transactions household terms with the launch of Google Pay in India. The app has over 67 million users in the country, its largest single market to date. Now, the company is trying to mimic the success of Google Pay to democratise transportation for all via Namma Yatri, the Bengaluru-based open-source rival to ride-hailing services like Ola and Uber.
Read the full story here.
People & Tech >>
In the Era of GenAI, MongoDB and PostgreSQL Will Run Out of Gas; DataStax Won’t
Jihad Dannawi of DataStax, in an exclusive interview with AIM, highlighted the need for scalable databases like Apache Cassandra to support the growing complexity and real-time demands of generative AI applications.
He said that in the era of generative AI, “MongoDB will run out of gas; PostgresDB will run out of gas when it has to handle real-time processing or high volumes of data. To effectively manage large-scale operations, you require a robust infrastructure that can handle substantial demands without running out of capacity.”
Check out the full story here.
BharatGPT’s Ganesh Ramakrishnan’s AI Startup bbsAI Tackles Limited Indic Data Challenge
Apart from working on BharatGPT, Ganesh Ramakrishnan, a professor at IIT Bombay, has been dedicated to developing translation engines. Continuing this effort, he has co-founded bbsAI with Ganesh Arnaal, a venture that has been a decade in the making. Get to know more about bbsAI here.
AIM Videos >>
The Secret Behind Aadhaar Card Photos
In our latest episode of Tech Talks, AIM speaks to Pramod Varma, the CTO of EkStep Foundation and chief architect of Aadhaar & India Stack, about the secret behind our ‘unfortunate’ photos on Aadhaar cards, besides discussing some of the challenges and struggles in executing the Aadhaar project nationwide.
Workshop Wonders
RAG & Fine-Tuning in GenAI with Snowflake
Join Prashant Wate from Snowflake India for an exciting workshop on RAG and fine-tuning in GenAI. Learn how to optimise models and create seamless AI apps.
Date: July 25, 2024
Time: 6 - 7.30 pm