Is RAG Dead?
AIM Research
Strategic insights for the Artificial Intelligence industry. For brand collaborations, write to [email protected]
No, RAG isn’t dead yet. Many experts believe it is likely to change for the better, regardless of how long LLM context windows become.
Many developers have been experimenting with RAG, for instance by building RAG apps with Llama 3 running locally, while enterprises are rolling out products such as Rovo, a new AI-powered knowledge discovery tool unveiled by Atlassian.
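For readers curious about what such a local setup involves, here is a minimal sketch of a RAG pipeline. It assumes the sentence-transformers package for embeddings and the ollama Python client with a locally pulled llama3 model; the sample documents, prompt wording, and helper function are illustrative and not taken from any project mentioned above.

```python
# Minimal local RAG sketch: embed documents, retrieve the closest matches,
# and pass them as context to a locally running Llama 3 via Ollama.
# Assumes: `pip install sentence-transformers ollama` and `ollama pull llama3`.
import numpy as np
from sentence_transformers import SentenceTransformer
import ollama

documents = [
    "RAG retrieves relevant passages and adds them to the prompt.",
    "Long-context models can ingest entire documents directly.",
    "Context caching reuses processed inputs across queries.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def answer(question: str, top_k: int = 2) -> str:
    # Retrieve the top_k most similar documents by cosine similarity.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vecs @ q_vec
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:top_k])

    # Ask the local Llama 3 model, grounding it in the retrieved context.
    response = ollama.chat(
        model="llama3",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return response["message"]["content"]

print(answer("How does RAG keep prompts small?"))
```

The point of the pattern is that only the handful of retrieved passages, not the whole corpus, ever enters the prompt.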
RAG killers?
A few months ago, many argued that even with 1-million-token context windows, RAG would still be necessary in most cases because of the cost factor.
However, in a recent interview with The New York Times, Google DeepMind chief Demis Hassabis said the company was working on caching reference materials to make subsequent processing much cheaper.
“We're working hard on optimisations... Once you've uploaded [the data] and it's processed, the subsequent questions and answering of those questions should be faster. We're confident we can get that down to the order of a few seconds,” said Hassabis.
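Google has since shipped context caching along these lines in the Gemini API. Here is a rough sketch of the idea, assuming the google-generativeai Python SDK’s CachedContent interface; the model name, placeholder corpus, and TTL are purely illustrative.

```python
# Sketch of context caching with the Gemini API (google-generativeai SDK):
# the reference material is processed once, then reused for follow-up questions.
import datetime
import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# Imagine a large reference corpus here; the API enforces a minimum
# cached-token count, so a short string would not actually be cacheable.
long_reference_text = "..."

# Upload and process the reference material once; keep it cached for 30 minutes.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-pro-001",
    display_name="reference-docs",
    contents=[long_reference_text],
    ttl=datetime.timedelta(minutes=30),
)

# Subsequent questions reuse the already-processed cached context,
# which is what makes them cheaper and faster than re-sending the full text.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
for question in ["Summarise section 2.", "What does it say about pricing?"]:
    print(model.generate_content(question).text)
```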
Along similar lines, OpenAI released a memory feature, a relatively small store that can retain a handful of facts about a user, thereby improving the AI’s understanding and functionality and, eventually, reducing cost.
Such optimisations matter because the underlying costs are enormous. Meta, for instance, said it spent nearly $30 billion on a million NVIDIA GPUs for its AI models, a figure that excludes the cost of training and inference, as acknowledged by Yann LeCun, who remarked: “It’s staggering. Isn’t it?”
It costs less to RAG?
According to reports, the cost of training top AI models continues to surge. OpenAI’s GPT-4 is estimated to have cost $78 million to train, while Google’s Gemini is estimated to have cost $191 million.
So if you choose to ditch RAG and instead stuff all your documents into the LLM’s context, the LLM will need to handle one million tokens for each query.
For example, if you use Gemini 1.5 Pro, which costs approximately $7 per million tokens, you would be paying that amount every time the full million tokens are used in a query.
The price difference is stark: the cost per call with RAG is a small fraction of the roughly $7 a single full-context query on Gemini 1.5 Pro requires, and the gap compounds for applications with frequent queries.
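To make the arithmetic concrete, here is a back-of-the-envelope comparison under stated assumptions: roughly $7 per million input tokens (the Gemini 1.5 Pro figure above), a full-context setup that sends about a million tokens per query, a hypothetical RAG setup that retrieves around 3,000 tokens of context per query, and 10,000 queries a month. The numbers are illustrative, not a benchmark.

```python
# Back-of-the-envelope cost comparison: full-context prompting vs. RAG.
# Assumptions (illustrative): $7 per 1M input tokens, ~1M tokens per
# full-context query, ~3K retrieved tokens per RAG query, 10,000 queries/month.
PRICE_PER_MILLION_TOKENS = 7.00
FULL_CONTEXT_TOKENS = 1_000_000
RAG_CONTEXT_TOKENS = 3_000
QUERIES_PER_MONTH = 10_000

def cost_per_query(tokens: int) -> float:
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

full_context = cost_per_query(FULL_CONTEXT_TOKENS)   # ~$7.00 per query
rag = cost_per_query(RAG_CONTEXT_TOKENS)             # ~$0.021 per query

print(f"Full context: ${full_context:.2f}/query, ${full_context * QUERIES_PER_MONTH:,.0f}/month")
print(f"RAG:          ${rag:.3f}/query, ${rag * QUERIES_PER_MONTH:,.0f}/month")
# Roughly $7/query ($70,000/month) vs about $0.021/query ($210/month),
# before adding embedding/retrieval costs, which are typically small by comparison.
```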
Speaking of RAG, Subtl.ai, an Indian AI startup, has developed a ‘private Perplexity’ platform using lightweight models tailored for enterprises, which operates on top of existing cloud infrastructure without internet connectivity, keeping sensitive data secure.
The company told AIM that it started out using OpenAI solutions, moved to Mistral, and now uses Llama 3. It runs five models under the hood for a seamless experience, the second-largest of which has only 110 million parameters, keeping the stack lightweight and easy for customers to integrate.
In the coming weeks, Subtl.ai plans to make the model available to enterprises free for a month and to release a private product on the internet for people to test out.
Exciting News!
Join us at WiDS Bangalore 2024, where data science innovation takes centre stage! Calling all enthusiasts to submit groundbreaking papers on AI applications in fintech, healthcare, business analytics, and more.
Don't miss this chance to showcase your expertise to a global audience. Submit now and be part of the conversation! #WiDS2024
Learn more and register: [Link]
AI More Likely to Replace Your Toxic Manager than Workers
While everyone has been breaking a sweat over AI taking away their jobs, the technology has apparently zeroed in on an unlikely target: the commanders. Surprisingly, the threat may have shifted to middle and upper management, rather than the foot soldiers of an organisation. Read more here.
[Exclusive] Sarvam AI to release Indic voice LLMs soon
In our latest episode of Tech Talks, Vivek Raghavan, cofounder of Sarvam AI, discusses the company’s future plans, its work on large language models (LLMs), and the potential of the Indian AI ecosystem.
INDIA
AI Conclave Wonders
Rakuten India, in partnership with AIM, is hosting the fourth edition of the Rakuten Product Conference (RPC) ‘24 as a virtual event on May 21-22. Themed ‘Innovation Reimagined: Enterprise SaaS & AI’, the conference focuses on enterprise SaaS and AI for data scientists and innovators globally. Click here to join.