On RAGs and Riches
Venkat Ramakrishnan
Chief Quality Officer | Software Testing Technologist | Keynote Speaker | Corporate Storyteller
Back in 2018, when I gave a talk at ThoughtWorks on NLP, there was euphoria about the state of chatbots. There was even hype, with every college graduate I met claiming to work on chatbots, just as they did with Java in the late 1990s. In that talk, I recommended NOT going into chatbot development, because the technology had not matured; even with BERT et al., there were basic errors. Those errors were compounded in India's highly vernacular provinces, where users, when allowed to type freely, mixed Hindi with English, making the text hard for the bots to process. I recommended menu-driven selections (as in business rules) rather than AI-driven chatbots.
Now in 2024, we are looking at LLMs that can fluently generate text. Their ability to chat has increased significantly with Transformer decoders like GPT. Although they suffer from issues like hallucination, they provide a mostly satisfying experience as a chatbot and a companion to discuss things with, and they democratize general knowledge. C'mon, you've got to give it to them; you can just launch Google Gemini and chat about any topic of your choice!
But testers are a tough breed (as Jason Arbon puts it). We do a 'gradient descent' towards truth and accuracy (beware, it could end up looking like fault-finding!), and in that process, we find that many people use GenAI tools like GPT for purposes they are not meant for. My friend Adam Shostack, a pioneer in threat modeling, for fun asked ChatGPT to count the number of 'b's in the word 'blackberry', and look what he got!
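As an aside, character counting is ordinary string work rather than a language-model task: a model operates on tokens, not individual letters, which is exactly why it stumbles here. A trivial Python check settles the question deterministically:

```python
# Counting characters is deterministic string work, not a language-model task.
word = "blackberry"
print(f"'{word}' has {word.count('b')} occurrences of 'b'")  # -> 2
```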
Here comes RAG (Retrieval-Augmented Generation), a solution to the inaccuracies and hallucinations, as suggested by Patrick Debois, a pioneer in DevOps, in a recent podcast with me for my Software Testing and Quality Talks YouTube channel (to be released soon). RAG augments the chatbot's answers with retrieved information and makes them more accurate, he says. That makes sense to me, as I am of the opinion that GPTs should be restricted to text and image generation, and probably text summarization and machine translation, and should not be used for other problems. Even where you do use them, you need RAG and other data-augmentation techniques.
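To make the idea concrete, here is a minimal RAG sketch. Everything in it is an illustrative assumption on my part: the toy corpus, the naive keyword-overlap retrieval (production systems use vector search over embeddings), and the prompt template. It is not Patrick's setup, just the shape of the technique:

```python
# Minimal RAG sketch: retrieve relevant passages, then ground the prompt in them.
# Corpus, scoring, and prompt template are illustrative stand-ins.

CORPUS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 6pm IST.",
    "Premium plans include priority ticket handling.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; real systems use vector search."""
    q_words = set(query.lower().split())
    scored = sorted(CORPUS, key=lambda doc: -len(q_words & set(doc.lower().split())))
    return scored[:k]

def build_prompt(query: str) -> str:
    """Ground the model in retrieved context instead of its intrinsic knowledge."""
    context = "\n".join(retrieve(query))
    return (
        "Answer using ONLY the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

# The resulting prompt is what you would send to the LLM of your choice.
print(build_prompt("when can i get a refund"))
```

The key point is the instruction to answer only from the retrieved context: that is what reins in hallucination.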
The issue with RAG and other data-augmentation techniques is that they are performance-intensive. Often, an LLM needs to decide whether to use its 'intrinsic' knowledge or to consult the RAG pipeline, and that decision costs compute cycles and energy. In a world that is keen on sustainability and hard on energy-intensive computation, that's a no-no.
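A minimal sketch of such a gate shows where the extra cost sits. The confidence score and threshold here are hypothetical assumptions (reliably estimating that confidence is itself an open problem):

```python
# Sketch of a retrieval gate: only pay the retrieval cost when the model is unsure.
# The confidence score and threshold are illustrative assumptions.

def answer(query: str, confidence: float, threshold: float = 0.8) -> str:
    """Route to RAG only when intrinsic-knowledge confidence is low."""
    if confidence >= threshold:
        return f"[intrinsic] answering '{query}' from model weights alone"
    # Retrieval adds latency and compute: an extra embedding + search round trip.
    return f"[RAG] retrieving context for '{query}' before answering"

print(answer("What is 2 + 2?", confidence=0.95))
print(answer("What changed in our refund policy last week?", confidence=0.3))
```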
Even when LLMs do not use RAG, they might need to call other applications to do the computation for them, and both the performance cost and the security implications of those software dependencies need to be examined.
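Here is a minimal sketch of that delegation pattern, with a hypothetical tool registry standing in for real function-calling APIs. Note that every tool is also a software dependency that needs vetting:

```python
# Sketch of delegating computation to tools instead of the model's weights.
# The registry and dispatch are illustrative; real function-calling setups
# add schemas, sandboxing, and dependency vetting.

import math

TOOLS = {
    "sqrt": lambda x: math.sqrt(float(x)),
    "char_count": lambda word, ch: word.count(ch),
}

def dispatch(tool_name: str, *args):
    """Every external call is also a dependency: vet it like any other software."""
    if tool_name not in TOOLS:
        raise ValueError(f"Unknown tool: {tool_name}")
    return TOOLS[tool_name](*args)

print(dispatch("char_count", "blackberry", "b"))  # 2, correctly this time
print(dispatch("sqrt", 2))                        # 1.4142135623730951
```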
That leaves us with the question of whether LLMs are worth the trouble. That is a very complex question, and it puts the matter back in a consultant's purview; they would say (like me), 'It depends!'
Choose your armor, your battles, and your wars. As they say,
“If the only tool you have is a hammer, you tend to see every problem as a nail.”
If you would like to explore a comprehensive quality strategy for your LLM implementation, please feel free to get in touch with me.