Build for AI Models of tomorrow, today
Turja N Chaudhuri (to the Cloud)
Global Lead, Platform Success, EY Fabric | Practice at EY | Views are my own
Section I : GenAI is the new cool tech on the block, and everyone wants a piece
Nowadays, at work, I spend a large part of my day talking to consumers about the AI / GenAI capabilities of the internal platform I represent. My main focus is enabling our consumers to build client-facing GenAI-enabled applications by leveraging the AI building blocks and capabilities that are available for reuse and integration as part of my enterprise's internal platform.
I talk to tens of customers every day, and the direction is clear: GenAI has arrived. Everyone is rushing to build the next GenAI-infused application for their clients. It's even difficult to get funding if your application does not have GenAI capabilities embedded in it, be it a Q&A chatbot, a knowledge engine, report summarization, or natural-language-to-SQL translation.
I asked Bing to point me to surveys that confirm the rapid adoption of GenAI among enterprises, and the results support the hypothesis, to an extent.
Section II : Not all’s well in GenAI la-la land
But, at the same time, skepticism about whether these tools and technologies can bring credible, enduring value to business is at an all-time high.
Business leaders are increasingly concerned that even though the tech is good and promising, it lacks the robustness and credibility needed for enterprise use cases. For example, a few errors from ChatGPT while responding to "Why is Trump better than Biden?" are acceptable, but if you build a GenAI solution that buys stocks based on market analysis and it hallucinates, you are potentially exposing your company to millions of USD in fines, losses, or legal liabilities.
The well-publicized news about a lawyer getting penalized for using ChatGPT in court cases is an example of this very real risk when using LLMs that can hallucinate; Ref : https://www.indiatoday.in/technology/news/story/lawyer-uses-chatgpt-to-meet-deadlines-loses-job-after-ai-tool-creates-fake-cases-2465106-2023-11-20
There also seems to be a lot of frustration with how long it takes to get a GenAI solution ready for enterprise-grade rollout. It's a time-consuming effort getting all the parts ready: rigorously testing to ground the answers, avoiding hallucination, ensuring the results are correct, benchmarking the retrieval / LLM responses, and so on. And by the time you are finished, a new LLM feature comes out, and the cycle of integrating it into the application starts again.
Most enterprises have put in place rigorous quality checks to ensure that the final product is robust and dependable. These can be as simple as monitoring the models for performance and reliability via metrics and MLOps, or as involved as testing the GenAI application against Responsible AI frameworks that check for bias, harm, etc., which can take weeks to months and delay releases.
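As a rough illustration of the simpler end of those quality checks, here is a minimal sketch of a metrics-based release gate. All names and thresholds here are hypothetical, not from any specific MLOps product: it checks p95 latency and the rate of responses flagged by a (stand-in) Responsible AI filter before a model version is allowed to ship.

```python
# Hypothetical sketch of a metrics-based release gate for a GenAI app.
# Thresholds and names are illustrative, not from any real framework.
from dataclasses import dataclass


@dataclass
class ReleaseGate:
    max_p95_latency_s: float = 2.0   # reliability check via latency metrics
    max_flagged_rate: float = 0.05   # e.g., share of responses a bias/harm filter flagged

    def evaluate(self, latencies: list[float], flagged: list[bool]) -> bool:
        """Return True only if the candidate model passes both checks."""
        ordered = sorted(latencies)
        p95 = ordered[int(0.95 * (len(ordered) - 1))]
        flagged_rate = sum(flagged) / len(flagged)
        return p95 <= self.max_p95_latency_s and flagged_rate <= self.max_flagged_rate


gate = ReleaseGate()
# One slow outlier is fine; a 20% flagged-response rate is not.
print(gate.evaluate([0.4, 0.6, 0.5, 1.1, 0.7], [False, False, True, False, False]))
```

In practice these signals would come from production telemetry and a Responsible AI evaluation suite rather than in-memory lists, but the gating logic is the same idea.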
The fatigue and frustration are real; I can sense it when I talk to consumers. They are looking for robust solutions that will stand the test of time, can be deployed quickly and authoritatively, carry less risk, and deliver credible, sustainable value.
Section III : Skate to where the puck is going to be, not where it has been
Based on all the client-facing discussions on GenAI topics over the last few months, I have come to my own conclusions as to why teams are not getting as much value out of this technology and tooling as they expected at the start.
My conclusion: clients are too fixated on what these models can do today, and not thinking about what they will be able to do 6, 12, or 18 months down the line.
Things the models cannot do (very well) today, such as deterministically answering questions, will get much better over time. The GPT-4 models are already far better than GPT-3 / 3.5, for example, and context windows have grown significantly: the latest Claude model has a 200K-token context window. Over time, these foundation models will become more capable, less costly, lower-latency, and more reliable.
Enterprises that fundamentally believe this will happen are in a unique position to plan for it. Rather than waiting for it to happen and then taking another year to get your application ready on top of the next thing OpenAI / Anthropic releases, building your application with that future in mind uniquely sets you up for success. As in other fields of enterprise IT, it's all about long-term vision and planning for it.
This does not imply that the current models cannot get work done. They are quite good at some tasks, especially when you augment them with context via RAG patterns or, in some cases, fine-tune them. But unlocking true, credible, and tangible value from these LLMs will only come when you invest in the prospect of them becoming exponentially better at their existing capabilities, and able to perform new, unexplored tasks as well.
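To make the RAG pattern mentioned above concrete, here is a minimal sketch. Everything in it is a simplification: retrieval uses naive word overlap as a stand-in for embedding similarity, the documents are made up, and the final prompt would be sent to an actual LLM API rather than printed.

```python
# Minimal sketch of the RAG pattern: retrieve relevant context, then
# ground the prompt in it before calling an LLM. A real system would use
# a vector store with embeddings and an actual model API; the word-overlap
# scoring and documents below are purely illustrative.
import string


def _words(text: str) -> set[str]:
    """Lowercase, split, and strip punctuation for crude matching."""
    return {w.strip(string.punctuation) for w in text.lower().split()}


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (embedding stand-in)."""
    q = _words(query)
    ranked = sorted(documents, key=lambda d: len(q & _words(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Instruct the model to answer only from context, to limit hallucination."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n"
        f"Context:\n{joined}\n"
        f"Question: {query}"
    )


docs = [
    "EY Fabric exposes reusable AI building blocks to internal teams.",
    "Claude supports a 200K-token context window.",
    "Report summarization is a common GenAI use case.",
]
query = "What context window does Claude support?"
prompt = build_prompt(query, retrieve(query, docs))
print(prompt)
```

The point of the pattern is the grounding instruction: as the models themselves improve, the same retrieval-plus-prompt scaffolding keeps working, you simply swap in the better model underneath.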
Some teams and clients I talk to get this: they are willing to take some risks, believing this technology will get much better and much cheaper in the not-so-distant future. Others are skeptical. They try to solve their current problems with the current models, and when it does not work out, they feel frustrated and conclude it's all hype.
This will be an interesting, and developing space to keep an eye out for.
“You should basically skate where the puck is going to go” – Dario Amodei, co-founder and CEO of Anthropic, on the Logan Bartlett Show; Ref : https://www.youtube.com/watch?v=gAaCqj6j5sQ