Framework for evaluating Generative AI use cases
With the release of ChatGPT in November 2022, Large Language Models (LLMs) and Generative AI have taken the world by storm: users either love it or are irritated by it, and investors and companies across many industries are asking whether Generative AI will disrupt established modes of core information functions, from search to content generation to knowledge management. However, many of us are still trying to figure out the trillion-dollar question: what are the actual use cases where Generative AI adds the most value, and how can they be monetized? Moreover, which use cases are most practical to implement and monetize in the short, medium and long term? This post is my humble attempt to suggest a simple, practical framework for understanding ChatGPT’s promises and limitations and, most importantly, how ChatGPT applies to different use cases and industries.
I find it helpful to evaluate potential use cases across two dimensions: fluency and accuracy. Another important aspect when evaluating use cases is how high stakes the use case is (represented by colors). Plotting different use cases across those dimensions provides an interesting decision framework I hope you will find useful:
Fluency, Accuracy, and use case stakes
A lot of the excitement about ChatGPT stems from the “fluency” of its responses, which indeed look extremely natural and human-like. This is an amazing technological achievement, stemming from a) the ground-breaking Transformer neural networks developed and open-sourced by Google in 2017 and b) training those Transformer networks on a humongous corpus (tens of billions of samples) of dialogues. Again, a huge hat tip to Google, which developed and published research on both the first dialogue model (LaMDA) and the first multilingual translation Large Language Model (M4). However, fluency is different from accuracy, and the stakes involved vary widely across use cases:
Another important view is to evaluate a particular use case at scale: will your answer change if the task is to use AI (with no or minimal human intervention) to write millions of poems, or to provide millions of answers that people will rely on to make important decisions?
Putting it all together
When trying to apply LLMs to different use cases, it is critical to define the requirements of a particular use case across those aspects. It is important to understand that getting very high fluency AND very high accuracy is very tricky, due to the following limitations of large generative language models:
Back to my previous examples: writing a poem (or a million poems) requires a high degree of fluency but not a high degree of accuracy. Moreover, it is a fairly low-stakes use case (in terms of the risk of getting something wrong). On the other end of the spectrum, generating supporting data for important business decisions primarily requires high accuracy and fairly low fluency; it is also a high-stakes use case (high risk if the decision is based on wrong data).
Looking at those use cases, I observed an interesting trend: use cases related to improving creator and workplace productivity (writing a poem, composing music, writing children’s books, creating stock images, writing emails, documents and presentations, etc.) are less complex and risky, and could be a better fit for current LLM/Generative AI technology (which is amazing at fluency but still has gaps in accuracy) than information-seeking and decision-support use cases (e.g. getting an answer about which appliance, car insurance or vacation to buy, or data for important business decisions).
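To make the framework a bit more concrete, here is a minimal Python sketch of how one might record a use case along the three dimensions and sort it into “better fit today” vs. “harder fit today”. The ratings for the two examples come from the discussion above; the `Level` scale, the `UseCase` class and the `fits_current_llms` rule are hypothetical names and thresholds I made up for illustration, not a definitive scoring method.

```python
from dataclasses import dataclass
from enum import IntEnum


class Level(IntEnum):
    """Coarse three-step scale (an assumption made for this sketch)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class UseCase:
    name: str
    fluency_needed: Level
    accuracy_needed: Level
    stakes: Level


def fits_current_llms(use_case: UseCase) -> bool:
    """Hypothetical rule of thumb: a use case is a better fit today
    when neither its accuracy requirement nor its stakes are high."""
    return (use_case.accuracy_needed < Level.HIGH
            and use_case.stakes < Level.HIGH)


# Ratings taken from the two examples discussed in the text.
use_cases = [
    UseCase("writing a poem", Level.HIGH, Level.LOW, Level.LOW),
    UseCase("data for business decisions", Level.LOW, Level.HIGH, Level.HIGH),
]

for uc in use_cases:
    verdict = "better fit today" if fits_current_llms(uc) else "harder fit today"
    print(f"{uc.name}: {verdict}")
```

You could add your own use cases to the list and adjust the rule; the point is only that writing the three ratings down explicitly makes the trade-off easier to discuss.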
What about monetization?
When prioritizing different use cases, the critical question is which of them offer a) large monetization potential and b) realistic implementation potential (e.g. is Generative AI technology mature enough for users to adopt it at scale for this particular use case)?
While I believe that total monetization potential is directly correlated with higher-stakes use cases, which are much more complex to implement, I think that the current highest-ROI opportunities lie in several use cases that hit a “sweet spot” of sizable monetization potential and practical implementation using Generative AI in the short to medium term.
One way to look at it is whether the use case could rely on “human labeling/correction” at scale. For example, users who use Generative AI to compose documents, emails or presentations will likely review the draft output and adjust or correct it. This will not only make the Generative AI system better (user feedback further improves the LLMs), but could still deliver a significant (50%-70%) productivity boost that users will be willing to pay a premium for. In this “AI + human” division of duties, AI will be “responsible” for a fluent, smooth “story” (which requires significant effort from many users), while humans will be “responsible” for validating the accuracy of the LLM output. I am sure there are additional use cases that offer a good balance of monetization potential and manageable complexity along similar lines. On the flip side, it is not practical to expect humans to fact-check every high-stakes answer produced by AI (for example, in the search engine use case).
We are in the early stages of the AI Revolution
We live in an exciting (and, to some, rapidly changing and scary) period of human history: a new incarnation of disruptive technology (many compare AI to the invention of electricity or fire!) will impact every aspect of our lives. I consider myself particularly lucky to have worked both on the first major AI breakthrough (using deep neural networks in a first-ever product at enormous scale, Google Translate) and on a new generation of AI that can produce natural, human-like outputs on virtually any topic. As with every new and disruptive technology (think of the early days of railroads, planes, etc.), productizing and monetizing this groundbreaking AI technology is both exciting and scary, full of complexities and nuances that must be addressed in order to cross the chasm. I hope that my framework for thinking about and prioritizing Generative AI use cases will be helpful to many of you as you embark on this amazing journey.
This is just the first in a series and I look forward to sharing more of my musings with you soon.
------
Barak Turovsky is VP of AI at Cisco, and former Head of Languages AI product teams at Google (2014–2022), focusing on applying cutting edge AI technologies across Google Translate, Search, Assistant, Ads, Cloud, Chrome, and other products. Most recently, Barak was Executive in Residence at Scale Venture Partners and served as Chief Product Officer (responsible for Product, Engineering and AI teams) for Trax Retail, a late stage startup providing Computer Vision AI solutions for the Retail industry.
Previously, Barak was a product leader within the Google Commerce team, worked as Director of Product in Microsoft’s Mobile & Local Advertising, Head of Mobile Commerce at PayPal and Chief Technical Officer for Telemesser, an Israeli startup.