Generative AI: 2023 Recap and 2024 Predictions
Generative AI was perhaps the most discussed topic of the past year. As we wrap up 2023 and welcome 2024, what better time to reflect on the past and look forward? This can inform some of your new year resolutions as well. Let us dive in!
2023 Recap
The past year has been a wild ride with significant activity in Generative AI. Several models are now available [see Figure 1 in appendix], and innumerable startups have mushroomed, building generic and niche products. No matter how we view this, the growth and action have been unprecedented.
ChatGPT and Other Commercial Players
It all started with OpenAI’s ChatGPT, which garnered one million users within five days of its launch. However, although OpenAI created a new category and Microsoft “started the dance” with Bing Chat (now Copilot), they surprisingly could not dent Google’s search traffic. Google was again the most popular general Internet service [2]. OpenAI was the most popular service in the emerging Generative AI category, but this was not sufficient for it to enter the top 100 in terms of traffic [3].
Google released Gemini, with the Nano and Pro variants available from mid-December. But this is, at best, an incremental improvement. Google is also hedging by investing in Anthropic, and so is Amazon.
NVIDIA was the biggest winner, as Generative AI models are very GPU-hungry.
Open Source Large Language Models (OSS LLMs)
Kickstarted by Llama and then accelerated by Llama 2, OSS LLMs have matured considerably [Figure 2]. The most recent models, like Mixtral 8x7B, perform on par with GPT-3.5 [Figure 3], which is a fantastic achievement.
It took about ten months for OSS models to perform on par with GPT-3.5.
The recent LLMs have been small-ish (Mistral – 7B, Mixtral – ~13B active parameters, Phi-2 – 2.7B), performing on par with or slightly better than 70B models. This makes OSS models quite amenable to broad-scale local usage, including on phones.
The OSS community still needs to figure out how to beat GPT-4. If we look at the Elo ratings, there is still a 122-point gap between the best OSS model (Mixtral 8x7B) and GPT-4 [Figure 3]. This gap translates to a 67% win rate for GPT-4 in head-to-head comparisons.
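For readers curious where that 67% comes from: the standard Elo logistic formula converts a rating gap into an expected win probability. A minimal sketch (the function name is ours, not from the leaderboard):

```python
def elo_win_prob(rating_diff: float) -> float:
    """Expected win probability of the higher-rated player,
    given the Elo rating difference (standard logistic formula)."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400.0))

# The 122-point gap between GPT-4 and the best OSS model:
print(round(elo_win_prob(122), 2))  # → 0.67
```

A 0-point gap gives exactly 50%, and the curve saturates slowly: even a 400-point gap only implies about a 91% win rate.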
Generative AI Capabilities
LLMs are now understood to be a lossy compression of documented human knowledge. They are not databases: their answers are probabilistic, not deterministic. They are approximate databases that mimic patterns in text rather than knowing anything.
Retrieval Augmented Generation (RAG) has emerged as the workaround, combining search and LLMs to generate the answer. This technique reduces hallucinations and enables users to verify the answers through citations.
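The RAG pattern is simple at its core: fetch relevant passages first, then constrain the model to answer from them with citations. A minimal sketch, where `search` and `llm` are hypothetical stand-ins for a real retriever and model API:

```python
def answer_with_rag(question, search, llm, top_k=3):
    """Illustrative RAG loop: retrieve, ground the prompt, generate."""
    # 1. Retrieve the passages most relevant to the question.
    passages = search(question, top_k=top_k)
    # 2. Number the sources so the model can cite them.
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    prompt = (
        "Answer using ONLY the sources below; cite them as [n].\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    # 3. The model can now only justify claims from the retrieved text,
    #    which is what reduces hallucinations and enables verification.
    return llm(prompt)
```

The citations `[n]` in the generated answer are what let users trace each claim back to a source document.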
Context length has grown from 4k to 200k tokens. However, this increase has had mixed outcomes, with models suffering from a “lost in the middle” problem [7].
Alignment is still an open problem. We still see several jailbreaks (data leakage, security breaches, etc.). Anecdotal evidence suggests that improving alignment involves an inherent trade-off with model utility [8].
Business Impact
For enterprises, Q1 and Q2 were about education on LLMs. In Q3, they set up bare-bones infrastructure and accessed LLM APIs. In Q4, we started seeing POCs.
LLM usage data indicates primary uses tend to be role-play, story generation, code generation, etc. [Figure 4]
There are broadly three usage patterns –
From a business point of view, executives see productivity gains as a significant value addition, and they anticipate workforce reduction [Figure 5] as a result.
LLMs have shown significant benefits in systematic studies conducted on creative writing [Figure 6] and call center case handling [Figure 7].
CDOs are looking for better guard rails, data governance (text data is now accessible through vector databases, but these are pretty different from relational databases), and security [Figure 8].
Regulation
Regulation has started with the US White House Executive Order and the EU’s AI Act. These are just preliminary attempts and lack the necessary nuance and detail. Several lawsuits are ongoing, ranging over text and image models, the latest being NYT vs. Microsoft/OpenAI.
To ease risk aversion among enterprise customers, OpenAI, Microsoft, Google, and Anthropic now provide legal indemnity for their customers.
2024 Predictions
Looking at the trends across research, scaling, and enterprise adoption, the generative AI juggernaut doesn’t seem likely to slow down anytime soon. Investments continue to pour into generative AI platforms and apps that target worker productivity gains and creative tasks. That said, we don’t foresee any breakthroughs this year (on the scale of GPT-4), but there will be steady incremental improvements. Let us look at what we can expect this year.
1. Retrieval Augmented Generation (RAG) will mature to become mainstream and power most use cases in enterprises. Most current proof-of-concepts (POCs) use simple retrievers based on cosine similarity, which fall short for several use cases. While advanced retrievers exist, there are too many narrow ones. We expect consolidation across these retrievers, followed by orchestrators that automatically pick the right combination of retrievers for each use case. This will eventually drive the replacement of Enterprise Search apps with RAG-based Enterprise Answers apps.
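To make the “simple retriever” concrete, here is a sketch of what most POCs do today: rank pre-computed document embeddings by cosine similarity to the query embedding (the function name and toy data are ours):

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=3):
    """Rank documents by cosine similarity to the query embedding.
    This is the 'simple retriever' most current POCs rely on."""
    # Normalize so the dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                        # one similarity score per document
    return np.argsort(scores)[::-1][:k]   # indices of the k best matches
```

Its weakness is visible even in this sketch: it treats each chunk independently and matches only on embedding-space proximity, so queries needing multi-hop reasoning, filtering, or keyword precision fall short, hence the expected consolidation into smarter retriever orchestration.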
2. Multi-modality will be the new frontier, evolving chatbots into assistants that can see, listen, and talk back. Multi-modal learning has become necessary to improve the effectiveness of AI and has thus emerged as the new frontier. We expect a major focus on multi-modality from all the generative AI model developers. With steady improvements, we expect multi-modal performance on benchmarks like MMMU to improve to 80% (currently 59.6%). Remember that “fake” Gemini Ultra demo? It will become real!
3. A 10B OSS model will perform on par with GPT-4, enabling ubiquitous local deployment. Small-ish models like Mixtral, Solar, and Phi-2 have been punching way above their weight. So far, RLHF has been the primary limitation for the OSS community because collecting the required volume of data is highly expensive. However, techniques like self-play and sample-efficient finetuning have led to smaller OSS models performing on par with proprietary models. Once reached, this milestone will make generative AI models ubiquitous, because their small size lets them be deployed on almost any device. Local AI will become one of the major patterns.
4. There will be a wave of wearables powered by LLMs. Most of them will fail. The software space is already crowded, while the hardware is still largely mobile phones. We expect significant investments into imagining and developing new types of generative-AI-based devices. However, we anticipate they will face severe headwinds around privacy, security, safety, and sub-par UX. UX in particular will require several iterations to strike gold, if it ever does.
5. The GPT marketplace will be a flop. We anticipate the GPT marketplace will be hosted in OpenAI’s cloud in SaaS mode, just like the GPT models. It can be viewed as a spin-off of the plugins that were shut down recently. Enterprises won’t use it due to data security concerns, and consumers won’t be able to navigate the marketplace to buy GPTs (similar to Alexa skills).
6. Domain-specific reasoning and planning AI will trigger yet another wave of enterprise adoption. Generative AI models are still far from achieving general reasoning and planning capabilities, and there doesn’t seem to be a path that will lead them there anytime soon. However, they will blend with traditional planning and simulation software and learn domain-specific reasoning and planning capabilities through self-play.
7. Enterprises will start leveraging OSS models and will develop custom LLMs with their data. Most enterprise adoption so far has been of proprietary models, via OpenAI, AWS Bedrock, etc. But adoption has been somewhat stunted, primarily because of concerns around data security and privacy. Once OSS models cross the GPT-4 threshold, we expect a significant shift towards OSS models. This will also drive finetuning with in-house data to develop custom models.
8. Enterprises will start realizing productivity gains from generative AI.
9. Existential-risk voices will quieten, and the focus will shift to regulation. The AI Act will be passed in the EU (it is more regulation-oriented than the Executive Order in the US). Institutions like NIST and the OECD will develop standards around the risks involved in generative AI models and around the data used to train them. Overall, regulation will favor content creators more than model developers.
10. Alignment will continue to be a tough nut to crack, as it needs new breakthroughs to overcome the trade-off against utility, and none are yet in sight. We will have to live with the so-called alignment tax. New companies and jobs will emerge around teaching LLMs alignment – data curation, original data creation, etc.
We are thrilled by the prospects of Generative AI harmonizing innovation with governance to enrich everyone’s experience. Here is to another year of a great generative AI ride!
Note: Although the title says Generative AI, for the purposes of this article our focus is mostly on LLMs. They represent the biggest share anyway ;)
Copyright © Srinivas Chilukuri & Arun Shastri, 2024. Any illegal reproduction of this content will result in immediate legal action.
Authors:
Srinivas Chilukuri is a Principal at ZS. He helps clients implement generative AI solutions.
Arun Shastri is a Principal at ZS. He helps clients transform their digital capabilities.
References
2. Cloudflare 2023 Year in Review
5. LMSYS Chatbot Arena Leaderboard
6. LMSYS Elo update, 07-Dec-2023
7. Liu et al., Lost in the Middle: How Language Models Use Long Contexts
10. Noy et al., Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence
Appendix: Illustrations
Figure 1. Most Remarkable Generative AI Releases of 2023 [1]
Figure 2. MMLU Performance Trajectory of Private and Open-Source LLMs [4]
Figure 3. Bootstrap of MLE Elo Estimates [5]
Figure 4. Distribution of topics from user interactions with LLMs [6]
Figure 5. Responses to “How will generative AI affect your company?” [9]
Figure 6. Productivity of ChatGPT on writing tasks [10]
Figure 7. Productivity of LLMs on call center case handling [11]
Figure 8. Key challenges with Generative AI according to CDOs [12]