Is Generative AI The Next Tech Bubble?
Vin Vashishta
AI Advisor | Author “From Data To Profit” | Course Instructor (Data & AI Strategy, Product Management, Leadership)
Clients have been asking me a valid, uncomfortable question. “Is Generative AI all hype and vaporware?” Google’s questionable Gemini demo didn’t help, but C-level leaders started asking this question in October. Social media is filled with breaking AI news, prompt engineering, and Generative AI images. What CxOs don’t see are the products that have been promised.
Generative AI applications like ChatGPT started out feeling groundbreaking, but most CxOs expected the potential to turn into tangible business applications. They were promised massive productivity gains. Have they materialized? Well, possibly.
In 2022, output growth came from increases in hours worked, meaning workers were less productive per hour. That trend reversed (for non-farm corporate roles) in the spring of this year: increases in labor productivity are now driving increases in output. Multiple factors are likely at play here, but the timing is interesting.
We could be seeing the start of a Generative AI productivity cycle. McKinsey estimates that Generative AI will add $2.6-$4.4 trillion to the economy, accounting for between 15% and 40% of AI’s total economic impact. However, if that’s the case, where are the Generative AI products that would drive the productivity cycle?
According to an October Gartner poll, 45% of businesses are piloting Generative AI, and another 10% have delivered it to production. In Gartner’s March poll, only 15% were piloting, and 4% were in production. Interestingly, both poll numbers follow what’s become a well-established trend line. 21% of those working with Generative AI in March and 18% in October had successfully delivered to production. The 1 in 5 success rate seems to be holding, which is another problem.
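The 1-in-5 figure follows directly from the poll numbers quoted above. A quick sanity check:

```python
# Gartner poll figures cited above (percent of businesses surveyed)
march = {"piloting": 15, "production": 4}
october = {"piloting": 45, "production": 10}

def success_rate(poll):
    """Share of companies working with Generative AI that reached production."""
    working = poll["piloting"] + poll["production"]
    return poll["production"] / working

print(f"March:   {success_rate(march):.0%}")    # ~21%
print(f"October: {success_rate(october):.0%}")  # ~18%
```

Roughly one in five of the companies working with Generative AI gets it into production, in both polls.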
Generative AI is fighting a two-front battle. What’s getting the most attention often underperforms, and what’s performing well isn’t getting much attention. The fanfare about specific model releases takes most of the air out of the room. Products that are available to the public, like Bard, Grok, and ChatGPT, don’t seem to be making much progress. Each has a plan to differentiate, but none have been able to execute yet.
The Benchmark Battle Isn’t Moving Businesses Forward
I understand that Gemini’s benchmarks are better, but Generative AI winners won’t be decided by benchmarks. That’s how models win Kaggle, not how products win over customers. Users don’t care about a 3% increase in this benchmark or an 8% boost on that one. If they don’t see the difference, they don’t know there is one.
I said that on LinkedIn after running the new Bard against ChatGPT. Benchmarks must be connected to business value, customer experiences, or improved outcomes. I did not see a clear differentiation between the models.
Please list all of the US cloud providers with at least $1 billion in annual revenue.
Bard listed 11 companies. ChatGPT returned 21. Both lists were accurate, but ChatGPT’s was more complete, even though it still missed a few companies. This was the only area where I could point to a clearly better answer, so I went back and asked the question again to confirm my results.
Slight variations in the question delivered different winners. Sometimes, Bard delivered more companies, and sometimes ChatGPT did. I just happened upon the perfect framing for ChatGPT vs. Bard in my initial prompt.
Which LLMs are available on Amazon Bedrock?
ChatGPT listed Llama 2, Claude 2.1, Titan, AI21 Labs, Cohere, and Stability AI. Bard listed Titan, Jurassic-2, the Hugging Face Hub, Stability AI, and Claude. They were mostly the same answers with different framing. Bard caught the Hugging Face Hub, and ChatGPT didn’t.
What are the top 25 models on Hugging Face?
ChatGPT returned the top 25 models by the default ‘trending’ sort but said they were sorted by downloads or popularity. Bard said it was difficult to answer that question because multiple criteria could be selected to define ‘top.’ It returned 5 results for each of 4 different sort options. Every list was inaccurate, and many of the metrics associated with the models (downloads, likes, etc.) were also inaccurate.
What do I need to do differently when baking a Turkey at 5000 feet vs. sea level?
Both Bard and ChatGPT did well with this question. They delivered multiple suggestions and touched on all the main points, like preventing the turkey from drying out, cooking it for longer, and using a meat thermometer instead of cooking time alone. The capitalization of Turkey didn’t throw either one off.
Please explain the many-body Schrödinger equation in simple terms.
Both explained the equation without getting too technical. ChatGPT broke the explanation into summary sections explaining the equation, its applications, and why it’s important. Bard took a different approach but delivered the same content in a different section layout.
What’s interesting is that Bard used metaphors. The first was excellent.
“Imagine trying to describe the dance of a swarm of bees, each bee following its own path influenced by the others. That's what the many-body Schrödinger equation attempts to do on a microscopic level.”
Other metaphors were not as helpful…
“It's like a recipe for predicting the particle's behavior.”
“It's like writing one grand recipe that somehow accounts for every possible interaction between all the ingredients in the swarm of bees.”
I asked both models to summarize a 300-word document into 5 bullet points. Bard returned 5 bullet points with a single supporting sentence for each. ChatGPT created paragraph summaries to support its bullet points. Bard did a better job understanding the context of the request.
Bard doesn’t support image generation yet, so there’s a big gap in functionality that didn’t make much sense, given the focus on Gemini’s multimodal capabilities. Then, the big reveal came out that the demo was…generated.
Standalone LLMs are a novelty, and the novelty is starting to wear off. What’s underneath the LLM matters most. The company that figures out how to differentiate itself wins, and there are plenty of missed opportunities.
It’s A Pretty Big Grokking Problem
Grok is really bad, and it repeats mistakes that most Generative AI products make. Generative AI is the horizontal platform; vertical feature depth makes or breaks it as a product versus a novelty. Grok should be the gateway to Twitter/X. That’s its vertical depth and the functionality that would make it a super app.
I asked Grok to recommend new people to follow based on the accounts I have alerts enabled for. The LLM doesn’t have access to that information, so it made the answer up. I asked Grok to summarize an image and the content of a tweet from my bookmarks. Again, nothing, because it has no access. Grok has a lot of unrealized potential.
Based on my queries and engagement, it should deliver daily recommendations for accounts and hashtags to follow. I should be able to track content through Grok during an event or news story and create advanced content filters. Grok should be a new, personalized search engine for X. Give it access to bookmarks, lists, and alert settings so it can recommend accounts and content when I ask.
Grok should be an assistant that makes creating and implementing ad campaigns easier. It has access to trends across the platform. Giving brands access to a tool that identifies emerging trends and helps create content to take advantage of them in near real-time is another missed opportunity.
Those are examples of vertical depth that leverages X’s existing functionality with Grok as the access layer. By failing to take advantage of the X platform’s strengths, the LLM fails to differentiate itself from Bard or ChatGPT. Being edgy isn’t enough to keep users engaged.
Overlooked Solutions
One company delivered a masterful marketing campaign on social media the day after Gemini was released. The company’s executive leadership team went on social media to talk about real features and the business value they were creating for customers. We absolutely need more of this to get attention focused on viable Generative AI products.
SAP’s Joule isn’t advertising benchmarks, and it’s difficult to even figure out which LLMs are running under the covers. SAP says only that it uses the best model for the given scenario. It looks like the company has implemented an ensemble of purpose-built LLMs to power Joule and other parts of the platform. SAP is working with IBM, Cohere, and Anthropic, so it’s safe to assume all three have been integrated into the platform.
The LLM isn’t a product; it’s an orchestration and access layer built to simplify things like onboarding new users or performing complex workflows. When users ask Joule a question, the Generative Agent looks through the company’s data for answers. The LLM’s knowledge graph isn’t the product or the source of truth. It’s what supports Joule’s ability to go out and find an answer.
With Bard, Bing Search, and ChatGPT + Bing, the LLM runs an internet search to answer some questions. It’s a similar paradigm, but the internet is unreliable, the search results are only as good as the query, and the answers are only as reliable as the websites returned. When I asked ChatGPT about US cloud providers with over a billion in annual revenue, on one pass it ran an internet search for the top cloud providers and relied on a single result: a “top 10 cloud providers of 2022” listicle. The resulting response was predictably bad.
When LLMs are the access layer to a knowledge graph, the results depend on how well the LLM has been built to find answers. Most LLMs are built to deliver answers without external sources. That’s the paradigm most businesses are thinking in, while Joule is a better example of Generative AI’s value.
Joule is the SAP platform’s horizontal breadth, providing a single access point. The platform’s apps, data, and models are the vertical depth and high-value functionality. This is where Grok has untapped potential and a blueprint for Generative AI products.
LLMs must provide access to something because they have very little value on their own. An LLM’s knowledge graph enables it to deliver access and orchestration more like an assistant. However, the knowledge graph is rarely complete enough to be the vertical depth alone.
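The access-layer pattern described above can be sketched in a few lines. Everything here is hypothetical: `search_company_data` and `llm_answer` stand in for whatever retrieval and model APIs a real platform like Joule exposes. The point is the shape of the flow: the LLM routes and phrases; the platform’s own data answers.

```python
# Minimal sketch of an LLM-as-access-layer flow (all names hypothetical).

def search_company_data(query: str) -> list[str]:
    """Stand-in for the platform's own search over internal documents."""
    corpus = {
        "vacation policy": "Employees accrue 1.5 vacation days per month.",
        "expense limit": "Meals are reimbursable up to $75 per day.",
    }
    return [text for key, text in corpus.items() if key in query.lower()]

def llm_answer(question: str) -> str:
    """The LLM orchestrates: retrieve from the system of record, then phrase."""
    passages = search_company_data(question)
    if not passages:
        # Refuse rather than invent -- the company's data is the source of truth.
        return "I couldn't find that in company data."
    # A real implementation would pass `passages` to a model as grounding context.
    return " ".join(passages)

print(llm_answer("What is our vacation policy?"))
```

The model never answers from its own parametric knowledge; it either grounds the answer in retrieved company data or declines.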
If LLMs Aren’t Valuable Alone, How Does Generative AI Deliver $2.6-$4.4 Trillion?
One of Joule’s boring-sounding features lets users run Q&A on complex documents they upload into the system. In my AI Product Management Certification course, one of my pricing strategy use cases is an AI product that extracts mortgage data from documents. LLMs support these tasks with high reliability.
One approach to pricing AI products is to estimate the product’s ROI for the target market. I researched the loan processing workflow, and the total cost savings of reducing the manual document analysis parts from 2 hours to 30 minutes is between $250-$350 million annually in the US alone. Automating document data retrieval for a range of use cases will save businesses billions every year.
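The arithmetic behind an estimate like that is simple to reproduce. The loan volume and labor rate below are illustrative assumptions of mine, not the course’s actual figures; only the 2-hours-to-30-minutes reduction comes from the text.

```python
# Back-of-the-envelope for a market-wide savings estimate.
# Volume and labor rate are illustrative assumptions, not the course's figures.
loans_per_year = 5_000_000    # assumed annual US mortgage originations
hours_saved    = 2.0 - 0.5    # manual document analysis: 2 hours down to 30 min
loaded_rate    = 40           # assumed fully loaded labor cost, $/hour

annual_savings = loans_per_year * hours_saved * loaded_rate
print(f"${annual_savings / 1e6:.0f}M per year")  # lands inside the $250-$350M range
```

Plausible inputs land squarely in the quoted $250–$350 million range, which is the point of ROI-based pricing: you size the value pool before you set the price.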
The small minority of companies with LLMs in production focus on these use cases. I am working with a retail client to implement Generative Search on its website. The company loses millions in monthly revenue to low-quality search results and abandoned search workflows.
LLMs enable more categories and tags to be assigned to each product automatically based on images, product documentation, and customer reviews. Generative Search lets users build more complex descriptions of what they want and refine results by providing feedback. The first feature is already running, and the second will launch early next year.
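A minimal sketch of the auto-tagging feature might look like the following. This is not the client’s implementation: the `complete` function is a placeholder returning canned output where a real completion API call would go, and all names are hypothetical.

```python
# Hypothetical sketch of LLM-assisted product tagging.

def complete(prompt: str) -> str:
    """Placeholder for a real LLM completion call; returns canned tags here."""
    return "waterproof, hiking, ankle-support"

def tag_product(title: str, description: str, reviews: list[str]) -> list[str]:
    """Build search tags from product copy and customer reviews."""
    prompt = (
        "Return a comma-separated list of search tags for this product.\n"
        f"Title: {title}\nDescription: {description}\n"
        f"Reviews: {' | '.join(reviews)}"
    )
    return [tag.strip() for tag in complete(prompt).split(",")]

tags = tag_product(
    "TrailMax Boot",
    "Leather hiking boot with sealed seams.",
    ["Kept my feet dry all day.", "Great ankle support."],
)
print(tags)
```

Notice the reviews in the example surface a tag (“waterproof”) that the product copy never states, which is exactly how richer tagging improves search recall.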
These types of LLM applications don’t get many likes on social media, but they deliver significant revenue and cost savings. Businesses that get sucked into chasing futurism and hype will get Generative AI vaporware. For that group, LLMs are a complete flop. Businesses that use LLMs as an access layer will get a share of that massive pie.
I’ll let you in on a secret. I host the largest set of data and AI strategy, product management, and leadership content (over 500 articles) in a different place. I post selections here months after I release them. Be first in line for my best insights.
Senior Digital Marketing Specialist- Data Dynamics
7mo: #GenAI does seem to have bubble-like hype in 2024. However, I believe if companies build and use these large language models responsibly, with ethical data practices and strong data security, GenAI could positively transform industries. But we have to ensure #AI progresses down an ethical, secure path for all.
90K | Director/ Artificial Intelligence, Data & Analytics @ Gartner / Top Voice
7mo: Yes, in my humble opinion, there is a big risk of a Generative AI tech bubble. That DOES NOT mean that there aren't profits to be had in GenAI, and it does not mean that many companies won't make profits. What it does mean is that we don't know enough yet to make wise decisions in terms of which firm is which.
Digital Transformation Consultant | MBA in Marketing & Analytics
7mo: Certainly not - if you're betting on Microsoft and co. Arguably, some of these ventures won't gain much steam and could resemble a bubble - but the tech overall, and the key players like Microsoft, are like the brand at Target - on the Up&Up.
U2 LAB - Smart Tech Solutions
7mo: Very interesting!
AI Lead at BASICO | Podcast Host: The Only Constant | Digital Thought Leader | Public Speaker | IT Strategy | Intelligent Automation
7mo: “Generative AI is fighting a two-front battle. What’s getting the most attention often underperforms, and what’s performing well isn’t getting much attention.”