The Challenge of the AI Demo
The AI demo isn’t easy. Many of the major AI companies have demoed their AI systems, starting with pre-recorded videos & now pushing into live demos. They don’t always work.
Multiply Murphy’s Law by a non-deterministic system & it’s not unreasonable to expect AI demos to nearly always hiccup.
Demo disruptions aren’t disasters. These systems are early & changing rapidly. Hiccups suggest the system requires work & tuning, not that it faces a fundamental challenge.
But they can be problematic in proofs-of-concept.
Proofs of concept are extended demonstrations of the software. Well-structured PoCs align on success criteria at the outset. These criteria enable vendors & customers to agree on what success looks like.
Workflow proofs-of-concept are relatively straightforward. They are deterministic. Can I process a loan application in 5 minutes? Yes or no.
But as AI applications shift to selling outcomes, implicitly or explicitly, the PoC becomes a testing ground for those outcomes. Non-determinism means sometimes the PoC won’t produce the required wow moment. This also means the PoC criteria must be more flexible.
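One flexible criterion is to judge the PoC on success rate over repeated trials rather than on a single run. The sketch below illustrates the idea; the 90% target rate, trial counts, & confidence level are hypothetical, not figures from any actual PoC.

```python
import math

def poc_passes(successes: int, trials: int, target_rate: float = 0.9,
               confidence_z: float = 1.645) -> bool:
    """Judge a non-deterministic PoC by its success rate over many
    trials, not a single wow-moment run (illustrative sketch)."""
    rate = successes / trials
    # Normal-approximation lower bound on the true success rate
    stderr = math.sqrt(rate * (1 - rate) / trials)
    lower_bound = rate - confidence_z * stderr
    return lower_bound >= target_rate

# 46 good runs of 50 is 92% observed, but at this sample size the
# lower confidence bound falls below a 90% target, so it fails
print(poc_passes(46, 50))
# 49 of 50 clears the bar
print(poc_passes(49, 50))
```

Framing the criterion this way lets vendor & customer agree up front that some individual runs will hiccup, while still holding the system to a measurable bar.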
How does a buyer evaluate a probabilistic system?
Do we compare it to human performance? Practitioners have shared with us that human labelers typically agree 60-70% of the time. Does an AI system need to be as accurate as a human, assuming it will be much less expensive? Or will we expect more, as we do with self-driving cars?
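The human-labeler comparison can be made concrete by measuring pairwise agreement: how often does the model agree with each human, relative to how often the humans agree with each other? The labels below are invented for illustration.

```python
from itertools import combinations

def pairwise_agreement(label_sets):
    """Mean fraction of items on which each pair of labelers agrees."""
    scores = [
        sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in combinations(label_sets, 2)
    ]
    return sum(scores) / len(scores)

# Hypothetical labels on 10 items from three human reviewers
humans = [
    list("AABABBABAA"),
    list("AABBBBABAA"),
    list("AABABBBBAA"),
]
model = list("AABABBABAA")  # the system's labels on the same items

human_baseline = pairwise_agreement(humans)
model_score = sum(pairwise_agreement([model, h]) for h in humans) / len(humans)
print(round(human_baseline, 2), round(model_score, 2))
```

If the model agrees with the humans at least as often as they agree with each other, "human parity" is a defensible claim even when the humans themselves disagree 30-40% of the time.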
If AI systems require human assistance, then the ROI of the system must include some human operating expense - whether explicit or implicit.
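That human expense can be folded into a simple ROI model where some share of tasks escalates to a person. Every figure here is a made-up placeholder to show the shape of the calculation.

```python
def agent_roi(tasks_per_month: int, value_per_task: float,
              agent_cost_per_task: float, escalation_rate: float,
              human_cost_per_escalation: float) -> float:
    """Monthly ROI of an AI system that escalates a fraction of
    tasks to a human reviewer (all inputs hypothetical)."""
    value = tasks_per_month * value_per_task
    agent_cost = tasks_per_month * agent_cost_per_task
    human_cost = tasks_per_month * escalation_rate * human_cost_per_escalation
    total_cost = agent_cost + human_cost
    return (value - total_cost) / total_cost

# 10k tasks worth $2 each; $0.10 agent cost per task;
# 15% escalated to a human at $3 per escalation
print(round(agent_roi(10_000, 2.0, 0.10, 0.15, 3.0), 2))
```

Note that in this toy example the implicit human cost dwarfs the agent cost, which is why leaving it out of the ROI math overstates the system’s value.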
Some teams will want to benchmark systems in parallel to determine the relative performance. With most startups building atop existing models & setting aside differences in fine-tuning, the ultimate performance should be relatively comparable, provided they use the same data sets. Will startups compete on access to different data sets?
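A parallel benchmark is straightforward to sketch: run every candidate system over the identical data set so the comparison isolates the vendor, not the data. The vendor names & toy task below are stand-ins.

```python
def benchmark(models, dataset):
    """Score competing systems on the identical data set (sketch;
    `models` maps a vendor name to a hypothetical predict function)."""
    return {
        name: sum(predict(x) == y for x, y in dataset) / len(dataset)
        for name, predict in models.items()
    }

# Toy classification task: is the number even?
dataset = [(n, n % 2 == 0) for n in range(100)]
vendors = {
    "vendor_a": lambda n: n % 2 == 0,            # perfect on this data
    "vendor_b": lambda n: n % 2 == 0 or n == 7,  # one spurious positive
}
print(benchmark(vendors, dataset))
```

With startups built atop the same underlying models, scores like these should cluster tightly, which is exactly why differentiated data access may matter more than the benchmark itself.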
Today, there are more questions than answers about how to sell AI agent systems. We’re hosting an event on the evening of Sep 10th in San Francisco, moderated by Dave Morse, former CRO at Hebbia & VPS/VPCS at ScaleAI, to interview leaders in the space about some of these questions.
If you’re interested to attend, see the details here.
Data Science & AI Expert: Advancing Self-Organizing Systems and Artificial Intelligence for Innovative Solutions
1 mo
Add to the non-determinism of the AI system that you are demoing the open and often surprising dynamics that a real-time environment throws at you! As we shift from pre-trained models to continuously and autonomously executing agents, scripting a demo *with* them takes extensive practice. Nathaniel Green
Data, Gen AI and Agentic AI Science Engineering
1 mo
Error acceptance might be use-case dependent; also, build vs. buy is becoming an obvious question for enterprise use cases. So in my opinion, for those startups selling solutions, finding the right niche and market will be a good bet.
CEO @ Prodigal | Lending intelligence
1 mo
Love this quote in Morgan Housel's book. That becomes especially true when buyers need to commit dollars and/or time. Even then, as the industry goes through an early adoption phase (or is it over already?!), it is almost necessary to work with those who are comfortable with the future and be 100% honest with them.
Tomasz, I like your points here on the challenges with live demos. As you discuss proofs-of-concept (PoCs) and measuring success, I would propose that what we should be looking at is proofs-of-value (PoVs). My partner, Seth Earley, and I believe this is the path to real impact and to confirming opportunity in the AI space. While PoVs are not quick demos, as they require time with the raw data, real data architecture work, and real implementation effort for the AI itself, they are still short projects that directly demonstrate the success (or failure) of the AI applied to a discrete business problem. Measuring this impact is directly comparable to the process step it replaces.