no. 6 - How bad is good enough?
Over the last few years building AI products, there’s a question that has come up so often and caused so much commotion that it has earned a moniker among a few of my close friends: the Killer Question - because depending on how you answer it, it can either kill your team’s progress or kill the roadblock in your way.
The Killer Question is both philosophical and deeply practical: How bad is good enough? This question is neither abstract nor rhetorical — it encapsulates the core tenet that makes working with AI so different from building traditional products and forces you to ground your beliefs about what a product can be in cold, hard facts.
But where did the Killer Question come from?
The existence of this question stems from a fundamental truth: AI is inherently non-deterministic. Unlike traditional software, where you can guarantee outcomes with complete certainty (for example, clicking a button here takes a user there), AI involves a level of unpredictability. Simply put, you cannot assure that AI will behave exactly the same way every time or that it will always provide the answer you're hoping for. This uncertainty can be unnerving, especially for those used to building products with lots of “if this then that” logic. But paradoxically, this non-determinism is where the magic comes from.
Take, for example, social media feeds. My social feeds are filled with memes, news, recipes, workout content, and posts from friends, while one of my best friends’ feeds is filled with NBA highlights, LA culture, and video game stream clips. The beauty of AI lies in its ability to predict what the best outcome for a situation might be and then bring it into existence. It takes inputs from users (and many other sources) and generates outputs (such as a feed, or even media) in a way that is relevant to each individual and the assigned objective. AI is not following a predetermined path; it is reacting on the fly to new stimuli based on a variety of patterns it has learned along the way—perpetually painting on a blank canvas.
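To make that contrast concrete, here is a toy sketch in Python, purely illustrative and nothing like how a real feed is built: the traditional path returns the same result on every call, while the AI-flavored path samples from weighted preferences and can differ from one run to the next. The content types and weights are invented for illustration.

```python
# A toy illustration of the contrast described above, not a real product.
import random

def traditional_button_click(user_id: str) -> str:
    # Deterministic: same input, same destination, every single time.
    return "/settings"

def ai_feed_ranker(user_id: str) -> str:
    # Illustrative weights standing in for a model's learned preferences.
    candidates = ["meme", "news", "recipe", "workout", "friend_post"]
    weights = [0.30, 0.25, 0.20, 0.15, 0.10]
    return random.choices(candidates, weights=weights, k=1)[0]

print(traditional_button_click("reader"))              # always "/settings"
print([ai_feed_ranker("reader") for _ in range(5)])    # varies run to run
```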
The Problem with “Hallucinations”
Tied to this non-determinism is one of the most misunderstood concepts in AI: hallucinations. When people talk about AI hallucinations, they often say things like, “the model went off the rails,” “it produced something strange,” or “it just made things up”—essentially, outputs that do not match what we expect or consider acceptable. But here is the nuance: if the model is always producing its best predictions based on the data, those outputs are not inherently good or bad—they simply are. It is only when we place them within the context of a product or user experience that we decide what is “good” or “bad.” However, labeling these outputs in a binary “good” or “bad” fashion may oversimplify the issue. A more intellectually honest approach might be to ask how much of the output is bad versus good, rather than applying a one-dimensional label. One of my favorite AI Engineering leads at LinkedIn, Mehul Rathod, summed this up perfectly:
“All model outputs are hallucinations; it is just that some hallucinations we like more than others.”
Well put, Mehul.
Thinking of all AI outputs as hallucinations helps take the edge off building with this technology by separating the task (building the logic) from the objective (helping users achieve their goals). Accepting the reality that these hallucinations will happen brings us back to our Killer Question: How bad is good enough?
So, how should we approach it? My team has found success in tackling adjacent questions to triangulate an answer we feel confident about. These include:
Regularly referencing these adjacent questions (and others tailored to the AI use cases we’re working on) has given our team a reliable framework for making consistent, apples-to-apples comparisons between different iterations of the same experience and even across products. Sooner or later, you’ll encounter odd behavior from your AI—perhaps even flagged by a concerned executive. This is when the Killer Question truly earns its name: respond without conviction, evidence, or belief, and your team’s progress is dead in the water. But answer with confidence, proof, and humility, and you’ll find the team’s and organization’s trust in you surge. If you’ve done your homework, you can rely on the testing and measurement framework that gave you the confidence to release the AI in the first place. In moments like this, there’s really only one way to respond to the executive (see the Note to Self at the end of this piece).
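To give a flavor of what those apples-to-apples comparisons might look like in practice, here is a minimal sketch. Everything in it is a hypothetical stand-in: the grade_output placeholder, the 0.85 release threshold, and the toy eval set are assumptions for illustration, not our actual testing and measurement framework.

```python
# A minimal, illustrative sketch: score every iteration the same way, on the
# same fixed eval set, against the same pre-agreed bar. All names and numbers
# here are hypothetical stand-ins.
from statistics import mean

RELEASE_THRESHOLD = 0.85  # an assumed, pre-agreed bar for "good enough"

def grade_output(prompt: str, output: str) -> float:
    """Placeholder grader returning a score in [0, 1]. A real framework might
    use a rubric, human labels, or an automated judge here."""
    return 1.0 if output.strip() else 0.0

def evaluate(model_fn, eval_set: list[str]) -> float:
    """Run one iteration over a fixed eval set and return its mean score."""
    return mean(grade_output(p, model_fn(p)) for p in eval_set)

def compare(model_a, model_b, eval_set: list[str]) -> None:
    """Compare two iterations on the same eval set so the numbers are comparable."""
    score_a, score_b = evaluate(model_a, eval_set), evaluate(model_b, eval_set)
    print(f"iteration A: {score_a:.3f} (releasable: {score_a >= RELEASE_THRESHOLD})")
    print(f"iteration B: {score_b:.3f} (releasable: {score_b >= RELEASE_THRESHOLD})")

# Toy usage with two stand-in "models" (any prompt -> output callable works):
compare(lambda p: "draft answer, version 1", lambda p: "draft answer, version 2",
        ["summarize this profile", "suggest a reply to this message"])
```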
Learning from the worst-case scenario
Another personal favorite question to consider when triangulating the Killer Question is: How bad is the worst-case scenario? In some products, the worst-case scenario isn’t so bad—take Spotify, for instance. If they recommend a song I don’t like, I just skip to the next one. But in other experiences, the worst case can be far more serious.
Take medicine, for example, where the stakes are so high that even the tiniest mistake could literally mean the difference between life and death. In this context, the tolerance for bad outcomes is practically zero. Similarly, in legal applications where AI might be used to draft contracts or interpret terms, the margin for error is razor-thin. Imagine a century-long real estate contract being misinterpreted by a faulty AI model—the consequences could be enormously costly and permanent.
While social platforms and consumer tech might sit on the other end of the spectrum entirely, there are many categories that I like to believe fall somewhere in the middle. Take education, for example. AI-driven tools can provide valuable feedback or tutoring, but there’s room for some degree of error. An AI-powered learning platform that occasionally misjudges a student’s ability can be forgiven, as long as it generally guides learners in the right direction - much like the affordance we grant to educators when they make mistakes.
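One way to picture this spectrum is as a table of tolerances: the same failure rate that is fine for a music recommender is disqualifying for a legal or medical tool. The sketch below is purely illustrative; the domains and numbers are assumptions chosen to mirror the examples above, not real benchmarks.

```python
# Illustrative only: how tolerant each use case is of "bad" outputs. The
# numbers are invented to mirror the spectrum described above.
MAX_ACCEPTABLE_FAILURE_RATE = {
    "music_recommendations": 0.20,   # a skipped song costs the user a second
    "education_tutoring": 0.05,      # occasional misjudgments are forgivable
    "legal_drafting": 0.001,         # razor-thin margin for error
    "medical_guidance": 0.0001,      # the worst case is life and death
}

def good_enough(domain: str, observed_failure_rate: float) -> bool:
    """Answer the Killer Question for one domain: is this failure rate tolerable?"""
    return observed_failure_rate <= MAX_ACCEPTABLE_FAILURE_RATE[domain]

print(good_enough("music_recommendations", 0.08))  # True: skip the song, move on
print(good_enough("medical_guidance", 0.08))       # False: nowhere near safe
```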
The Subjectivity of AI’s Success
The simple truth is that success in AI is both subjective and complicated. What one person perceives as a bad experience might be seen by another as merely a quirk, a statistical inevitability, or, for some, even an improvement. As AI experiences become more common, answering the Killer Question will only become a more frequent and more important exercise for R&D teams.
I’m willing to bet that the teams that build tomorrow’s Google and Amazon will be those that can quantitatively and programmatically evaluate the blurry lines between good and bad (whatever those words actually mean). They’ll rigorously monitor, evaluate, and update thresholds for “good enough.” This doesn’t mean lowering standards; it means acknowledging the complexity of building systems that, by design, can never be perfect and coming to consensus on a way to measure the subjectivity.
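If I had to sketch what “monitor, evaluate, and update thresholds” might look like in code, it would be something like the rolling check below. The 0-to-1 grading scale, the window size, and the 0.85 bar are all assumptions for illustration; the hard part is getting a team to agree on what those numbers should be.

```python
# An illustrative monitor, assuming graded samples arrive from production
# (human review, user feedback, or an automated judge). The window size and
# threshold are assumed values, not recommendations.
from collections import deque

class GoodEnoughMonitor:
    def __init__(self, threshold: float = 0.85, window: int = 500):
        self.threshold = threshold          # the currently agreed bar for "good enough"
        self.scores = deque(maxlen=window)  # rolling window of recent grades

    def record(self, score: float) -> None:
        """Add one graded output (0.0 = worst hallucination, 1.0 = ideal)."""
        self.scores.append(score)

    def is_good_enough(self) -> bool:
        """True while the rolling average stays at or above the threshold."""
        if not self.scores:
            return True  # nothing observed yet, nothing to alarm on
        return sum(self.scores) / len(self.scores) >= self.threshold

    def update_threshold(self, new_threshold: float) -> None:
        """Raise or lower the bar as the team's answer to the Killer Question evolves."""
        self.threshold = new_threshold
```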
I consider myself incredibly fortunate to work alongside talented individuals like Rachel and Ryan, who have pioneered this area of thinking for our team, upheld the highest standards for product quality, and catapulted our team to the cutting edge of this industry. Thank you both. Together, we have surfed the waves of innovation over the last three years and developed practical answers to the Killer Question that will undoubtedly surface soon enough: How bad is good enough?
Note to Self:
Be both ruthless and honest when testing your tech; it’s the only way to say “Thanks for your feedback. We hit the success threshold we needed to release, but we will investigate to be sure,” with sincerity and confidence.
For this installment, enjoy a collection of my favorite Arabic songs that bring me back to Friday nights playing cards with my childhood friends.