no. 6 - How bad is good enough?

Over the last few years building AI products, there’s a question that has come up so often and caused so much commotion that it has earned a moniker among a few of my close friends: the Killer Question - because depending on how you answer it, it can either kill your team’s progress or kill the roadblock in your way.

The Killer Question is both philosophical and deeply practical: How bad is good enough? This question is neither abstract nor rhetorical — it encapsulates the core tenet that makes working with AI so different from building traditional products and forces you to ground your beliefs about what a product can be in cold, hard facts.

But where did the Killer Question come from?

This question stems from a fundamental truth: AI is inherently non-deterministic. Unlike traditional software, where you can guarantee outcomes with complete certainty (for example, clicking a button here takes a user there), AI involves a level of unpredictability. Simply put, you cannot assure that AI will behave exactly the same way every time or that it will always provide the answer you're hoping for. This uncertainty can be unnerving, especially for those used to building products with lots of “if this then that” logic. But paradoxically, this non-determinism is where the magic comes from.
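
To make this concrete, here’s a minimal Python sketch contrasting the two worlds. The routes, candidate replies, and weights are all made up for illustration; the point is simply that any sampling temperature above zero means the same input can produce different outputs:

```python
import random

# Traditional software: the same input always produces the same output.
def deterministic_route(button_id: str) -> str:
    routes = {"home": "/home", "profile": "/profile"}
    return routes[button_id]

# A stand-in for an AI model: these candidate replies and weights are
# invented for illustration; real models sample over vocabulary tokens.
def sampled_reply(prompt: str, temperature: float = 0.8) -> str:
    candidates = {
        "Sure, here's a summary.": 0.6,
        "Could you clarify what you mean?": 0.3,
        "I'm not certain, but here's my best guess.": 0.1,
    }
    # Temperature reshapes the distribution; with temperature > 0, the
    # same prompt can yield a different reply on every call.
    weights = [p ** (1.0 / temperature) for p in candidates.values()]
    return random.choices(list(candidates), weights=weights, k=1)[0]

print(deterministic_route("home"))            # always "/home"
print(sampled_reply("Summarize this post"))   # may differ run to run
print(sampled_reply("Summarize this post"))
```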

Take, for example, social media feeds. My social feeds are filled with memes, news, recipes, workout content, and posts from friends, while one of my best friends’ feeds is filled with NBA highlights, LA culture, and video game stream clips. The beauty of AI lies in its ability to predict what the best outcome for a situation might be and then bring it into existence. It takes inputs from users (and many other sources) and generates outputs (such as a feed, or even media) in a way that is relevant to each individual and the assigned objective. AI is not following a predetermined path; it is reacting on the fly to new stimuli based on a variety of patterns it has learned along the way—perpetually painting on a blank canvas.

The Problem with “Hallucinations”

Tied to this non-determinism is one of the most misunderstood concepts in AI: hallucinations. When people talk about AI hallucinations, they often say things like, “the model went off the rails,” “it produced something strange,” or “it just made things up”—essentially, outputs that do not match what we expect or consider acceptable. But here is the nuance: if the model is always producing its best predictions based on the data, those outputs are not inherently good or bad—they simply are. It is only when we place them within the context of a product or user experience that we decide what is “good” or “bad.” That said, labeling outputs in a binary “good” or “bad” fashion can oversimplify the issue; a more intellectually honest approach is to ask how much of an output is bad versus good, rather than applying a one-dimensional label. One of my favorite AI Engineering leads at LinkedIn, Mehul Rathod, summed this up perfectly:

“All model outputs are hallucinations; it is just that some hallucinations we like more than others.”

Well put, Mehul.

Thinking of all AI outputs as hallucinations helps take the edge off building with this technology by separating the task (building the logic) from the objective (helping users achieve their goals). Accepting the reality that these hallucinations will happen brings us back to our Killer Question: How bad is good enough?

So, how should we approach it? My team has found success in tackling adjacent questions to triangulate an answer we feel confident about. These include (a sketch of how such questions might be tracked in code follows the list):

  1. How often do users report dissatisfaction with our product?
  2. How frequently does our AI deviate from expected behavior?
  3. How accurately and directly does our AI respond to user inputs?
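
To give a flavor of how this works in practice, here is a minimal Python sketch of how a team might compute these three signals from logged interactions. The record fields and metric names are hypothetical, not our actual schema:

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    # One logged interaction; the fields are illustrative, not a real schema.
    user_reported_issue: bool   # question 1: did the user flag dissatisfaction?
    deviated: bool              # question 2: did the output break expected behavior?
    answered_directly: bool     # question 3: did the output address the actual input?

def adjacent_metrics(records: list[EvalRecord]) -> dict[str, float]:
    """Turn raw eval records into the three rates tracked over time."""
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    return {
        "dissatisfaction_rate": sum(r.user_reported_issue for r in records) / n,
        "deviation_rate": sum(r.deviated for r in records) / n,
        "direct_answer_rate": sum(r.answered_directly for r in records) / n,
    }
```

Tracked release over release, rates like these are what make the apples-to-apples comparisons described next possible.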

Regularly referencing these adjacent questions (and others tailored to the AI use cases we’re working on) has given our team a reliable framework for making consistent, apples-to-apples comparisons between different iterations of the same experience and even across products. Sooner or later, you’ll encounter odd behavior from your AI—perhaps even flagged by a concerned executive. This is when the Killer Question truly earns its name: respond without conviction or evidence, and your team’s progress is dead in the water. But answer with confidence, proof, and humility, and you’ll find your team’s and organization’s trust in you surge. If you’ve done your homework, you can rely on the testing and measurement framework that gave you the confidence to release the AI in the first place. In moments like this, there’s really only one way to respond to the executive:

  1. Thank the person for their feedback.
  2. Assure them that you’ll investigate.
  3. Share the evidence that demonstrates your AI’s performance was “good enough” to release at first.
  4. Follow up to confirm whether the issue was systemic or one of the inevitable quirks that can occur with AI.

Learning from the worst-case scenario

Another personal favorite question to consider when triangulating the Killer Question is: How bad is the worst-case scenario? In some products, the worst-case scenario isn’t so bad—take Spotify, for instance. If they recommend a song I don’t like, I just skip to the next one. But in other experiences, the worst case can be far more serious.

Take medicine, for example, where the stakes are so high that even the tiniest mistake could literally mean the difference between life and death. In this context, the tolerance for bad outcomes is practically zero. Similarly, in legal applications where AI might be used to draft contracts or interpret terms, the margin for error is razor-thin. Imagine a century-long real estate contract being misinterpreted by a faulty AI model—the consequences could be enormously costly and permanent.

While social platforms and consumer tech might sit on the other end of the spectrum entirely, there are many categories that I like to believe fall somewhere in the middle. Take education, for example. AI-driven tools can provide valuable feedback or tutoring, but there’s room for some degree of error. An AI-powered learning platform that occasionally misjudges a student’s ability can be forgiven, as long as it generally guides learners in the right direction - much like the affordance we grant to educators when they make mistakes.

The Subjectivity of AI’s Success

The simple truth is that success in AI is both subjective and complicated. What one person perceives as a bad experience might be seen by another as merely a quirk, a statistical inevitability, or for some, even an improvement. As AI experiences become more common, answering the Killer Question will only become more important and frequent for R&D teams.

I’m willing to bet that the teams that build tomorrow’s Google and Amazon will be those that can quantitatively and programmatically evaluate the blurry lines between good and bad (whatever those words actually mean). They’ll rigorously monitor, evaluate, and update thresholds for “good enough.” This doesn’t mean lowering standards; it means acknowledging the complexity of building systems that, by design, can never be perfect and coming to consensus on a way to measure the subjectivity.
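
If I had to sketch what “programmatically evaluating the blurry lines” might look like, it could be as simple as a release gate that checks measured rates (like the ones computed above) against thresholds the team has agreed on. The numbers and names here are illustrative, not recommendations:

```python
# Hypothetical thresholds for "good enough"; tune per product and risk level.
THRESHOLDS = {
    "dissatisfaction_rate": ("max", 0.05),  # no more than 5% of sessions flagged
    "deviation_rate":       ("max", 0.02),  # no more than 2% off-rubric outputs
    "direct_answer_rate":   ("min", 0.90),  # at least 90% of replies on target
}

def good_enough(metrics: dict[str, float]) -> bool:
    """Return True only if every measured rate clears its agreed threshold."""
    for name, (kind, bound) in THRESHOLDS.items():
        value = metrics[name]
        if kind == "max" and value > bound:
            return False
        if kind == "min" and value < bound:
            return False
    return True

# Example: a release candidate that clears every bar.
print(good_enough({"dissatisfaction_rate": 0.03,
                   "deviation_rate": 0.01,
                   "direct_answer_rate": 0.94}))  # True
```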

I consider myself incredibly fortunate to work alongside talented individuals like Rachel and Ryan, who have pioneered this area of thinking for our team, upheld the highest standards for product quality, and catapulted our team to the cutting edge of this industry. Thank you both. Together, we have surfed the waves of innovation over the last three years and developed practical answers to the Killer Question that will undoubtedly surface soon enough: How bad is good enough?

Note to Self:

Be both ruthless and honest when testing your tech; it’s the only way to say, “Thanks for your feedback. We hit the success threshold we needed to release, but we will investigate to be sure,” with sincerity and confidence.


For this installment, enjoy a collection of my favorite Arabic songs that bring me back to Friday nights playing cards with my childhood friends.

