When you launch an AI feature, the user experience isn't defined by your UI design. It's defined by what comes OUT of the model.

So why aren't more PMs involved in AI model evaluation? Usually because:

- They think they need deep ML knowledge
- They're intimidated by the technical jargon
- They believe it's "the data team's job"

The truth? You need ZERO ML expertise to start evaluating AI outputs. What you DO need is empathy for both users and models.

In a recent Supra roundtable, Julia Winn shared these tips for developing model evaluation skills:

1/ Find existing model work in your org.

Surprise them with: "I'd like to help with your evaluation." (Spoiler: they'll be thrilled. Nobody enjoys this work.)

Don't wait for a formal AI project. Start with something simple, like a recommendation engine already in production.

2/ Start with a small dataset.

Example: for a customer support classifier that detects urgent tickets:

- Pull 20-30 tickets across categories
- Compare the model's output against human judgment

The goal is spotting patterns like "the model misses urgency when multiple issues appear in one ticket."

3/ Create a simple framework:

- Strong pass (delivers clear value)
- Borderline pass (technically correct but weak)
- Clear fail (would hurt the user experience)

Avoid complex rubrics with multiple dimensions. This simple framework keeps discussions focused on what matters. (A minimal code sketch of tips 2 and 3 follows at the end of this post.)

4/ Focus on examining the failures.

Ask "What context is missing?" not "Why is the model wrong?" When a model gets something wrong, it's rarely because the algorithm is flawed. It's almost always missing critical context.

5/ Build your own evaluations for practice.

Use ChatGPT or Claude to test ideas systematically. Try an AI summarizer for your favorite blog: define what "good" looks like, then analyze the patterns. (A sketch of this exercise also follows below.)

The goal isn't becoming a data scientist. It's developing an intuition for how models perceive your users' world.

Where are you currently involved in AI evaluation?
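To make tips 2 and 3 concrete, here is a minimal Python sketch of that exercise, assuming a support-ticket classifier. Everything in it is illustrative: the Ticket fields, the label names, and the grading boundaries should come from your own product, not from this example.

```python
# A hand-rolled eval for tips 2-4: compare the model's labels against human
# judgment on a small ticket sample, grade each result with the three-tier
# framework, then print only the failures for closer inspection.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Ticket:
    text: str
    human_label: str   # ground truth from a support agent, e.g. "urgent"
    model_label: str   # what the classifier predicted

def grade(t: Ticket) -> str:
    """The simple framework: strong pass / borderline pass / clear fail."""
    if t.model_label == t.human_label:
        return "strong pass"       # delivers clear value
    if t.human_label == "urgent":
        return "clear fail"        # a missed urgent ticket hurts users most
    return "borderline pass"       # wrong, but a low-stakes miss

def run_eval(tickets: list[Ticket]) -> None:
    print(Counter(grade(t) for t in tickets))
    # Tip 4: examine the failures and ask what context was missing.
    for t in tickets:
        if grade(t) == "clear fail":
            print(f"FAIL: predicted={t.model_label!r}, truth={t.human_label!r}")
            print(f"  text: {t.text[:80]}")

# Usage: a hand-pulled sample of 20-30 tickets across categories.
sample = [
    Ticket("Billing is wrong AND I'm locked out of my account", "urgent", "normal"),
    Ticket("How do I change my profile photo?", "normal", "normal"),
]
run_eval(sample)
```

Running this over a real sample is what surfaces patterns like the "multiple issues in one ticket" failure from tip 2.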
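And a sketch of the tip-5 exercise: a DIY summarizer eval. This version uses the OpenAI Python SDK as one possible setup (Claude via Anthropic's SDK works the same way); the model name, the rubric checks, and the word-count threshold are assumptions to adapt, not recommendations.

```python
# DIY eval practice: summarize blog posts, then grade each summary against
# a definition of "good" that was written down BEFORE looking at outputs.
from openai import OpenAI  # requires OPENAI_API_KEY in the environment

client = OpenAI()

def summarize(post_text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder choice; any chat model works
        messages=[{
            "role": "user",
            "content": f"Summarize this blog post in 3 sentences:\n\n{post_text}",
        }],
    )
    return resp.choices[0].message.content

def grade_summary(summary: str, key_idea: str) -> str:
    """Illustrative rubric: must mention the post's key idea, stay concise."""
    hit = key_idea.lower() in summary.lower()
    if hit and len(summary.split()) <= 80:
        return "strong pass"      # captures the main point, stays tight
    if hit:
        return "borderline pass"  # right content, but rambles
    return "clear fail"           # missed the point of the post entirely

posts = [
    # (post text, the one idea a good summary must mention)
    ("A long post arguing that eval datasets should start small...", "small"),
]
for text, key_idea in posts:
    s = summarize(text)
    print(grade_summary(s, key_idea), "|", s[:100])
```

The pattern worth noticing isn't any single grade; it's where the summaries consistently drop the idea you cared about, which is the same "missing context" lens as tip 4.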