Beyond ChatGPT Clones: How Top Companies Are Winning with Strategic LLM Evaluations

“I love the new product-market expansion, but can we make sure it’s an AI chatbot instead?” - Annoying Investor in 2023

When ChatGPT exploded globally, every company felt immense pressure to become an AI company, and a slew of copycat chatbots followed. The vast majority never delivered meaningful value to customers. Now, forward-thinking companies are shifting to hiring AI as a key team member: they identify specific use cases that deliver tangible value to customers and rigorously evaluate how well models perform them.

Change the Game and Hire AI for a Particular Job

Think of how much planning, effort and deliberation goes into hiring a key member of your team. On average, 33 to 49 days pass between applying for a role and starting at the company. You can apply the same rigorous evaluation process to Large Language Models (LLMs) in meaningfully less time. Imagine instantaneously summoning five incredibly qualified candidates and having them sit through a battery of interviews, case studies and coding exams designed to determine their ability to do the job. Now imagine that these candidates are available 24/7 and would love nothing more than to answer any follow-up questions you have. With LLM evaluations, this is the new reality.

OpenAI cofounder and president Greg Brockman put it this way:

[Embedded post not reproduced. Source: X/Greg Brockman.]


The Strategic Importance of Evaluations (Evals)

LLM evaluations offer a structured approach to assess AI capabilities against precise business needs, such as enhancing differentiation, reducing costs, or improving efficiency.

These Evals consist of:

  • A prompt that is used to create an output - such as “Given a user’s profile and recent activity, write a personalized email designed to re-engage the user in our experience.”
  • A function that runs the prompt through different models with different input data to produce a data set of outputs to evaluate
  • Evaluation functions that grade each output against different criteria
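To make these three pieces concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `model_a` and `model_b` are hypothetical stand-ins that return canned text where real model API calls would go, and the prompt template, sample row, and grader names are invented rather than taken from any particular eval framework.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for real model clients (assumption: in practice
# each would call a hosted LLM and return its completion as a string).
def model_a(prompt: str) -> str:
    return "Hi Sam, we miss you! Come back and see what's new."

def model_b(prompt: str) -> str:
    return "Hello."

# 1. The prompt, with placeholders for per-user data (illustrative wording).
PROMPT_TEMPLATE = (
    "Given a user's profile and recent activity, write a personalized email "
    "designed to re-engage the user in our experience.\n"
    "Profile: {profile}\nRecent activity: {activity}"
)

# 2. The function that runs the prompt through every model and data row,
#    then 3. applies every evaluation function to each output.
def run_eval(
    models: Dict[str, Callable[[str], str]],
    rows: List[Dict[str, str]],
    graders: Dict[str, Callable[[str], object]],
) -> List[dict]:
    results = []
    for model_name, model in models.items():
        for row in rows:
            output = model(PROMPT_TEMPLATE.format(**row))
            scores = {name: grade(output) for name, grade in graders.items()}
            results.append(
                {"model": model_name, "input": row, "output": output, "scores": scores}
            )
    return results

models = {"model_a": model_a, "model_b": model_b}
rows = [{"profile": "Sam, joined 2022", "activity": "no logins in 30 days"}]
graders = {
    "word_count": lambda text: len(text.split()),
    "mentions_name": lambda text: "Sam" in text,
}

results = run_eval(models, rows, graders)
for r in results:
    print(r["model"], r["scores"])
```

Each result row keeps the raw output next to its scores, so stakeholders can read the actual emails as well as the numbers.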

The different types of criteria include:

  • Programmatic Evals: Quantitative measures computed directly from the output, like word count or reading level.
  • Synthetic Evals: Asking an LLM to grade the output on more qualitative criteria, such as the sentiment of an email.
  • Human Ratings: Using actual humans to grade the outputs.
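The first two kinds of criteria can be sketched as small grading functions. This is an illustrative sketch under stated assumptions, not a reference implementation: the reading level uses the standard Flesch-Kincaid grade formula with a crude vowel-group syllable counter, and `fake_judge` is a hypothetical stand-in for a real LLM call. Human ratings would be collected outside of code and joined to the same outputs.

```python
import re
from typing import Callable

def word_count(text: str) -> int:
    """Programmatic eval: number of whitespace-separated words."""
    return len(text.split())

def count_syllables(word: str) -> int:
    # Rough heuristic (assumption): each run of consecutive vowels is one syllable.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def reading_level(text: str) -> float:
    """Programmatic eval: approximate Flesch-Kincaid grade level."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

def synthetic_grade(text: str, judge: Callable[[str], str]) -> int:
    """Synthetic eval: ask a judge model for a 1-5 warmth rating."""
    reply = judge(
        "Rate the warmth of this email from 1 (cold) to 5 (warm). "
        "Reply with only the number.\n\n" + text
    )
    match = re.search(r"[1-5]", reply)
    if match is None:
        raise ValueError(f"Judge reply contained no rating: {reply!r}")
    return int(match.group())

# Hypothetical judge that always answers "4"; a real one would call an LLM.
def fake_judge(prompt: str) -> str:
    return "4"

email = "We miss you. Come back soon!"
print(word_count(email), round(reading_level(email), 2), synthetic_grade(email, fake_judge))
```

Because every grader takes the output text and returns a score, they can all be passed to the same evaluation loop regardless of type.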

Benefits of LLM Evals - Elevating the Conversation

Before LLM evals, your conversations about integrating AI may have sounded like:

I thought AI was going to completely replace all of our Customer Service Agents, how does this do that? Don’t we need Data Scientists to do this for us? How much is this going to cost?

After LLM evals, you’ll start to hear:

We can do this right now? When can we start? It’s interesting that the smaller models have reasonable performance relative to GPT-4 for this use case. I can’t believe it will only cost us this much if we use Mistral!

Evals also bring everyone to the table. Every stakeholder will take something away from this type of analysis and be part of the solution. Finally, by focusing the Evals on a particular use case and showing the actual outputs, you will change the conversation from “Should we do this?” to “Can we afford not to do this?”

The Future of Strategic AI Implementation

The best companies are shifting from indiscriminately building AI to strategically hiring it. Stay tuned: next, we will walk through an actual Eval step by step.
