Hey @Anthropic @OpenAI - Can you do me a favor?

TLDR: Anthropic and OpenAI - update your LLMs' knowledge to match their latest releases. Better UX, up-to-date code, and possibly more revenue. Win-win?


Late Nights with AI Buddies

I've been deep in the trenches building apps these last few months, spending many late nights with ChatGPT and Claude. They're my best friends at this point.

For my ELVTR AI Solution Architect class with Duc Haba, I've been using them to:

  • Create alternative viewpoints for papers and reports
  • Check and challenge my logic
  • Grade my answers based on the intended audience

But I predominantly use them for code generation and pair programming (my thoughts on this later).


The QA Lightbulb Moment

The last few classes in my AI Solution Architect course have been all about QA: What is the model actually doing, and what was it trying to do in the first place?

This got my gears turning. I thought, "Why not apply this QA process to test the training data cut-off of these AI tools?"

Quick refresher: The training data cut-off is the date when the AI's knowledge stops. If the training data cut-off date is April 2024, the AI is clueless about anything after that. But sometimes, they try to be sneaky and make you think they're more up-to-date.
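To make that refresher concrete, here's a tiny sketch of the idea: keep a table of documented cut-off dates and check whether a given event falls inside a model's knowledge. The dates come from this article; the helper function itself is hypothetical, not any vendor's API.

```python
from datetime import date

# Training cut-offs as stated in this article
# (month-end is an assumption; vendors only publish the month).
KNOWN_CUTOFFS = {
    "gpt-4o": date(2023, 10, 31),
    "claude-3-5-sonnet-20240620": date(2024, 4, 30),
}

def model_knows_about(model: str, event_date: date) -> bool:
    """Return True only if the event predates the model's training cut-off."""
    cutoff = KNOWN_CUTOFFS.get(model)
    return cutoff is not None and event_date <= cutoff

# The November 2023 SDK rewrite and the January 2024 model deprecation
# both fall after GPT-4o's October 2023 cut-off:
print(model_knows_about("gpt-4o", date(2023, 11, 6)))  # False
print(model_knows_about("gpt-4o", date(2024, 1, 4)))   # False
```

Anything past the cut-off is simply invisible to the model, no matter how confidently it answers.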

Asked on 7/24/24 - naughty naughty

The Great AI Face-Off

I was tinkering with a small app, trying to leverage the OpenAI API for some processing. I turned to ChatGPT (GPT-4o) for help, and the code it spat out was totally unusable.

This sparked an idea: let's pit Claude (Anthropic) against ChatGPT (OpenAI) in a head-to-head showdown. These models are juggling billions, if not trillions, of parameters, so this should be a piece of cake, right?

The challenge: Generate a simple Python script to call [the most up-to-date model] and ask "what color is grass?"

Seriously, my cat could write this script. The point isn't to test their smarts, but to see how they handle their own latest releases.


Round 1: OpenAI's GPT-4o

Prompt: Generate a simple Python script to call gpt4o and ask "what color is grass"


  • Understands the request: ✅
  • Generates Script: ✅
  • Generates Usable Script: ❌
  • Returns an answer: ❌
  • Generates Script using most up-to-date model: ❌

Score: 15%

The script it generated? Totally unusable. It's stuck in the past, using an old SDK from before November 6th, 2023. And the model it referenced? Deprecated since January 4th, 2024.


"But wait, Joe, what about that training data cut-off date you mentioned?"

GPT-4o's knowledge is frozen as of October 2023. Both of those big updates happened after its cut-off date, which is why it generates unusable code.
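For contrast, here's a minimal sketch of what an up-to-date script looks like, assuming the v1.x `openai` Python SDK (the post-November-2023 rewrite) and an OPENAI_API_KEY in your environment. This is my sketch, not GPT-4o's output:

```python
def ask_grass_color(model: str = "gpt-4o") -> str:
    """Ask the given OpenAI model what color grass is."""
    # Lazy import so the sketch degrades gracefully if the SDK isn't installed.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "What color is grass?"}],
    )
    return response.choices[0].message.content

# Usage (requires network access and an API key):
#   print(ask_grass_color())
```

Two lines changed everything between the old SDK and this one: the `OpenAI()` client object and the `client.chat.completions.create(...)` call. A model trained before that rewrite has never seen either.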



Round 2: Anthropic's Claude 3.5 Sonnet

Prompt: Generate a simple Python script to call Claude 3.5 Sonnet and ask "what color is grass"


Grass is typically green in color. The green pigment in grass blades is chlorophyll, which allows plants to absorb sunlight for photosynthesis.


  • Understands the request: ✅
  • Generates Script: ✅
  • Generates Usable Script: ✅
  • Returns an answer: ✅
  • Generates Script using most up-to-date model: ❌

Score: 85%

Claude did pretty well! The script works and gives an answer. But it's not quite perfect: it used claude-3-sonnet-20240229, which isn't the latest and greatest. Claude 3.5 Sonnet was released on June 20, 2024, with a knowledge cut-off of April 2024.
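Fixing Claude's one miss is a one-line change: swap in the current model id. Here's a sketch of the corrected call, assuming the `anthropic` Python SDK and an ANTHROPIC_API_KEY in your environment:

```python
def ask_claude_grass_color(model: str = "claude-3-5-sonnet-20240620") -> str:
    """Ask the given Anthropic model what color grass is."""
    # Lazy import so the sketch degrades gracefully if the SDK isn't installed.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    message = client.messages.create(
        model=model,
        max_tokens=100,  # the Messages API requires an explicit token cap
        messages=[{"role": "user", "content": "What color is grass?"}],
    )
    return message.content[0].text

# Usage (requires network access and an API key):
#   print(ask_claude_grass_color())
```

The irony, of course, is that Claude can't write this line itself: claude-3-5-sonnet-20240620 was released after its own April 2024 cut-off.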



Why This Matters

I realize this is a small example, but it has larger consequences.

  1. It's a barrier for new users trying to get into these tools.
  2. It chips away at our confidence in these otherwise amazing creations. If I can't trust it to tell me the color of grass, how can I trust it with the big stuff?
  3. It's a roadblock for integrating these tools into our codebases with their full, up-to-date superpowers.


A Call to Action

Anthropic and OpenAI, it's time to channel your inner Matz. Let's carve out an exception to these training cut-off dates, shall we?

"I hope to see Ruby help every programmer in the world to be productive, enjoy programming, and be happy. That is the primary purpose of Ruby language." (Google TechTalks, 2008) Yukihiro "Matz" Matsumoto

https://auth0.com/blog/a-brief-history-of-ruby/


Let's apply that same philosophy to our AI models. Make them up-to-date, make them reliable, and let's help every programmer in the world be productive, enjoy coding with AI, and be happy.

What do you think? Have you run into similar issues with AI-generated code? Is my expectation crazy? Drop your thoughts in the comments. Let's get this conversation going and maybe, just maybe, we can nudge these AI giants in the right direction.












Kelvin Michael, CISA, CRISC, M.Sc

Snr IT Systems Analyst Professional | Advisory

2 months ago

Thanks for sharing, Joe.

Priti Solanki

AI & Cloud Architect | Specializing in AI Integration, Automation, and Digital Transformation

3 months ago

I agree with your finding. I experience the same when I do code debugging.

Woodley B. Preucil, CFA

Senior Managing Director

3 months ago

Joe Tustin Very interesting. Thank you for sharing

Duc Haba

Chief Technology Officer (CTO)

3 months ago

Joe Tustin, good morning: Great article. It's an interesting read.
