Hey @Anthropic @OpenAI - Can you do me a favor?
TL;DR: Anthropic and OpenAI, update your LLMs' knowledge to include your own latest releases. Better UX, up-to-date code, and possibly more revenue. Win-win?
Late Nights with AI Buddies
I've been deep in the trenches building apps these last few months, spending many late nights with ChatGPT and Claude. They're my best friends at this point.
For my ELVTR AI Solution Architect class with Duc Haba, I've been using them for a range of tasks.
But I predominantly use them for code generation and pair programming (my thoughts on this later).
The QA Lightbulb Moment
The last few classes in my AI Solution Architect course have been all about QA: what is the model actually doing, and what was it trying to do in the first place?
This got my gears turning. I thought, "Why not apply this QA process to test the training data cut-off of these AI tools?"
Quick refresher: the training data cut-off is the date at which the AI's knowledge stops. If the cut-off is April 2024, the AI is clueless about anything after that. But sometimes models try to be sneaky and make you think they're more up-to-date than they really are.
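To make the cut-off concrete, here's a minimal sketch (model names and cut-off dates taken from the tests later in this article) that checks whether an event falls inside a model's knowledge window:

```python
from datetime import date

# Approximate knowledge cut-offs discussed in this article.
CUTOFFS = {
    "gpt-4o": date(2023, 10, 1),            # October 2023
    "claude-3-5-sonnet": date(2024, 4, 1),  # April 2024
}

def knows_about(model: str, event_date: date) -> bool:
    """True if the event happened before the model's knowledge cut-off."""
    return event_date < CUTOFFS[model]

# The OpenAI Python SDK v1.0 shipped November 6, 2023 -- after GPT-4o's
# cut-off, so the model has never "seen" it:
# knows_about("gpt-4o", date(2023, 11, 6))            -> False
# knows_about("claude-3-5-sonnet", date(2023, 11, 6)) -> True
```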
The Great AI Face-Off
I was tinkering with a small app, trying to leverage the OpenAI API for some processing. I turned to ChatGPT (GPT-4o) for help, and the code it spat out was totally unusable.
This sparked an idea: let's pit Claude (Anthropic) against ChatGPT (OpenAI) in a head-to-head showdown. These models juggle billions, if not trillions, of parameters, so this should be a piece of cake, right?
The challenge: Generate a simple Python script to call [the most up-to-date model] and ask "what color is grass?"
Seriously, my cat could write this script. The point isn't to test their smarts, but to see how they handle their own latest releases.
Round 1: OpenAI's GPT-4o
Prompt: Generate a simple Python script to call gpt4o and ask "what color is grass"
Score: 15%
The script it generated? Totally unusable. It's stuck in the past, using an SDK version from before November 6th, 2023, when OpenAI shipped a breaking v1.0 rewrite of its Python library. And the model it referenced? Deprecated since January 4th, 2024.
"But wait, Joe, what about that training data cut-off date you mentioned?"
GPT-4o's knowledge is frozen as of October 2023. Both of those big updates happened after its cut-off date, which is why it generates unusable code.
Round 2: Anthropic's Claude 3.5 Sonnet
Prompt: Generate a simple Python script to call Claude 3.5 Sonnet and ask "what color is grass"
Score: 85%
Claude did pretty well! The script works and returns an answer. But it's not quite perfect: it used claude-3-sonnet-20240229, which isn't the latest and greatest. Claude 3.5 Sonnet was released on June 20, 2024, with a knowledge cut-off of April 2024.
Why This Matters
I realize this is a small example, but it points to a larger problem: developers get broken boilerplate for the very APIs these companies want us to adopt.
A Call to Action
Anthropic, OpenAI: it's time to channel your inner Matz. Let's carve out an exception to these training cut-off dates for your own releases, shall we?
"I hope to see Ruby help every programmer in the world to be productive, enjoy programming, and be happy. That is the primary purpose of Ruby language." (Google TechTalks 2008) Yukihiro Matz Matsumoto
Let's apply that same philosophy to our AI models. Make them up-to-date, make them reliable, and let's help every programmer in the world be productive, enjoy coding with AI, and be happy.
What do you think? Have you run into similar issues with AI-generated code? Is my expectation crazy? Drop your thoughts in the comments. Let's get this conversation going and maybe, just maybe, we can nudge these AI giants in the right direction.