AI is getting dumber!

Martin Bechard

Consultant | Problem Solver | Software Developer | Technical leader

Published: March 19, 2025

A few friends of mine have been talking about how models are starting to fall apart with the new "Reasoning" training techniques. Vendors say it's because the new models are stronger in some areas, weaker in others. That doesn't make it alright for me.

Today I was trying to get Claude Sonnet 3.7 to fix the TypeScript errors it had created while generating a test fixture:

First of all, it looks like it was just spitting out random code, given the number of different issues in this one file. And what's the point of AI if it can't apply TypeScript rules better than humans? (And I look forward to AGI so that the problem of finding a missing closing parenthesis can finally be solved...) Now I don't believe in Artificial Intelligence, just Apparent Intelligence. Nevertheless I don't relish spending time fixing TypeScript errors, especially as my "assistant" created them.

So I posed the question to 3.7, Anthropic's "Most intelligent model yet!"

I decided to ask 3.7, in its lofty intelligence level, to sort this out. Part of me suspected that this might just be a dummy file needing to be deleted, considering all the errors, so I asked it to confirm that this was indeed a real test.

This has got to be one of the laziest messages I have received from an AI. I can plainly see that the symbol doesn't appear to be on the type, on line 581. Then it just makes some vague guesses. Then it essentially tells me good luck in fixing it.

I was peeved and put the ball back in its court:

You got that right, bud! You break it, you fix it!

I had configured an MCP server to let it access my project files as required, so it went ahead and read the files.

Unfortunately, the response is completely bogus, blaming the TypeScript errors on a glitch in the TypeScript server (which VS Code uses to type-check code and report errors on the fly). While I've had a problem with the TypeScript server once or twice in the past, it's really a leap to jump immediately to that. And telling me to look into the tsconfig.json is a great way for me to waste hours trying different TypeScript options. All unnecessarily, of course.

I had heard that Claude 3.5 was actually better for coding, so I decided to switch models and see if the rumor was true with this simple test.

From the same prompt, it gets the problem RIGHT AWAY!!!

It continues:

So there we have it: a simple problem, but where 3.5 went straight ahead and solved the problem, the lethargic 3.7 gave me a bunch of phoney-baloney answers. I am now switching back!

And let my tale of woe be a warning to everyone eagerly awaiting progress just because the version number is incrementing.

Sometimes progress isn't all it's cracked up to be!

Martin Béchard is currently annoyed with AI vendors that are passing off their bloated reasoning models as capable of 10x Coding. If you are struggling to get these overly-verbose AI Coding assistants to do their job, try going back, or for human-grade intelligence please reach out at [email protected]!

Benjamin Lee

Fullstack Engineer | React, Ruby, Postgres | InterlinearHub

2 days ago

When I hear that AI does get dumber sometimes, I feel a sense that our profession might not be doomed after all. Is that counterintuitive?
