AI is getting dumber!
Today I was not always happy

A few friends of mine have been talking about how models are starting to fall apart with the new "Reasoning" training techniques. Vendors say it's because the new models are stronger in some areas, weaker in others. That doesn't make it alright for me.

Today I was trying to get Claude Sonnet 3.7 to fix the TypeScript errors it had created while generating a test fixture:

First of all, it looks like it was just spitting out random code, given the number of different issues in this one file. And what's the point of AI if it can't apply TypeScript rules better than humans? (And I look forward to AGI so that the problem of finding a missing closing parenthesis can finally be solved...) Now I don't believe in Artificial Intelligence, just Apparent Intelligence. Nevertheless I don't relish spending time fixing TypeScript errors, especially as my "assistant" created them.

So I posed the question to 3.7, Anthropic's "Most intelligent model yet!"

I decided to ask 3.7, in its lofty intelligence, to sort this out. Part of me suspected that this might just be a dummy file that needed deleting, considering all the errors, so I asked for confirmation that this was indeed a real test.

This has got to be one of the laziest messages I have received from an AI. I can plainly see that the symbol doesn't appear to be on the type, on line 581. Then it just makes some vague guesses. Then it essentially tells me good luck fixing it.

I was peeved and put the ball back in its court:


You got that right, bud! You break it, you fix it!

I had configured an MCP server to let it access my project files as required, so it goes ahead and reads the files.

Unfortunately, the response is completely bogus, blaming the TypeScript errors on a glitch in the TypeScript server (used by VS Code to type-check TypeScript on the fly as you edit). While I've had a problem with the TypeScript server once or twice in the past, it's really a leap to jump immediately to that. And telling me to look into the tsconfig.json is a great way for me to waste hours trying different TypeScript options. All unnecessarily, of course.

I had heard that Claude 3.5 was actually better for coding, so I decided to switch models and see if the rumor was true with this simple test.


From the same prompt, it gets the problem RIGHT AWAY!!!

It continues:

So there we have it: a simple problem, but where 3.5 went straight ahead and solved the problem, the lethargic 3.7 gave me a bunch of phoney-baloney answers. I am now switching back!

And let my tale of woe be a warning to everyone eagerly awaiting progress just because the version number is incrementing.

Sometimes progress isn't all it's cracked up to be!


Martin Béchard is currently annoyed with AI vendors that are passing off their bloated reasoning models as capable of 10x Coding. If you are struggling to get these overly-verbose AI Coding assistants to do their job, try going back, or for human-grade intelligence please reach out at [email protected]!

Benjamin Lee

Fullstack Engineer | React, Ruby, Postgres | InterlinearHub

2 days ago

When I hear that AI does get dumber sometimes, I feel a sense that our profession might not be doomed after all. Is that counterintuitive?

