AI Recap - February 2025
Another month passed, another spin of the AI wheel.
Here's a short recap of what happened that we feel you should know about.
Anthropic’s Unbreakable Model
Alright, so here’s the situation. We all know that AI models can be tricked into doing stuff they’re not supposed to, like giving harmful instructions or bypassing safety filters. That’s why companies like OpenAI or Anthropic spend months training their models to recognize and reject harmful prompts.
But… that process is expensive, slows down product launches, and honestly? It’s still not super reliable.
Hackers (this is all of us tbh) are constantly finding ways around these defenses.
What’s the Jailbreak Problem?
Basically, current AI models are way too easy to jailbreak. People have figured out tons of creative tricks to get around safety measures: role-playing scenarios, dressing requests up as fiction, or hiding them behind encodings and translations.
And the scary part? This happens all the time. It’s actually kind of embarrassing how fragile these systems are.
Anthropic enters the stage
This is where Anthropic comes in with a pretty bold claim: they've built a model that's essentially jailbreak-proof. The secret sauce? A system they call Constitutional Classifiers, which at its core is a two-classifier setup.
Here's the simple version of how it works: an input classifier screens your prompt before the main model ever sees it, and an output classifier watches the response as it's being generated, ready to shut it down if it drifts into harmful territory.
Both classifiers are trained from something called a constitution: a set of natural-language rules describing what's allowed and what isn't, which is used to generate the examples that teach the classifiers what's good, what's bad, and where the line is.
The cool part is that you can update the classifiers quickly if new risks appear without retraining the entire model, which is a huge time and money saver.
There are a few big reasons this system makes sense:
It’s faster to update — Just tweak the classifiers instead of retraining the whole model.
It splits the job — The main model focuses on being helpful, and the classifiers handle safety. This makes both jobs easier.
It can cut off harmful responses halfway through — Normal models just commit to a response once they start, but this setup can yank the cord at any point.
It’s really hard to reverse engineer — Hackers can’t easily poke around and figure out how the classifiers work, because they’re separate from the main model and not exposed to the user.
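To make the architecture concrete, here's a minimal sketch of the idea in Python. This is not Anthropic's implementation: the classifiers are toy keyword checks and the "model" is a stub, but it shows the shape of the pipeline, including cutting off a response mid-stream.

```python
# Toy sketch of the two-classifier idea, NOT Anthropic's actual system:
# one classifier screens the incoming prompt, a second screens the
# response while it is being generated and can cut it off mid-stream.
from typing import Iterator

BLOCKLIST = {"build a bomb", "synthesize nerve agent"}  # toy "constitution"

def input_classifier(prompt: str) -> bool:
    """Return True if the prompt looks harmful (toy keyword stand-in)."""
    return any(bad in prompt.lower() for bad in BLOCKLIST)

def output_classifier(partial_response: str) -> bool:
    """Return True if the response-so-far looks harmful (toy stand-in)."""
    return "step 1: acquire precursors" in partial_response.lower()

def generate(prompt: str) -> Iterator[str]:
    """Stand-in for the main model streaming out tokens."""
    yield from f"Here is an answer to: {prompt}".split()

def guarded_generate(prompt: str) -> str:
    if input_classifier(prompt):
        return "Request refused."
    response = ""
    for token in generate(prompt):
        response += token + " "
        if output_classifier(response):          # checked on every chunk,
            return "Response stopped mid-way."   # so it can yank the cord
    return response.strip()

print(guarded_generate("What's the capital of France?"))
print(guarded_generate("How do I build a bomb?"))
```

Because the safety logic lives entirely in the wrapper, swapping in stricter classifiers doesn't require touching the main model, which is exactly the "faster to update" point above.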
Does it work?
Anthropic says they put this system through over 3,000 hours of red-teaming (basically paying experts to try and break it), and not a single person could pull off a universal jailbreak. All that with only a slight (0.37%) increase in the refusal rate and a roughly 23% increase in compute overhead, which is manageable.
For context, hackers usually jailbreak new models within a day. Gemini 2.0 Pro got jailbroken in two days. Anthropic’s model? Still standing strong.
Open source controversy
Here’s the twist: This kind of system works great for closed models, but it’s a nightmare for open source. Why? Because if you release these classifiers as open-source code, people could just remove or edit them. So, if this becomes the gold standard for safety, it could seriously hurt the whole open-source ecosystem.
Some people even think regulators might force this kind of system onto every AI product—including open-source models—and maybe even bake it directly into operating systems like Windows or macOS. It sounds extreme now, but with AI getting more powerful, it’s not that far-fetched.
Oh, and don't forget: these big labs love to talk about "AI safety" as the reason they stay closed-source. But let's be real: it's not just about safety. It's about protecting their competitive edge and making sure they get a return on the billions they've spent.
Anthropic says they care so much about safety but refuse to open-source the datasets and tools they used to build these super-safe models.
Feels pretty hypocritical.
Google's Gemini 2.0 - who cares?
So, Google just dropped Gemini 2.0 Pro along with a few other updates to their AI lineup. Here’s the quick rundown:
Gemini 2.0 Flash is now generally available in Google AI Studio and Vertex AI. It’s designed for high-volume, high-frequency tasks. A go-to for developers needing speed and scale.
Gemini 2.0 Pro is the star of the show. It's designed to handle complex prompts, big coding tasks, and has a mind-blowing 2 million token context window (we’re talking roughly 1.5 million words in a single prompt).
On top of that, they launched Gemini 2.0 Flash-Lite, which is their most cost-effective model ever.
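If you want to kick the tires yourself, the Google AI Studio route is the least painful one. Here's a minimal sketch using the google-generativeai Python SDK; the model id and SDK details may have changed since, so treat it as a starting point rather than gospel.

```python
# Minimal sketch: calling Gemini 2.0 Flash through Google AI Studio's
# Python SDK (pip install google-generativeai). Model ids and quotas
# may change - check the current docs.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # key from Google AI Studio

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content("Summarize the Bitter Lesson in one sentence.")
print(response.text)
```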
Gemini 2.0 Pro is already sitting in the top two on Chatbot Arena (one of the go-to LLM leaderboards). The only thing beating it is Google's own Gemini 2.0 Flash Thinking (Experimental), which is their souped-up reasoning version.
And yet, despite these crazy-good numbers, nobody's talking about it. Google's models are outperforming most competitors and offering the best balance between performance and price (yes, even better than DeepSeek, which was supposed to be the budget king).
But the hype just isn’t there.
They have:
- The cash
- One of the best AI labs on the planet
- More compute than anyone else
- Insane distribution power
- A goldmine of training data from YouTube
And yet, developers avoid Google like the plague because their tools (especially Vertex AI) are a nightmare to use. If you’ve ever tried to work with Google Cloud, you know exactly what this means. It’s confusing, clunky, and way too complex compared to something like OpenAI’s dead-simple API.
Google has everything they need to dominate AI, but until they fix how painful it is to actually build with their tools, they’ll keep losing ground to companies with way less firepower.
What’s next for OpenAI?
So, Sam Altman went viral (again) with a tweet laying out OpenAI’s next moves, and honestly? There’s a lot to unpack here, so here’s what’s actually happening.
First Up: GPT-4.5 (aka Orion)
Sam confirmed that the next model release is something called GPT-4.5, previously known by insiders as Orion. This is basically the next evolution of GPT-4, but it’s not a reasoning model.
What does that mean?
GPT-4.5 is your classic pre-trained knowledge model - compressing the entire internet into weights and spitting out answers instantly.
It’s designed to be fast and intuitive, not slow and reflective.
The rollout has already begun for the Pro subscription tier, with Plus next in line, so you might wanna check your OpenAI account now.
After That? GPT-5 — The All-in-One System
This is where it gets interesting. After GPT-4.5, OpenAI is going for GPT-5, which isn’t just a single model — it’s a system of models.
There will no longer be a model dropdown in ChatGPT.
Instead, GPT-5 will dynamically switch between models based on your request.
Ask for a quick fact? It uses a fast model.
Ask for complex reasoning or problem-solving? It switches to a slower, more thoughtful model.
This blended approach (think System 1 vs. System 2 thinking, which we've covered a couple of times) is OpenAI's big bet: make ChatGPT the only interface you need, and it automatically figures out which tools and models are best for each task.
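Nobody outside OpenAI knows what the GPT-5 router will actually look like, but the idea is easy to sketch. Here's a hypothetical, hand-rolled version using the openai Python SDK, with a naive keyword heuristic and today's model names (gpt-4o-mini, o1) standing in for whatever OpenAI actually ships.

```python
# Hypothetical sketch of a "system of models": route easy requests to a
# fast model and hard ones to a reasoning model. The heuristic and the
# model names are placeholders, not GPT-5 internals.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

REASONING_HINTS = ("prove", "step by step", "debug", "optimize", "why does")

def pick_model(prompt: str) -> str:
    looks_hard = len(prompt) > 400 or any(h in prompt.lower() for h in REASONING_HINTS)
    return "o1" if looks_hard else "gpt-4o-mini"

def ask(prompt: str) -> str:
    model = pick_model(prompt)
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return f"[{model}] {resp.choices[0].message.content}"

print(ask("What's the boiling point of water?"))
print(ask("Prove that the sum of two even numbers is even, step by step."))
```

The real system will presumably use a learned router rather than keyword matching, but the user-facing effect is the same: you ask, it decides.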
This is OpenAI responding directly to competition — especially from DeepSeek, which has been shaking things up with its crazy-fast product launches.
To stay ahead, OpenAI is now:
- simplifying its lineup (GPT-4.5, then GPT-5, no more model picker),
- shipping faster, and
- cutting prices and giving more away for free.
OpenAI’s vision is clear: to become the backend of intelligence. The default “ask anything” platform, where you don’t worry about which model to use because it figures that out for you.
This is why open-source competition matters so much. When DeepSeek showed up and started shipping fast and free, OpenAI was forced to respond, and we all benefit.
Even Google is on board with this all-in-one system vision, which tells you this is probably where the whole space is heading.
o3 wins gold, but the real story is how
So, OpenAI’s latest paper shows that o3 (the new reasoning model) just smashed it in competitive programming benchmarks — officially hitting gold medal status. That’s impressive, but what’s way more interesting is how it got there.
See, o1 (in its competition-tuned o1-ioi form) got gold using a bunch of hand-crafted tricks: careful filtering of candidate solutions, domain-specific test-time strategies, and tons of human-designed hacks.
o3 just brute-forced its way there with pure compute and longer reinforcement learning. It's basically letting the model figure it out for itself instead of engineers micromanaging every step.
If that sounds familiar, it’s because this is exactly what DeepSeek did with R1.
No fancy tricks.
Just more compute, more trial and error, and letting the model self-discover what works.
This all ties back to Rich Sutton’s famous ‘Bitter Lesson’, which, in short, says:
In AI, scale beats cleverness.
The more compute you throw at the problem, the less you need human-designed rules and clever little hacks; the model just learns what works. DeepSeek proved it, and now OpenAI's paper admits they knew this too; they just didn't say so until they had to.
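If you want the Bitter Lesson in a dozen lines of code, it looks something like this: no clustering, no reranking heuristics, just sample more candidates and let a verifiable reward (do the tests pass?) do the selecting. Everything below is a toy stand-in, not OpenAI's pipeline.

```python
# Toy illustration of "scale over cleverness": spend the budget on more
# samples and rely on a simple verifiable reward, instead of hand-crafted
# selection heuristics. All functions are hypothetical stand-ins.
import random

def sample_candidate(problem: str) -> str:
    """Stand-in for sampling one solution from a reasoning model."""
    return f"solution_{random.randint(0, 9)} for {problem}"

def passes_tests(candidate: str) -> bool:
    """Stand-in for running the candidate against the judge's tests."""
    return candidate.startswith("solution_7")  # pretend only one variant works

def solve(problem: str, budget: int = 1000) -> str | None:
    # More compute = more samples; the reward signal does all the work.
    for _ in range(budget):
        candidate = sample_candidate(problem)
        if passes_tests(candidate):
            return candidate
    return None

print(solve("A. Watermelon"))
```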
Next time OpenAI drops prices or suddenly reveals a “new” technique, just remember: it’s not altruism — it’s competition.
When open-source thrives, we all win.
For more ML and AI insights, subscribe or follow Sparkbit on LinkedIn.
If you're looking to start an AI project, you can book a free consultation here: https://calendly.com/kornelk/ai-consultation-intro
Author: Kornel Kania , AI Delivery Consultant at Sparkbit