The Agents Newsletter #6: Reinforcement Learning Agents (The OGs)

Today, we talk about the OG agents. The ones we nerds first heard about close to a decade ago. The ones that existed before LLMs were even a thing.

I am, of course, talking about reinforcement learning (RL) agents: the super-advanced AI agents that can beat you at games, trade your stocks, and operate robots, all without batting an eye.

But first, hello and welcome to this 6th edition of The Agents Newsletter. If you haven’t read the first few issues, I suggest you do so (both because I love seeing meaningless vanity metrics go up, and because those early editions provide good background that may be useful for today’s topic).

And if you’re new to this newsletter, I hope you’ll find it to be an informative, not-too-technical deep dive into a technology that I think will be changing our world in the next few years: autonomous agents. I publish it biweekly and am always looking for questions/comments/constructive criticism, so please reach out if there’s something you want to talk about!

And now, on to the good stuff.

(Re)introducing RL agents

When I use the word “agent” in this newsletter, 9 times out of 10, I’m referring to LLM-powered autonomous agents. But despite these being the flavor of the week, the concept of “agents” has been around for a while now.

I first heard the term about 8 years ago when I was working on a deep neural network designed to predict future stock performance from analyst opinions and quantitative data. But the concept goes back much farther than that.

The idea of self-governing machines, or “agents,” has been around since the very early days of AI. Ever since the first Checkers program was created in the 50s, computer scientists have been making incremental progress towards a world of true automation, but in 2016, things picked up dramatically.

About a decade ago, you might have heard that a new AI system from Google’s DeepMind lab beat the world’s reigning champ in Go. A few years later, DeepMind also created an AI that beat some of the world’s best Starcraft players.

At the time (and even today), these were huge accomplishments.

I’ve never played Go, but I do know Starcraft. It requires controlling dozens (if not hundreds) of different processes, taking actions from a large (but still finite) set of possible things to do, and implementing strategic and tactical decisions across many different variables, all while your opponents are trying to do the same.

In order to win a Starcraft game, you need to consistently outwit your opponents across a number of different variables. Before RL, I would have told you that you needed to have human-level intelligence to do so.

But Google figured out how to create machines that could win, repeatedly and dominantly.

It. Was. Amazing.

A visualization of AlphaStar, Google’s Starcraft agent, at work

And the AI that powered these agents was remarkable.

It wasn’t like how Deep Blue (IBM’s chess-playing supercomputer) worked. It didn’t rely on brute-force search through enormous numbers of possible board positions to figure out which move should be played next. Heck, doing that might even be impossible with Starcraft, where the game state is never quite the same from one match to the next.

Instead, the AI learned how to continuously and repeatedly take the actions that were most likely to lead to a winning outcome. In that respect, RL agents are similar to LLM-based agents: they make decisions and take actions. But a lot of the similarities end there.

College graduates and PhDs

If an LLM-powered agent is a college grad, possessing generalized intelligence and the ability to reason its way through problems using logic, then RL agents are like PhDs: highly trained specialists that have developed unparalleled expertise in one highly specific area.

LLM-based agents use textual reasoning to achieve an objective by iteratively selecting and executing tools in a process designed to make incremental progress towards said objective. But RL agents are pre-trained using feedback, and are highly optimized for a specific scenario.

During training, they’re given a pre-defined set of actions they can take in an environment with some well-defined structure, and they’re trained to achieve their objectives based on the rewards they receive when they do something desirable.

They’re optimized using a mathematical framework that lets them learn, very precisely, which actions are optimal in any given scenario, even under extreme uncertainty. This lets them reach a level of precision that only a machine can achieve, though they do it in a highly structured, almost constrained way.

Because RL agents use a mathematical framework for making decisions, they can be refined to a level of superhuman performance. That same mathematical approach allows them to make decisions with the lightning-fast speed we’ve come to expect from any software application. What’s more, the further they’re trained, the better they get (not to mention that they can train by playing previous or alternative versions of themselves, giving them a volume and quality of practice that no human expert could provide).
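To make that a little more concrete, here’s a minimal sketch of tabular Q-learning, about the simplest version of that mathematical framework. Everything in it (the toy environment, the rewards, the hyperparameters) is made up purely for illustration; it’s not how AlphaStar or any real production system is built.

```python
import random

# A toy environment: the agent walks a line of 5 cells and gets a reward
# of +1 only when it reaches the rightmost cell (the "goal").
# All states, actions, and rewards here are hypothetical, for illustration only.
N_STATES = 5
ACTIONS = ["left", "right"]              # the pre-defined action set
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount factor, exploration rate

# The Q-table: for every (state, action) pair, an estimate of how good that
# action is. This table is the "mathematical framework" in its simplest form.
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == "left" else min(N_STATES - 1, state + 1)
    reached_goal = next_state == N_STATES - 1
    return next_state, (1.0 if reached_goal else 0.0), reached_goal

for episode in range(500):
    state, done = 0, False
    while not done:
        # Explore sometimes; otherwise pick the action with the highest Q-value.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # The Q-learning update: nudge the estimate toward the reward plus the
        # discounted value of the best action available in the next state.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the learned policy is just "take the highest-value action".
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
```

All of the “intelligence” lives in that one update line: run it enough times and the estimates converge toward the optimal action for each state, which is exactly why these agents can get so precise within their narrow world.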

But the same structure that makes them so precise is also what makes them so hard to generalize. Because their environments and action sets are defined by so many rigid, well-defined variables, they can’t easily “slot in” new information or new actions, and they lack the generalizability that LLM-enabled agents have.

For example, if you want to add a new tool to an LLM-enabled agent, you can simply throw it into the list of existing tools that it has access to, and because it uses text-based reasoning to make a decision on what tool to use, it can logically figure out when and where to use that new tool. But if you want to expand an RL agent’s capability by adding a new type of action or a new way to measure the world, as far as I know, you need to retrain it from scratch, a costly and time-consuming process.

(Note: This is where the college grad analogy breaks down. PhDs can obviously do the same things that college grads can do without having to re-learn everything they know from scratch, but let’s ignore that for now.)
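To make the contrast concrete, here’s a rough sketch of what I mean on the LLM side. This isn’t any particular framework’s API; the function names and the tool registry are hypothetical, just to show how loosely coupled the tool list is.

```python
# A hypothetical, framework-agnostic sketch of an LLM agent's tool list.
# Adding a capability is just appending a description + function; the model's
# text-based reasoning decides when to call it. No retraining required.

def get_stock_price(ticker: str) -> str:
    return f"{ticker}: $123.45 (dummy data for illustration)"

def get_weather(city: str) -> str:
    return f"{city}: 72°F and sunny (dummy data for illustration)"

TOOLS = [
    {"name": "get_stock_price", "description": "Look up the latest price for a ticker", "fn": get_stock_price},
    {"name": "get_weather", "description": "Get the current weather for a city", "fn": get_weather},
]

# Adding a brand-new tool later is a one-line change:
def send_email(to: str, body: str) -> str:
    return f"(pretend we emailed {to})"

TOOLS.append({"name": "send_email", "description": "Send an email to a recipient", "fn": send_email})

# The tool descriptions get dropped into the prompt, and the LLM picks one by name.
prompt_snippet = "\n".join(f"- {t['name']}: {t['description']}" for t in TOOLS)
print(prompt_snippet)

# By contrast, an RL agent's action space is baked into its training: adding a
# new action means changing the environment definition and retraining (or at
# least substantially fine-tuning) the policy.
```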

This is why I believe that, while RL agents can be extremely powerful in the right settings, they probably won’t be powering our future day-to-day assistants any time soon (that said, I’m working on a future issue about using large action models to power autonomous agents, which could change everything, but we’ll get to that at a later date).

Enhancing LLM-powered agents with a mathematical framework

To conclude this issue, I want to zoom out a bit. In my last issue, I discussed why agents aren’t able to handle some common data science tasks as well as traditional machine learning can, and in this issue, I’ve spent most of my time going over why RL agents are so much better than LLM-powered agents at handling specific tasks.

The common denominator in both of these scenarios is that LLM-powered agents currently use textual reasoning for their decision-making, which lets them generalize, but prevents them from achieving that sweet mathematical precision that we need for so many use cases.

You have to wonder if, one day, we’ll see hybrid agents that can implement both logical reasoning (perhaps as text) for generalized decision-making and feedback-loop-driven optimization for any use case.

I can see it happening. From a data science standpoint, I’m not quite sure how it’ll happen just yet, but there’s definitely a world where we can create it.


Ok, that’s all for today. As mentioned earlier, please send over any comments, questions, or ideas that you’d like me to dive into further. And if someone forwarded you this issue and you’d like to get The Agents Newsletter delivered to you directly every other week, you can subscribe here.

Until next time.

-Shanif
