Reasoning About Reasoning - III
1. The Setup
We’re back, and we’re now going to talk about reasoning evaluations in scenarios involving black-box language models.
A quick refresher: a model is black-box if its internal details (activations, weights, and so on) are unknown to you.
Black-box contexts are the backbone of modern AI-based applications.
And down the line, these will be the bedrock that agentic AIs rest on. As a consequence, we’re forced to deal with the matter of reasoning indirectly at the application level.
2. Tennis For Two
Compared to the framework discussed in the previous issue, we’re going to have to take a radically different approach here. We need a language model to evaluate our language model’s reasoning!
You might recall from the last issue that we strongly discouraged the use of a language model for generating synthetic reasoning data — that’s because we had access to activations and could rely on those.
We have no such luck here. We’re forced to get another language model to help us out.
Here’s a diagrammatic overview of how things’ll work out:
It’s not all too different in structure from what we discussed in the last issue. The differences are the inability to view activations and the use of an evaluator agent.
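To make that structure concrete, here’s a minimal sketch of the pipeline in Python. The endpoint URL, API key, model names, and the `chat` helper are all placeholders of our own, and we’re assuming both models sit behind an OpenAI-compatible chat-completions API; none of this is required by the approach itself.

```python
import requests

# Placeholder endpoint and credentials -- swap in whichever black-box
# provider(s) you actually use. Both models are assumed to expose an
# OpenAI-compatible /v1/chat/completions API.
API_URL = "https://api.example.com/v1/chat/completions"
API_KEY = "YOUR_KEY_HERE"

def chat(model: str, system_prompt: str, user_prompt: str) -> str:
    """Send one system + user message pair to a black-box model."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": model,
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# 1. The model under test produces an answer, reasoning included.
answer = chat(
    model="generator-model",
    system_prompt="You are a blog writer. Think step by step, then write the article.",
    user_prompt="Write a short article on black-box reasoning evaluation.",
)

# 2. The evaluator agent (also a black box) assesses that answer.
verdict = chat(
    model="evaluator-model",
    system_prompt=(
        "You are a strict reviewer. Judge the draft below for factual "
        "correctness and for whether its reasoning actually supports its "
        "claims. Reply with PASS or FAIL plus a one-paragraph justification."
    ),
    user_prompt=answer,
)
print(verdict)
```

The point of the sketch is the shape of the thing: two opaque endpoints, with all of the evaluation logic living in the prompts rather than in anything we can inspect.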
3. Enforcers
The evaluator agent will be responsible for assessing reasoning, and the key assumption we’re making is that this model, too, is a black box.
Otherwise, unless you happen to have a high-end model of your own, you’d be asking a low-resource local model to assess a far more capable external one, which is unwise.
Our modus operandi will be carefully constructed system prompts fed to the evaluator. Here are the essential steps involved:
In addition, we can do something nifty and set up a feedback loop: this lets your evaluator repeatedly re-request an answer from the language model until it’s satisfied.
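Here’s a rough sketch of what that loop could look like, reusing the hypothetical `chat` helper and placeholder model names from the earlier snippet. The round cap and the PASS/FAIL convention are our own choices for the example, not part of any standard.

```python
MAX_ROUNDS = 3  # arbitrary cap so the loop can't run forever

def generate_until_satisfied(task: str) -> str:
    """Re-request the generator until the evaluator is satisfied (or we give up)."""
    feedback = ""
    draft = ""
    for _ in range(MAX_ROUNDS):
        # Fold the evaluator's latest critique into the next request.
        prompt = task if not feedback else f"{task}\n\nReviewer feedback to address:\n{feedback}"
        draft = chat(
            model="generator-model",
            system_prompt="You are a blog writer. Think step by step, then write the article.",
            user_prompt=prompt,
        )
        verdict = chat(
            model="evaluator-model",
            system_prompt=(
                "You are a strict reviewer. Start your reply with PASS or FAIL, "
                "then explain what (if anything) must be fixed."
            ),
            user_prompt=draft,
        )
        if verdict.strip().upper().startswith("PASS"):
            return draft
        feedback = verdict  # FAIL: carry the critique into the next round
    return draft  # best effort after MAX_ROUNDS attempts
```

The cap matters: without it, a fussy evaluator and a stubborn generator can chase each other indefinitely, and every round costs you tokens.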
Of course, we can’t do much without having a purpose in mind. Why do we care about reasoning at all?
Let’s suppose we had an LLM write up blog articles for us (spoiler alert: ours aren’t). We’d care not just about the correctness of the article but also about whether it clicks with our readerbase.
To that end, we might want to see a couple of jokes and a tone in line with our blog’s general vibe.
To explain all of that to an evaluator model, we’ll need to give it a system prompt. In essence, a system prompt is akin to roleplaying like you would on RuneScape.
You write up a long, well-structured prompt that reads out like a character backstory and hand it over to your evaluator.
The key thing is to be highly specific about every single facet involved in the process — you don’t want your evaluator being forced to guess when faced with the unknown.
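To make “highly specific” concrete, here’s an illustrative evaluator system prompt for the blog scenario above, written as a Python constant so it could slot straight into the earlier sketches. Every detail in it (the persona, the criteria, the output format) is made up for the example, not a recommended template.

```python
# Illustrative only -- the persona, criteria, and output format below are
# invented for this example, not a recommended template.
EVALUATOR_SYSTEM_PROMPT = """
You are the senior editor of a casual, slightly irreverent tech blog.

Your job: review a draft article and decide whether it should be published.

Evaluate the draft against ALL of the following, in order:
1. Correctness: every technical claim must be accurate; flag anything dubious.
2. Reasoning: conclusions must actually follow from the arguments given.
3. Tone: conversational, first-person plural, no corporate phrasing.
4. Humour: at least two light jokes or asides, none at a reader's expense.
5. Length: between 800 and 1,200 words.

Output format (exactly):
VERDICT: PASS or FAIL
ISSUES: a numbered list of every criterion violated, or "none"
FEEDBACK: concrete, actionable fixes for the writer
"""
```

Notice there’s nothing left to guess at: the evaluator knows who it is, what it’s judging, what counts as a violation, and exactly how to report back.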
There’s more to this system prompt than purpose, though; we’ll also need to talk about the evaluation criteria that get outlined in it.
Interested in reading the next 5 sections? Please visit the beehiiv version.