LLM Math?
At the beginning of the year, I predicted that reasoning would become a hot topic, but I didn’t expect it to blow up so quickly.
Just like my 2023 prediction about MoE, this one hit: by the end of 2024, reasoning is almost certainly the AI buzzword of the year. I might as well be Diavolo’s King Crimson.
But to be honest, there hasn’t been much worth writing about lately, and I’ve received some complaints from my friends.
So today, I decided to cover two papers at once.
Not just to fill up space, but because these two papers are actually related!
First Paper
What does this paper discuss?
I’ll start with the conclusion: LLMs don’t actually solve math problems; they rely on brute-force exposure to massive numbers of similar problems.
Let me explain how this paper validates that claim.
The paper perturbs the original problems, not randomly, but in a way that makes the new problems look almost identical while requiring completely different solution techniques.
These perturbations come in two categories: simple and hard.
You might think changing x+1 to x+2, or to just x, isn’t a big deal.
But it’s actually a fundamental shift.
Take the first problem as an example. Changing x+1 to x+2 still allows the problem to be solved by factoring.
But if you change it to x, can you still factor it?
No — you have to use the Cauchy-Schwarz Inequality instead.
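To make that concrete, here is a toy pair of my own (not the paper’s actual problems): two minimization problems that look almost identical on the surface but demand different tools.

```latex
% A toy pair of my own (NOT the paper's actual problems): the two look
% similar on the surface, but the required technique flips.

% (A) One variable: completing the square (the factoring-style route) works.
\[
\min_{x \in \mathbb{R}} \; x^2 + (x-1)^2
  = \min_{x} \; 2\left(x - \tfrac{1}{2}\right)^2 + \tfrac{1}{2}
  = \tfrac{1}{2}.
\]

% (B) A small surface change (two variables, one linear constraint), and
% completing the square stalls; Cauchy-Schwarz is now the natural tool:
\[
(x^2 + y^2)(3^2 + 4^2) \ge (3x + 4y)^2 = 100
  \quad\Longrightarrow\quad
  \min_{3x + 4y = 10} \; x^2 + y^2 = 4.
\]
```

A model that has only memorized route (A) will happily try to complete the square on (B) and stall; spotting that the perturbation changed the required tool is exactly what the hard perturbations test.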
So what does an LLM pretrained on massive datasets do here? Does it notice the shift and switch techniques?
Not at all! It stubbornly sticks to factoring and forces out a Chain-of-Thought (CoT) that mimics the original solution.
And of course, it gets the problem wrong.
Summary
Let’s look at another example:
[Figure: the original problem (left) vs. its hard perturbation from MATH-P-Hard (right)]
So, sometimes I disagree with the idea that “compression equals intelligence” — of course, I’m not pointing fingers at anyone here.
Conclusion: LLMs solve math problems using probabilities, not true understanding.
Second Paper
Now, someone might ask: “If LLMs rely on probability, why do models like OpenAI’s o-series and DeepSeek-R1 show such significant improvements in math reasoning?”
The answer: They’ve learned patterns.
Or rather, they’ve learned different CoT patterns.
With better training methods, models can acquire more diverse reasoning frameworks.
Why does this work?
Let’s look at the next paper.
What This Paper Does
The core idea is simple:
It clusters different CoT reasoning approaches into around 500 distinct “patterns”.
Then, these CoT patterns are explicitly trained into the model.
Now, the model knows which CoT pattern to apply for each type of problem.
That’s all there is to it.
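The paper’s exact clustering recipe isn’t reproduced here, but the core move is easy to sketch. A minimal version of my own, assuming you embed CoT traces with sentence-transformers and cluster with k-means (the model name is illustrative; 500 is the pattern count the paper lands on):

```python
# Hypothetical sketch of the "cluster CoTs into ~500 patterns" step.
# Assumes sentence-transformers and scikit-learn are installed, and that
# `cot_traces` is a list of chain-of-thought strings mined from solutions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

cot_traces = [
    "Factor the quadratic, then read off the roots...",
    "Apply Cauchy-Schwarz to bound the sum, then check the equality case...",
    # ... many more mined CoT traces
]

# 1. Embed each reasoning trace into a vector space.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(cot_traces)

# 2. Cluster the traces; each cluster is one reusable "pattern".
#    (500 in the paper's setting; capped here so the toy list still runs.)
kmeans = KMeans(n_clusters=min(500, len(cot_traces)), random_state=0)
kmeans.fit(embeddings)

# 3. Each training example now carries a discrete pattern id, which can be
#    injected into the training data (e.g., as a tag or template).
pattern_ids = kmeans.labels_
```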
But this is actually a genius idea.
What does this mean?
It turns a continuous action space (free-form reasoning text) into a discrete one (a choice among ~500 patterns).
Once you make reasoning a discrete, finite space, training becomes much easier.
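In other words, pattern selection becomes an ordinary classification problem. A toy PyTorch sketch of that idea (the head and its sizes are my own illustration, not the paper’s architecture):

```python
# Hypothetical illustration: reasoning as classification over a finite
# pattern vocabulary, instead of searching free-form text.
import torch
import torch.nn as nn

NUM_PATTERNS = 500  # size of the discrete "action space"
HIDDEN = 768        # problem-representation size (illustrative)

# A linear head over a pooled problem representation: picking a CoT pattern
# is now a 500-way choice, so standard supervised or RL training applies.
pattern_head = nn.Linear(HIDDEN, NUM_PATTERNS)

problem_repr = torch.randn(1, HIDDEN)   # stand-in for an encoder output
logits = pattern_head(problem_repr)
pattern_id = logits.argmax(dim=-1)      # discrete action: which pattern to use
print(f"selected pattern: {pattern_id.item()}")
```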
As for whether the paper uses MCTS, best-of-N (BoN), or its own training method: it doesn’t really matter.
Inference Process in the New Model
At inference time, the model first decides which pattern the problem calls for, then instantiates that pattern as its CoT. That’s it.
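As a rough sketch of what that two-step inference could look like (every function and template name here is a hypothetical placeholder, not the paper’s API):

```python
# Hypothetical two-step inference: pick a pattern, then generate a CoT
# conditioned on it.

PATTERN_TEMPLATES = {
    42: "First factor the expression, then analyze each factor...",
    137: "Set up Cauchy-Schwarz with suitable vectors, then check equality...",
    # ... one template per learned pattern, ~500 total
}

def select_pattern(problem: str) -> int:
    """Placeholder for the trained pattern classifier (see sketch above)."""
    return 42

def generate_with_template(prompt: str) -> str:
    """Placeholder for the underlying LLM generation call."""
    return f"<CoT following the injected pattern>\n{prompt}"

def solve(problem: str) -> str:
    # Step 1: discrete decision -- which of the ~500 patterns fits?
    pattern_id = select_pattern(problem)
    template = PATTERN_TEMPLATES[pattern_id]

    # Step 2: ordinary generation, conditioned on the chosen pattern.
    prompt = f"Problem: {problem}\nFollow this reasoning pattern: {template}"
    return generate_with_template(prompt)

print(solve("Minimize x^2 + (x-1)^2 over the reals."))
```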
Does This Work?
The paper includes ablation studies, showing improvement across all model sizes.
Although the evaluations only cover math (the authors didn’t train on other domains),
I believe that explicitly training in reasoning patterns like this should generalize to other scenarios as well.
Math problems already have a lot of structure, and 500 reasoning patterns is actually quite a lot.
Where Did These 500 CoT Patterns Come From?
It doesn’t matter.
What matters is that they exist and they work.
Final Thoughts
These are two of the most interesting papers I’ve read recently.
They provide strong insights into future training methodologies.
And personally, I only value papers that convince me through fundamental principles.
Another perspective?
These papers actually made me feel more optimistic.
I’ve always worried that LLMs might eventually replace humans.
But I never had solid evidence to confirm or disprove that fear.
These two papers gave me some confidence.
Can LLMs replace humans?
At least not in their current form.
Can 500 CoT pattern templates really solve all reasoning problems?
What a joke.
Only when LLMs can autonomously generate reasoning and planning pathways will they be ready to compete with humans.
Until then — I’ve got a few more years to play around.