The Truth About LLMs and Numeric Accuracy
Stefan Månsby
Senior Management Consultant - Digital Transformation | AI evangelist | I use Arch btw
In my recent deep-dive into using publicly available Large Language Models (LLMs) such as OpenAI's GPT models, Google's Gemini, and Meta's Llama for financial applications, one theme keeps popping up: none of these models are purpose-built for the nuances of financial data and complex calculations. While they’re incredibly powerful at generating insights and summarizing concepts, they can struggle with tasks requiring exact numeric accuracy or specialized regulatory knowledge.
Here’s what I’ve found, including two real-world examples of how things can go wrong—and how we can make the most of LLMs without sacrificing precision.
Two Times LLMs/GPTs Went Off the Rails
Case A: The Overly Confident ROI Calculation
A user asked an LLM to calculate a three-year project's Return on Investment (ROI) with multiple cash inflows and outflows. The model produced a detailed explanation of the steps (e.g., discount factors, net present value) that looked correct. However, upon closer inspection, the LLM incorrectly applied the discount rate to the second year’s cash flow twice, leading to an ROI calculation that was off by almost 10%. Because the explanation seemed plausible, the user didn’t immediately notice the error—only a manual recheck with a financial calculator revealed the mistake.
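The double-discounting mistake in Case A is easy to reproduce in code. The sketch below uses invented cash flows and an assumed 8% discount rate (none of these figures come from the original case) to show how one stray discount factor on year 2 quietly shifts the result:

```python
# Hypothetical figures illustrating the Case A error.
# The rate, investment, and inflows are assumptions for illustration only.
rate = 0.08
investment = 100_000
inflows = [40_000, 50_000, 60_000]  # cash inflows at the end of years 1..3

# Correct: discount each year's inflow exactly once
npv_correct = sum(cf / (1 + rate) ** t for t, cf in enumerate(inflows, start=1)) - investment

# Buggy: year 2's inflow gets an extra discount factor (the LLM's mistake)
npv_buggy = (
    inflows[0] / (1 + rate)
    + inflows[1] / (1 + rate) ** 2 / (1 + rate)  # discounted twice
    + inflows[2] / (1 + rate) ** 3
    - investment
)

roi_correct = npv_correct / investment
roi_buggy = npv_buggy / investment
print(f"correct ROI: {roi_correct:.2%}, buggy ROI: {roi_buggy:.2%}")
```

Run side by side, the two versions disagree by several percentage points of ROI, yet both produce a perfectly plausible-looking number, which is exactly why the error survived until a manual recheck.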
Case B: The Confused Q4 Earnings Forecast
In another scenario, an analyst asked an LLM to generate a Q4 earnings forecast based on historical quarterly results and external market indicators. The model correctly interpreted overall trends but “hallucinated” a few data points, pulling in figures from a different company with a similar name. The resulting report mixed factual and fabricated numbers, causing a skewed forecast that, if used in decision-making, would have left the company unprepared for the year-end results.
So, how do you get the best out of publicly available LLMs if you work in finance? I've composed a list of five things to consider when venturing into the realm of using AI to support your finance processes and decision-making.
1 - Use Purpose-Built Tools for Calculations
LLMs excel at explanation, not arithmetic. When you need to crunch numbers (compounding interest, comparing corporate cash flows, or running detailed ROI calculations), rely on specialized software such as Excel, Python libraries, or financial calculators. Let the LLM clarify the formulas and logic, but keep the actual math in tools designed to handle it.
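As a minimal example of "let the tool do the math": a compound-interest calculation takes a few lines of plain Python, and the result is deterministic rather than a token-by-token guess. The principal, rate, and horizon below are illustrative assumptions:

```python
# Compound interest computed in code, not asked of an LLM.
# A = P * (1 + r/n) ** (n * t) -- all inputs are illustrative assumptions.
principal = 10_000.0
annual_rate = 0.05
years = 10
compounds_per_year = 12

amount = principal * (1 + annual_rate / compounds_per_year) ** (compounds_per_year * years)
print(f"Future value: {amount:,.2f}")
```

The LLM is still useful here: ask it to explain why monthly compounding beats annual compounding, then run the actual formula yourself.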
2 - Break Down Complex Tasks
One of the biggest pitfalls is asking an LLM to do a multi-step calculation in a single pass. It may return an answer that seems plausible but is actually off in one critical step. Instead, break these tasks into smaller, manageable chunks, verifying each step with a dedicated calculator or financial tool before moving on.
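The same chunking discipline works in code: compute each intermediate result separately and sanity-check it before the next step consumes it. The cash flows and rate below are made up for illustration:

```python
# Breaking a multi-step NPV calculation into verifiable chunks.
# Cash flows and rate are illustrative assumptions.
rate = 0.10
cash_flows = [-1000.0, 400.0, 400.0, 400.0]  # year 0 outflow, then inflows

# Step 1: discount factors, one per year -- inspect these before going on
factors = [(1 + rate) ** -t for t in range(len(cash_flows))]
assert abs(factors[1] - 0.9091) < 0.0001  # sanity-check against a known value

# Step 2: discounted cash flows
discounted = [cf * f for cf, f in zip(cash_flows, factors)]

# Step 3: sum into NPV only once the pieces check out
npv = sum(discounted)
print(f"NPV: {npv:.2f}")
```

If an LLM had botched one discount factor, the step-1 check would catch it before the error propagated into the headline figure, which is exactly what Case A above was missing.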
3 - Cross-Reference Multiple Sources
When accuracy matters, don’t stop at a single answer. LLMs can hallucinate or misinterpret data, leading to results that look authoritative but contain errors. Always compare the LLM’s output with a second method (a known formula, a trusted financial model, or even a different AI system) to validate final figures. And yes, you can technically prompt most LLMs not to hallucinate or guess, but that leads to a catch-22 scenario that deserves an article of its own.
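Cross-referencing doesn't have to mean a second vendor; two mathematically independent computations of the same quantity already catch most slips. A sketch, with assumed cash flows, that validates an NPV two ways before trusting it:

```python
# Cross-checking one figure with a second, independent method.
# Cash flows and rate are illustrative assumptions.
rate = 0.08
cash_flows = [-500.0, 200.0, 250.0, 300.0]  # year 0 first

# Method 1: direct discounting of each year's cash flow
npv_direct = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Method 2: Horner's scheme, folding from the final year backwards
npv_horner = 0.0
for cf in reversed(cash_flows):
    npv_horner = cf + npv_horner / (1 + rate)

if abs(npv_direct - npv_horner) > 1e-9:
    raise ValueError("methods disagree -- do not trust either figure yet")
print(f"cross-checked NPV: {npv_direct:.2f}")
```

Only when both routes agree does the number leave the script; a disagreement is treated as a hard stop, not a rounding footnote.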
4 - Provide Clear Context and Instructions
Ambiguity is the enemy of accuracy. The more precise you are with your prompts (providing formulas, definitions, and any pertinent context), the less guesswork the LLM will have to do. Clear instructions reduce the risk of “best guess” answers, ensuring the model stays aligned with your goals.
5 - Limit Reliance on “Pure” LLM/GPT Output for Critical Decisions
Financial decisions often have real-world consequences, be it compliance, investments, or strategic planning. Human reviews and specialized software checks are vital when the stakes are high. Let the LLM handle the initial data exploration or summarization, but always use rigorous validation before acting on its findings.
To sum things up
LLMs are potent allies for high-level financial analysis, summarization, and ideation. However, their strength lies in understanding language patterns rather than flawlessly crunching numbers. If you pair LLMs with dedicated financial tools and meticulous review processes, you’ll unlock the best of both worlds: the agility and flexibility of AI-driven insights plus the reliability and precision your financial tasks demand.
And a personal rule of thumb: I use generative AI tools to save time, not to solve tasks I can’t understand myself.
Feel free to check out my open-source project, fcopa, over at GitHub, where I’m experimenting with LLMs for financial reasoning, and share your own insights or challenges.
Let’s keep pushing the boundaries of AI in finance, responsibly!
/Stefan Månsby