The Truth About LLMs and Numeric Accuracy
Stefan Månsby
Senior Management Consultant - Digital Transformation | AI evangelist | I use Arch btw
In my recent deep-dive into using publicly available Large Language Models (LLMs) such as OpenAI's GPT models, Google's Gemini, and Meta's Llama for financial applications, one theme keeps popping up: none of these models are purpose-built for the nuances of financial data and complex calculations. While they’re incredibly powerful at generating insights and summarizing concepts, they can struggle with tasks requiring exact numeric accuracy or specialized regulatory knowledge.
Here’s what I’ve found, including two real-world examples of how things can go wrong—and how we can make the most of LLMs without sacrificing precision.
Two Times LLMs/GPTs Went Off the Rails
Case A: The Overly Confident ROI Calculation
A user asked an LLM to calculate a three-year project's Return on Investment (ROI) with multiple cash inflows and outflows. The model produced a detailed explanation of the steps (e.g., discount factors, net present value) that looked correct. However, upon closer inspection, the LLM incorrectly applied the discount rate to the second year’s cash flow twice, leading to an ROI calculation that was off by almost 10%. Because the explanation seemed plausible, the user didn’t immediately notice the error—only a manual recheck with a financial calculator revealed the mistake.
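The double-discounting mistake in Case A is easy to reproduce in code. The sketch below uses invented cash flows and an assumed 8% discount rate (none of these figures come from the original case) to show how one stray discount factor on year 2 quietly shifts the result:

```python
# Hypothetical figures illustrating the Case A error.
# The rate, investment, and inflows are assumptions for illustration only.
rate = 0.08
investment = 100_000
inflows = [40_000, 50_000, 60_000]  # cash inflows at the end of years 1..3

# Correct: discount each year's inflow exactly once
npv_correct = sum(cf / (1 + rate) ** t for t, cf in enumerate(inflows, start=1)) - investment

# Buggy: year 2's inflow gets an extra discount factor (the LLM's mistake)
npv_buggy = (
    inflows[0] / (1 + rate)
    + inflows[1] / (1 + rate) ** 2 / (1 + rate)  # discounted twice
    + inflows[2] / (1 + rate) ** 3
    - investment
)

roi_correct = npv_correct / investment
roi_buggy = npv_buggy / investment
print(f"correct ROI: {roi_correct:.2%}, buggy ROI: {roi_buggy:.2%}")
```

Run side by side, the two versions disagree by several percentage points of ROI, yet both produce a perfectly plausible-looking number, which is exactly why the error survived until a manual recheck.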
Case B: The Confused Q4 Earnings Forecast
In another scenario, an analyst asked an LLM to generate a Q4 earnings forecast based on historical quarterly results and external market indicators. The model correctly interpreted overall trends but “hallucinated” a few data points, pulling in figures from a different company with a similar name. The resulting report mixed factual and fabricated numbers, causing a skewed forecast that, if used in decision-making, would have left the company unprepared for the year-end results.
So, how do you get the best out of publicly available LLMs if you work in finance? I've composed a list of five things to consider when venturing into the realm of using AI to support your finance processes and decision-making.
1 - Use Purpose-Built Tools for Calculations
LLMs excel at explanation, not arithmetic. When you need to crunch numbers (compounding interest, comparing corporate cash flows, or running detailed ROI calculations), rely on specialized software such as Excel, Python libraries, or financial calculators. Let the LLM clarify the formulas and logic, but keep the actual math in tools designed to handle it.
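As a minimal example of "let the tool do the math": a compound-interest calculation takes a few lines of plain Python, and the result is deterministic rather than a token-by-token guess. The principal, rate, and horizon below are illustrative assumptions:

```python
# Compound interest computed in code, not asked of an LLM.
# A = P * (1 + r/n) ** (n * t) -- all inputs are illustrative assumptions.
principal = 10_000.0
annual_rate = 0.05
years = 10
compounds_per_year = 12

amount = principal * (1 + annual_rate / compounds_per_year) ** (compounds_per_year * years)
print(f"Future value: {amount:,.2f}")
```

The LLM is still useful here: ask it to explain why monthly compounding beats annual compounding, then run the actual formula yourself.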
2 - Break Down Complex Tasks
One of the biggest pitfalls is asking an LLM to do a multi-step calculation in a single pass. It may return an answer that seems plausible but is actually off in one critical step. Instead, break these tasks into smaller, manageable chunks, verifying each step with a dedicated calculator or financial tool before moving on.
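The same chunking discipline works in code: compute each intermediate result separately and sanity-check it before the next step consumes it. The cash flows and rate below are made up for illustration:

```python
# Breaking a multi-step NPV calculation into verifiable chunks.
# Cash flows and rate are illustrative assumptions.
rate = 0.10
cash_flows = [-1000.0, 400.0, 400.0, 400.0]  # year 0 outflow, then inflows

# Step 1: discount factors, one per year -- inspect these before going on
factors = [(1 + rate) ** -t for t in range(len(cash_flows))]
assert abs(factors[1] - 0.9091) < 0.0001  # sanity-check against a known value

# Step 2: discounted cash flows
discounted = [cf * f for cf, f in zip(cash_flows, factors)]

# Step 3: sum into NPV only once the pieces check out
npv = sum(discounted)
print(f"NPV: {npv:.2f}")
```

If an LLM had botched one discount factor, the step-1 check would catch it before the error propagated into the headline figure, which is exactly what Case A above was missing.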
3 - Cross-Reference Multiple Sources
When accuracy matters, don’t stop at a single answer. LLMs can hallucinate or misinterpret data, leading to results that look authoritative but contain errors. Always compare the LLM’s output with a second method (a known formula, a trusted financial model, or even a different AI system) to validate final figures. And yes, you can technically prompt most LLMs not to hallucinate or guess, but that leads to a catch-22 scenario that deserves an article of its own.
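Cross-referencing doesn't have to mean a second vendor; two mathematically independent computations of the same quantity already catch most slips. A sketch, with assumed cash flows, that validates an NPV two ways before trusting it:

```python
# Cross-checking one figure with a second, independent method.
# Cash flows and rate are illustrative assumptions.
rate = 0.08
cash_flows = [-500.0, 200.0, 250.0, 300.0]  # year 0 first

# Method 1: direct discounting of each year's cash flow
npv_direct = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Method 2: Horner's scheme, folding from the final year backwards
npv_horner = 0.0
for cf in reversed(cash_flows):
    npv_horner = cf + npv_horner / (1 + rate)

if abs(npv_direct - npv_horner) > 1e-9:
    raise ValueError("methods disagree -- do not trust either figure yet")
print(f"cross-checked NPV: {npv_direct:.2f}")
```

Only when both routes agree does the number leave the script; a disagreement is treated as a hard stop, not a rounding footnote.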
4 - Provide Clear Context and Instructions
Ambiguity is the enemy of accuracy. The more precise you are with your prompts (providing formulas, definitions, and any pertinent context), the less guesswork the LLM will have to do. Clear instructions reduce the risk of “best guess” answers, ensuring the model stays aligned with your goals.
5 - Limit Reliance on “Pure” LLM/GPT Output for Critical Decisions
Financial decisions often have real-world consequences, be it compliance, investments, or strategic planning. Human reviews and specialized software checks are vital when the stakes are high. Let the LLM handle the initial data exploration or summarization, but always use rigorous validation before acting on its findings.
To sum things up
LLMs are potent allies for high-level financial analysis, summarization, and ideation. However, their strength lies in understanding language patterns rather than flawlessly crunching numbers. If you pair LLMs with dedicated financial tools and meticulous review processes, you’ll unlock the best of both worlds: the agility and flexibility of AI-driven insights plus the reliability and precision your financial tasks demand.
And a personal rule of thumb: I use generative AI tools to save time, not to solve tasks I can’t understand myself.
Feel free to check out my open-source project, fcopa, over at GitHub, where I’m experimenting with LLMs for financial reasoning, and share your own insights or challenges.
Let’s keep pushing the boundaries of AI in finance, responsibly!
/Stefan Månsby