Curb your LLMs: 'LLM-zoning' to overcome hallucination in quantitative analysis
I am not sure if it was Prem Kumar Aparanji or Gary Marcus who came up with the fitting analogy of LLMs as overkill: why use 'Swiss knives the size of Mt. Zermatt' to open a bottle when you can use a simple bottle opener? Which one 'opens' better, cheaper, and is easier to manage?
If you ask ChatGPT what's 1+1, how does it do it?
It undergoes a surprisingly complex process. It doesn't 'compute' the answer in the traditional sense; rather, it makes a probabilistic guess. It generates a response based on patterns learned from a vast dataset, through a series of steps including tokenization, context analysis, and probability estimation within its neural network architecture.
Conversely, if you tap 1+1 into your calculator, how does it do it?
In a fundamentally different manner. It relies on binary arithmetic and logic gates to perform calculations. A calculator's processor chip converts input numbers into binary, processes them through these gates in circuits called adders, and then translates the binary result back to a decimal output. This process is not only more direct but also significantly more energy-efficient.
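To make the contrast concrete, here is a minimal sketch of that gate-level addition, written in Python purely for illustration. Real calculators do this in silicon rather than software, and the bit width and function names here are my own choices, but the logic of chaining half and full adders is the same.

```python
# Binary addition built from logic gates (Python stand-ins for hardware gates).

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Add two bits: the sum bit is XOR, the carry bit is AND."""
    return a ^ b, a & b

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Add two bits plus a carry-in, producing a sum bit and a carry-out."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_carry_add(x: int, y: int, bits: int = 8) -> int:
    """Chain full adders to add two unsigned integers, bit by bit."""
    result, carry = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result

print(ripple_carry_add(1, 1))  # -> 2, deterministically, every single time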
These two modes of processing exhibit very different 'emergent properties' when applied to quantitative analysis. But how do the results differ when compounded over millions or billions of arithmetic operations, as is typical in many industries, including the one where Batonics is most active: institutional asset and investment management?
Two parameters stand out as critical:
1- Computational expenditure/Energy
2- Error probabilities
...when quantitative analysis is conducted by a stochastic approach (LLMs) versus deterministic logic (a calculator).
When evaluating the computational expense and error of quantitative tools, particularly LLMs, two key aspects emerge: energy consumption and error probability. Compounding these effects over, say, a million (10^6) calculations reveals the inherent risk. For LLMs, an assumed energy expenditure two orders of magnitude greater than a calculator's scales linearly with the number of operations and quickly leads to prohibitive costs, potentially rendering these tools impractical for certain quantitative analyses.
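A back-of-the-envelope illustration of that scaling follows. The per-operation energy figures below are assumptions chosen only to reflect the 'two orders of magnitude' gap, not measurements of any particular model or device.

```python
# Illustrative only: per-operation energy figures are assumptions, not measurements.
CALCULATOR_JOULES_PER_OP = 1e-6   # assumed cost of one calculator-style operation
LLM_JOULES_PER_OP = 1e-4          # assumed ~100x higher cost for one LLM-mediated operation

N_OPS = 10**6                     # a million arithmetic operations

print(f"Calculator total: {CALCULATOR_JOULES_PER_OP * N_OPS:.2f} J")
print(f"LLM total:        {LLM_JOULES_PER_OP * N_OPS:.2f} J "
      f"({LLM_JOULES_PER_OP / CALCULATOR_JOULES_PER_OP:.0f}x)")
```

The gap per operation never shrinks; it simply multiplies with volume.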
Then come the error probabilities. LLMs, with their learning-based approach, inherently carry a probabilistic error margin, unlike traditional calculators (technically, a calculator will never err unless placed under very peculiar physical constraints). In the complex, layered calculations typical of institutional investment analysis, even a minute error in LLM output can escalate dramatically. For instance, with a 10^-3 error probability per operation, the chance that at least one intermediate result is wrong already exceeds 50% after roughly 700 operations, producing plausible yet incorrect quantitative results.
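The compounding is easy to verify, assuming (for simplicity) that each operation errs independently with the same probability:

```python
# How a small per-operation error probability compounds, assuming independent errors.
p_error = 1e-3   # assumed probability that any single LLM "calculation" is wrong

def prob_at_least_one_error(p: float, n_ops: int) -> float:
    """Probability that at least one of n independent operations is wrong."""
    return 1 - (1 - p) ** n_ops

for n in (100, 700, 10_000, 1_000_000):
    print(f"{n:>9,} ops -> P(at least one error) = {prob_at_least_one_error(p_error, n):.4f}")

# At ~700 operations the chance of a wrong intermediate result already passes 50%;
# at a million operations an error is effectively certain.
```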
Calculators maintain a significantly lower (for all practical purposes, negligible) error probability. This precision ensures consistency and reliability in outputs, even across numerous iterations, making them far better suited to high-precision tasks.
Harmonizing LLMs and calculators is not novel. It has been researched and tried, and some flaws are already documented. (See this article by Gary Marcus: https://garymarcus.substack.com/p/getting-gpt-to-work-with-external )
At Batonics, we recognize these limitations and adopt a strategy of 'LLM zoning'. We allocate specific roles to LLMs, such as intent mapping and code generation, which suit their generalist, language-based capabilities. For arithmetic and algorithmic processes, we deliberately 'zone out' LLMs. This approach keeps complex calculations within the realm of traditional, deterministic computing, providing accuracy and minimizing the risk of compounded errors. A minimal sketch of the pattern follows.
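The sketch below is a hypothetical illustration of the zoning idea, not Batonics' actual implementation: the LLM's only job is to map a natural-language request onto a named deterministic routine, and it never produces a number itself. All function names, the placeholder intent mapper, and the toy engines are assumptions made for the example.

```python
# Hypothetical 'LLM zoning' pattern: the LLM chooses WHICH calculation to run;
# an LLM-free engine does the arithmetic.

from statistics import mean, pstdev

# --- LLM-free zone: deterministic engines ---------------------------------
ENGINES = {
    # Simplified toy formulas over monthly returns, for illustration only.
    "annualized_return": lambda returns: (1 + mean(returns)) ** 12 - 1,
    "volatility": lambda returns: pstdev(returns) * 12 ** 0.5,
}

# --- LLM zone: intent mapping only -----------------------------------------
def map_intent(user_request: str) -> str:
    """Placeholder for an LLM call that returns an engine name, never a number."""
    # e.g. response = llm.classify(user_request, choices=list(ENGINES))
    return "volatility" if "risk" in user_request.lower() else "annualized_return"

def answer(user_request: str, monthly_returns: list[float]) -> float:
    engine = ENGINES[map_intent(user_request)]   # LLM decides which engine applies
    return engine(monthly_returns)               # deterministic code computes the result

print(answer("How risky was this fund?", [0.01, -0.02, 0.015, 0.007]))
```

The design choice is the boundary itself: the stochastic component is confined to classification, where an occasional mistake is recoverable, while every number that reaches the user is produced by auditable, deterministic code.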
The resulting LLM-free zones produce LLM-free results that can be trusted more readily, while LLMs are still employed for the upstream, high-level operations. These LLM-free zones and the operations within them (the engines) are explainable: not in natural language, as some versions of GPT are designed to be, but through stepped logic, pure mathematical rigor that is not open to interpretation. That, we believe, is the only epistemic tool fit for a system meant to become ubiquitous in institutional capital allocation and asset management.
Comment from Prem Kumar Aparanji: "Thank you for remembering me & tagging me. Yes, the Mt Zermatt reference was mine indeed." https://www.dhirubhai.net/posts/premka_how-microsoft-is-trying-to-lessen-its-addiction-activity-7112683841940545537-xLic