A.I. nears human-level forecasting quality
Probabilistic forecasts are integral to decision making, whether they are produced by statistical tools or by human judgment. Statistical forecasts rely on historical data under the assumption that the past is representative of the future; judgmental forecasts draw on human expertise, domain knowledge, and intuition. This Analytica blog post explores judgmental forecasting through a challenging real-world example.
Several websites, including Metaculus, Good Judgment Inc, Infer, Polymarket, and Manifold, invite individuals to submit probability estimates for geopolitical and technological events. Because these events eventually resolve (they either occur or do not by a specified date), the quality of the earlier forecasts can be measured. These sites and related academic experiments have found that a small percentage of human forecasters consistently outperform everyone else by a substantial margin, which has given rise to the title of "superforecaster". In addition, as one would expect, aggregations of all submitted forecasts consistently perform better than individual forecasters.
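To make "forecast quality" concrete: resolved binary questions are commonly scored with the Brier score, the squared error between the stated probability and the 0/1 outcome, and a simple aggregation baseline is the mean of all submitted probabilities. A minimal sketch (the numbers are invented for illustration, not from any of the sites above):

```python
from statistics import mean

def brier_score(forecast: float, outcome: int) -> float:
    """Squared error between a probability in [0, 1] and a 0/1 outcome.
    0.0 is a perfect forecast; a constant 0.5 guess always earns 0.25."""
    return (forecast - outcome) ** 2

# Three forecasters on one question that resolved "yes" (outcome = 1).
forecasts = [0.9, 0.6, 0.4]
outcome = 1

avg_individual = mean(brier_score(p, outcome) for p in forecasts)  # ~0.177

# Crowd aggregate: average the submitted probabilities, then score once.
crowd = mean(forecasts)                  # ~0.633
crowd_score = brier_score(crowd, outcome)  # ~0.134, beats the average individual
```

The aggregate beating the average individual here is not a coincidence: for a convex loss like the Brier score, the score of the mean forecast is never worse than the mean of the individual scores.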
LLMs are able to produce probabilistic estimates of future events when prompted appropriately. Smaller LLMs like GPT-3.5, Gemini Pro, Claude 2.1, Mixtral-8x7B, and Llama-2-13B perform poorly on these assessment tasks. However, the largest LLMs like GPT-4-Turbo demonstrate an impressive ability to make such assessments, yet still fall far short of human-level quality.
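For illustration, eliciting such an estimate can be as simple as asking the model to end its answer with a probability on a fixed format, then parsing it out. A minimal sketch using the OpenAI Python client; the model name, prompt wording, and parsing are our assumptions, not the setup used in any of the papers discussed here:

```python
import re
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "Will <some geopolitical event> occur before 2025-12-31?"
prompt = (
    f"Question: {question}\n"
    "Reason step by step about the evidence for and against, then give your "
    "final answer on the last line in the form 'Probability: 0.NN'."
)

response = client.chat.completions.create(
    model="gpt-4-turbo",  # assumption: any capable chat model works here
    messages=[{"role": "user", "content": prompt}],
)
text = response.choices[0].message.content

# Pull the trailing probability out of the free-form answer.
match = re.search(r"Probability:\s*([01](?:\.\d+)?)", text)
probability = float(match.group(1)) if match else None
print(probability)
```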
Researchers from UC Berkeley created a system that makes subjective probabilistic assessments of binary-valued geopolitical questions, described in their recent paper "Approaching human-level forecasting with language models". The system processes an assessment question in multiple stages, using LLMs heavily as subroutines in each stage. First, it gathers relevant articles from news feeds, then reasons about and weighs them over several passes of LLM prompting, and finally aggregates all the information into a single probability estimate. Across all test questions, their system performs close to, but slightly worse than, crowd-aggregated human scores. On questions where the crowd estimate is highly uncertain (between 0.3 and 0.7, which accounts for more than 50% of all questions), it beats the crowd, but it performs worse when crowd estimates are very confident. Despite this form of underconfidence, its estimates are very well calibrated.
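At a high level, that pipeline is retrieval, then per-article LLM passes, then aggregation of several reasoning passes. A schematic sketch of the flow under those assumptions; all helper names are hypothetical stand-ins, not the authors' code, and the two stubs would need real news-API and LLM calls:

```python
from statistics import median

def fetch_news(question: str, max_articles: int = 10) -> list[str]:
    """Hypothetical retrieval stage: query news feeds for articles
    relevant to the question and return their text."""
    raise NotImplementedError  # stand-in for a real news-API call

def summarize_and_rate(article: str, question: str) -> str:
    """Hypothetical per-article LLM pass: summarize the article and
    judge its relevance to the question (prompting details omitted)."""
    raise NotImplementedError  # stand-in for an LLM call

def estimate_probability(question: str, evidence: list[str]) -> float:
    """Hypothetical reasoning pass: have an LLM weigh the evidence and
    return a probability in [0, 1] parsed from its final line."""
    raise NotImplementedError  # stand-in for an LLM call

def forecast(question: str, n_passes: int = 5) -> float:
    """Retrieve evidence once, then combine several independent
    LLM reasoning passes into a single probability estimate."""
    evidence = [summarize_and_rate(a, question) for a in fetch_news(question)]
    passes = [estimate_probability(question, evidence) for _ in range(n_passes)]
    return median(passes)  # median damps the occasional outlier pass
```

Aggregating repeated reasoning passes, rather than trusting a single one, mirrors the crowd-aggregation effect described above, but within one model.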
This work provides yet another hint at how we can expect A.I. advancements to transform model-based decision making and the field of decision analysis.