LLMs and Theory of Mind
In March, when researchers at Stanford published the paper "Theory of Mind Might Have Spontaneously Emerged in Large Language Models" (https://arxiv.org/ftp/arxiv/papers/2302/2302.02083.pdf), it kicked off an intense debate about AGI and how close we are to it. Even as late as November, Shane Legg, co-founder of Google's DeepMind, predicted a 50% likelihood of reaching Artificial General Intelligence (AGI) within the next five years (https://www.marketsandmarkets.com/industry-news/Google-AI-Chief-Predicts-50-Percent-Chance-Of-Achieving-AGI-In-5-Years).
Now a recent paper seems to throw cold water on such claims. Researchers from AI2, CMU, and Seoul National University developed a new benchmark to stress-test the Theory of Mind capabilities of LLMs, and reported their findings in the paper "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" (https://arxiv.org/pdf/2310.15421.pdf).
A sample question is given in the figure below.
The results are not promising (see the results across all question types below).
While humans score 87.5, the best models reach only 12.3, so there is still a LONG way to go.