When performing reasoning or generating code, do #LLMs really understand what they’re doing, or do they just memorize? Several new results seem to have painted a not-so-rosy picture. (On #Mastodon: https://lnkd.in/gfmpEUfD)

References

[1] Xiaojuan Tang, Zilong Zheng, Jiaqi Li, Fanxu Meng, Song-Chun Zhu, Yitao Liang, and Muhan Zhang. 2023. Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners. https://lnkd.in/gE4jwSgS
[2] Antonio Valerio Miceli Barone, Fazl Barez, Ioannis Konstas, and Shay Cohen. 2023. The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python. https://lnkd.in/gpS6hiN5
[3] Emre Kiciman, Robert Osazuwa Ness, Amit Sharma, and Chenhao Tan. 2023. Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. https://lnkd.in/g_FDfdrG
[4] https://lnkd.in/gCynycG6
[5] Zhijing Jin, Jiarui Liu, Zhiheng Lyu, Spencer Poff, Mrinmaya Sachan, Rada Mihalcea, Mona Diab, and Bernhard Schölkopf. 2023. Can Large Language Models Infer Causation from Correlation? https://lnkd.in/gHdAZf2M

#Paper #NLP #NLProc #CodeGeneration #Causation #CausalReasoning #reasoning #research
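To give a feel for the identifier-swap probe in [2], here is a toy sketch of my own (not the paper's exact prompts): two builtin names are swapped up front, and completing the code correctly requires tracking the new semantics rather than the familiar surface pattern.

```python
# Toy illustration in the spirit of [2] (not the paper's actual prompts):
# swap two builtin names, then write code that is only correct if you
# track the swapped semantics instead of the memorized usage pattern.
len, print = print, len  # from here on, `len` prints and `print` measures length

def count_words(sentence: str) -> int:
    words = sentence.split()
    return print(words)   # correct under the swap: `print` now returns a length

n = count_words("large language models memorize surface statistics")
len(n)                    # correct under the swap: `len` now prints 6
```

The paper's finding, roughly, is that models tend to complete such code as if the swap had never happened, following the statistically familiar usage of the names.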
Do we really need them to understand the way human beings understand, if human beings "really" understand at all? How does it matter in a pragmatic sense?
This is an excellent list of papers debunking many false claims about LLMs. You can add to it this paper, which debunks the so-called planning capability of LLMs: https://www.dhirubhai.net/posts/pkghosh_on-the-planning-abilities-of-large-language-activity-7083501621594247168-Uk49?utm_source=share&utm_medium=member_ios
There's a lot of confusion. It is exceptionally hard to design experiments that answer questions about what a complex 'intelligent' system is capable of. It doesn't matter whether the system at issue is a human, a machine, a great ape, a pigeon, or an octopus: we always run the risk of fooling ourselves. At a minimum, it is always worth spending a while trying to think of ways that something could achieve the performance without being especially smart.

Some work of this type is done by groups that include deep expertise in computational cognitive science. Papers that have Tenenbaum or Fedorenko or Levy from MIT (to name only three) are pretty certain to be in that category. They have put in the hard yards designing experiments to probe human abilities, so they are likely to design good ones for machines. Two of the above papers have researchers who I know to have strong long-term records in computer science and AI. Either of those features decreases the chance that they are missing something, but it still can happen, so we have to stay critical.

Personally, I always used to start from "useful, but limited" as my best guess about what an automated system can do. Recently it has changed to "very useful, still limited".
Actual AI experts have understood all along that LLMs have neither reasoning nor understanding in any meaningful sense. The number of vocal people without a shred of understanding has mostly just exploded over the past year, skewing the availability heuristic. A fundamental understanding of what any technology is and is not capable of is something no one talking about that technology can afford to lose sight of. A few, like the researchers at Stanford, have made a good showing this year by debunking many of the fraudulent claims, such as putting a stake through "emergent abilities": https://arxiv.org/abs/2304.15004
Thank you, Benjamin, for sharing this insightful review. I really needed to read some scientific work studying symbolic reasoning and LLMs. Generally, when I hear someone talk about magical causal inference or reasoning in LLMs, my first question is: do they have any structure that resembles conceptual graphs? If they don't, I stereotypically label them as generalized statistical models. I'm still hoping to see a new effort to build a large ConceptualGraph-based model, like a modern self-supervised Freebase! If I had the time and resources, I would have worked on a large Petri-net-like model to learn symbolic concepts separately from language and then map those to an LLM, and boom, the end of the world!
A problematic thing here, though, is that humans have the capability to perform well symbolically but tend to fail a lot of the time in real-world situations, especially as more people are involved. The point being that the results can't necessarily imply a lack of capability.
"Understanding" is a function of sentience, which only humans have. "Pattern identification" however, is how machines draw inferences between apparent cause and effect - right or wrong, which is in turn riddled by pre-programmed human biases.
Benjamin Han I don't see a definition of "understanding" used as grounds for the argument. Also, this looks like a false dichotomy, since LLMs are no doubt memorizing but may also understand in a primitive fashion (which demands an eventual definition). It's interesting, though. Personally, I think it is a mistake to regard understanding as any kind of absolute -- it's more like a continuum (IMO!).
Thank you Benjamin Han for the references - good and timely work.
I can't help but wonder whether LLMs struggling with symbolic language is similar to the case of historians/linguists first finding it difficult to decipher ancient languages (with unfamiliar symbols). I found the experiment on switching the function names interesting. Hence, I believe the results are in line with the fact that LLMs are, in the end, probabilistic models: they predict the next probable word given a sentence of words whose meanings they understand, based on distributions learned from the training data. (I assume that's how humans would start deciphering ancient languages as well, based on statistics.)
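To make that "next probable word" framing concrete, here is a minimal sketch of how one could inspect an LLM's next-token preferences on a code prefix. It assumes the Hugging Face `transformers` library and the small `gpt2` checkpoint purely for illustration; the prefix and candidate tokens are my own toy example, not the setup used in [2].

```python
# Minimal sketch (assumptions: Hugging Face transformers + the gpt2 checkpoint):
# an LLM assigns a probability to each possible next token given the prefix,
# so we can directly compare how much it "prefers" one continuation over another.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prefix = "def count_words(sentence):\n    words = sentence.split()\n    return "
inputs = tokenizer(prefix, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]   # scores for the very next token
probs = torch.softmax(logits, dim=-1)

# Compare the model's preference for two candidate continuations.
for candidate in ["len", "print"]:
    token_id = tokenizer(candidate, add_special_tokens=False)["input_ids"][0]
    print(f"P(next token starts with {candidate!r}) = {probs[token_id].item():.4f}")
```

Under an identifier swap like the one in [2], a purely pattern-driven model would presumably keep favoring the continuation that is statistically common in its training data, regardless of what the swapped names now mean.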