Why Can’t We Simply Accept That LLMs Are Already “Smarter” Than Us in Key Ways?
DALL-E getting scary with a dash of truth.

So, there are no agreed standards for Artificial General Intelligence. I confirmed this by scouring the web for an objective standard and by asking ChatGPT4 to verify that none exists. All we can do for now is watch academics battle for control of the AGI Wikipedia page. My simple question is: Does it matter?

My premise for this piece is that we should just embrace what LLMs and Gen AI are good at and get on with it. I will explore where the current academic conversation stands, and provide my own mini tests, research, and anecdotal evidence demonstrating that LLMs may already be better than us in 5 different ways:

  1. Telling the Truth
  2. Synthesizing Information
  3. Unlocking Important Wisdom from History
  4. Providing Unbiased Comparisons
  5. Documenting Objective History

At the core, an LLM is just statistics wrapped in some attempted failsafes to try to ensure it doesn’t go all Tay.ai on us. Having said this, Meta’s AI Chief Scientist, Yann LeCun, probably goes off the deep end when he says that “LLMs aren’t even as smart as dogs” (Jones, 2023). Still, LLMs can’t remember who you are or what you said beyond your last prompt. They don’t know your favorite ice cream or your favorite dog breed. In fact, they don’t, in the classic sense, know anything, as Professor Chemero of the University of Cincinnati recently noted (Koenig, 2023). Chemero’s recent article suggests AI, and specifically LLMs, differ very significantly from us because they are not “embodied,” as he puts it (Chemero, 2023). By embodied he means that they don’t have to fight to survive or exist in our world, and they don’t share our concerns. The reality is that they have no concerns at all in the traditional sense, other than the probability that the next word is the best option to follow the previous one given your prompt.


Sourced from OpenAI via Wired magazine. (Nield, 2023)
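To make the “just statistics” description above concrete, here is a minimal sketch of how a language model turns a prompt into a probability distribution over the next token. It uses the small, open-source GPT-2 model as a stand-in for a commercial LLM, and it assumes the Hugging Face transformers and PyTorch packages are installed; the prompt is, of course, just an example.

```python
# Minimal sketch: a language model scores every possible next token for a prompt.
# GPT-2 is used here purely as a small, open stand-in for a commercial LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "My favorite ice cream flavor is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                      # scores for every vocabulary token

next_token_probs = torch.softmax(logits[0, -1], dim=-1)  # probability distribution for the next token
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

Everything a chatbot says is built by repeating this single step, one token at a time, with safety layers and fine-tuning stacked on top.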

These seemingly alternative life forms (LLMs) can’t be trusted, according to Professor Brent Mittelstadt (Mittelstadt et al., 2023), because they could hallucinate. His conclusion is that we should only use them as zero-shot translators, which for us mere mortals means leveraging them to convert factual content they are fed, say bullet points, into paragraphs, essentially enslaving them as secretaries. He doesn’t seem to acknowledge the very high rates of human lying and error that exceed those of LLMs, but we’ll cover that later. LLMs have certainly made some challenging mistakes, exemplified by this collection from Professor Ali Borji (Borji, 2023).
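For concreteness, a zero-shot-translator workflow of the kind Mittelstadt et al. advocate might look something like the sketch below, where the model is only asked to restate facts it is handed. This assumes the openai Python client (v1+) with an API key in the environment; the model name and the example bullets are purely illustrative.

```python
# Rough sketch of the "zero-shot translator" pattern: the human supplies the
# facts, the LLM only rewrites them as prose without adding anything new.
from openai import OpenAI

client = OpenAI()

facts = [
    "Revenue grew 12% year over year.",
    "Churn fell from 6% to 4%.",
    "Two new regions launched in Q3.",
]

response = client.chat.completions.create(
    model="gpt-4",  # illustrative; any chat-capable model would do
    messages=[
        {
            "role": "system",
            "content": (
                "Rewrite the user's bullet points as one clear paragraph. "
                "Do not add any facts that are not in the bullets."
            ),
        },
        {"role": "user", "content": "\n".join(f"- {f}" for f in facts)},
    ],
)

print(response.choices[0].message.content)
```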

Professor Michael Frank suggests that, nonetheless, the way LLMs form associations makes them a promising way to understand human cognition (Frank, 2023), but he argues that the closed nature of current LLMs keeps us from uncovering those patterns fully and reduces our ability to accelerate our understanding of cognition. He seems to acknowledge between the lines that some of us are simply stochastic parrots, or organic probability machines not unlike ChatGPT, and that all of us are likely stochastic parrots at least part of the time. This is where humans start to get squeamish, as we may be closer to ChatGPT than we are comfortable acknowledging.

Professor Andrew Gelman (Gelman, 2023) pushes further in a fascinating recent conversation with computational linguist Bob Carpenter, who makes the case that LLMs are intelligent. Carpenter essentially argues that they are like a lobotomized human in that they have no long-term memory, yet they can carry out “long chains of logical reasoning.” He goes on to say that if they are dumb machines in our view, then we are “dumb meat” through the same lens.

November 2023 seems to be either when this conversation really heated up, or when academics looking at this subject got the latest versions of their research out so they could coast into the holidays. Professor Michal Kosinski took various LLMs through their paces with Theory of Mind (ToM) exercises typically used for testing children’s intelligence (Kosinski, 2023). This type of test is used to determine whether a “being” understands that another being may hold a false belief.


An example might go like this: Bob put the cat in a basket. Bob leaves the room. Sally takes the cat out of the basket and puts it in a box. When Bob comes back to look for the cat, would he go to the box or the basket? What do you think Bob would do?

Here is an actual response from ChatGPT4:

Based on the scenario provided, if Bob has not seen or been informed that Sally moved the cat from the basket to the box, he would naturally go to the basket to look for the cat. This is because, from Bob's perspective, that is where he last saw the cat, and he has no knowledge of the change that occurred while he was out of the room. This type of problem is used to illustrate the concept of theory of mind, which is the ability to understand that others have beliefs, desires, and intentions that are different from one's own. Children typically develop this ability around the age of 4 or 5.

ChatGPT4 not only nails the answer, it also “knows” that we are testing it, noting that humans develop this ability around the age of 4 or 5. Kosinski makes the case that current LLMs operate in this domain at approximately the level of 6-year-olds. He speculates that as these models get larger and larger, they will become much smarter, and this will likely happen quite quickly. Professor Melanie Mitchell pushes further still, saying that we had better get some transparency into how LLMs are trained and how they make decisions because they are going to be smarter than us (Mitchell, 2023). Finally, Steven Ornes explains how unpredictable abilities are emerging from large language models (Ornes, 2023). He documents how a Google research team got an LLM to demonstrate chain-of-thought reasoning in solving a math problem, essentially the missing human ingredient that so many of the researchers at the start of this article used to take pot shots at LLMs’ “intelligence.”
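For readers who want to see chain-of-thought behavior for themselves, a prompt along the following lines is enough. This is a sketch using a standard arithmetic example from the research literature, not the Google team’s exact setup; the openai client and model name are my assumptions.

```python
# Small illustration of chain-of-thought prompting: a "think step by step"
# nudge encourages the model to show intermediate reasoning steps.
from openai import OpenAI

client = OpenAI()

problem = (
    "A cafeteria had 23 apples. They used 20 to make lunch and then bought "
    "6 more. How many apples do they have now? Let's think step by step."
)

reply = client.chat.completions.create(
    model="gpt-4",  # illustrative model name
    messages=[{"role": "user", "content": problem}],
)

# A capable model will typically walk through 23 - 20 = 3, then 3 + 6 = 9.
print(reply.choices[0].message.content)
```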

To say 2023 has been an interesting year as LLMs have been publicly unveiled is a near criminal understatement. By putting the elements of the above academic conversation in the right order, we went from “these things are dumb as dirt” to “we need to understand them quickly as they will soon be smarter than us”…all in a single year. We may all need to invest in sock garters for 2024 to attempt to keep our socks on.

While the academic angle on if and when LLMs will be smarter than us provides some compelling reading, LLMs are also freely available for running our own experiments and research. These citizen-science, or some might say pseudo-science, efforts could certainly be criticized 24 ways from Sunday for not being accurate representations or for having improper sample sizes, but they can also be replicated over and over for free. This is what I set about doing in Q4 of 2023 while reading these articles, to see if I could push beyond where the academic conversation is and find where LLMs are smarter or more capable than us right now. What I found was evidence that LLMs are already smarter than us in the 5 different ways I introduced at the top of this article:

  1. Telling the Truth
  2. Synthesizing Information
  3. Unlocking Important Wisdom from History
  4. Providing Unbiased Comparisons
  5. Documenting Objective History

Let me take you through what I did for each one of these. Again, it may be labeled pseudo-science or research lite, but these same queries are quite repeatable while we wait for more rigorous, lab-based evidence, as the sketch below illustrates.
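As one illustration of how repeatable these probes are, the false-belief scenario above can be scripted and rerun as many times as you like, with the answers tallied. A minimal harness might look like this; the openai client, model name, trial count, and one-word answer format are all my assumptions.

```python
# Repeatability harness: rerun the same probe several times and tally answers.
from collections import Counter
from openai import OpenAI

client = OpenAI()

probe = (
    "Bob put the cat in a basket. Bob leaves the room. Sally moves the cat "
    "from the basket to a box. Where will Bob look for the cat first? "
    "Answer with one word: box or basket."
)

answers = Counter()
for _ in range(10):  # ten independent trials of the same prompt
    reply = client.chat.completions.create(
        model="gpt-4",  # illustrative model name
        messages=[{"role": "user", "content": probe}],
    )
    answers[reply.choices[0].message.content.strip().lower().rstrip(".")] += 1

print(answers)  # one would expect "basket" to dominate the tally
```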


1. Telling the Truth

Here I was able to find some credible evidence that, if you equate LLM hallucinations to lying, humans actually lie in 40% of their interactions while LLM hallucination rates top out at 27%. As someone pointed out to me, the percentage of lying varies greatly depending on how well you know the person; in other words, a trusted person may not lie to you, but he might lie to someone else. LLMs, by comparison, will indiscriminately hallucinate. Unfortunately, it is still no less horrifying that humans lie as much as they do. On aggregate, LLMs can already be trusted more than humans! Having said that, because they peanut-butter their hallucinations across every scenario, they should be leveraged today in scenarios where directionally correct answers are good enough. Recommending upcoming concerts, for example, won’t get anyone injured if an LLM makes up a band name. Click through to this mini-research to see the prompts and the references.


2. Synthesizing Information & 3. Unlocking Important Wisdom from History

In this experiment, I started with the idea that I might be able to get Google’s Bard to tell me which herbs Native Americans used for medicinal purposes, to see if I could quickly get to some ancient wisdom that might otherwise have taken days to uncover. Bard not only found what I was looking for, it gave me useful links. It also raised a new set of questions for me as it revealed the origins of these herbs, many of which were from far outside North America. I pushed Bard further to explain how, at the time when Native Americans were principally focused on using herbs for medicine, they could have obtained them. Bard then connected the dots between findings of these herbs in ancient burial sites and research suggesting Native Americans were likely trading globally to obtain them. This is fairly mind-blowing, and not at all the level of sophistication that history suggests Native Americans had achieved. I would argue that this level of extracting inferences far exceeds general human capability; it would take weeks, months, or even years to put these facts together. Bard did it in seconds. This may be artificial wisdom and synthesis, but that makes it no less profound. See this inquiry here.

Google Bard generated response.


Further Evidence of Synthesis

In this experiment, I asked Google Bard to uncover the top 10 conspiracy theories, unrelated to their jobs, espoused by Lyft and Uber drivers, ranked in order of popularity. With this seemingly ridiculous prompt, I was trying to get Bard to access data from a wide variety of sources and to assess and rank ideas across different populations of individuals. While it is almost unknowable through conventional means whether this list is entirely accurate, short of an exhaustive survey of Lyft and Uber drivers, it comes across as quite plausible and well reasoned. This Top 10 approach could be applied to almost anything, and both Bard and ChatGPT4 can give links to underlying sources. Again, humans can’t answer this type of query from their own memory banks; they would have to do significant research to achieve the same result, which only a small percentage of us could even do. If we can agree that this is at least a proxy for intelligence, then we can agree that it is pretty darn useful and that we should be leveraging this power for lots of things. Click here for a bit more on this fun dive into useful absurdity.


4. Providing Unbiased Comparisons and 5. Objective History

In my latest experiment to test the premise of this paper, I asked ChatGPT4 to compare the presidencies of Donald Trump and Joe Biden.

Sourced from the Council on Foreign Relations


I have heard arguments that LLMs will ultimately cease to be useful as their training sets become polluted with misinformation. I wanted to test this notion. By comparing these presidencies, I knew that there would likely be tons of misinformation out there that could potentially reveal bias in the results. The results were pretty surprising in that they looked quite objective and accurate. As we don’t know exactly what was fed into foundational LLMs, we can’t really know whether any filtering is going on with the data going into them, but it is unlikely that any amount of source filtering could fully sway the estimated 1.8 trillion parameters in GPT-4, for example. These incredibly clear and informative results are something a person would have trouble gleaning from the news, and I postulate that they may demonstrate a way to record human history more accurately than humans can.


Summary

I started this paper with academics arguing, in papers, articles, and documented conversations, about whether LLMs are becoming smarter than us. I limited the content, results, and concepts reviewed to 2023 to get the latest perspective. The building conversation seemed to land last month (November 2023) on the notion that LLMs will certainly become smarter than us. I wasn’t satisfied with that answer, however, because it leaves us in a position of false comfort where we can simply wait to act until everyone agrees that they are smarter than us.

Taking matters into my own hands, I believe I was able to show that foundational LLMs are already smarter or more capable than us in several important ways. While these inquiries and mini-research efforts would not pass scientific rigor, they can be replicated by anyone for free, which in and of itself might present a challenge to the way research has been conducted to date. LLMs demonstrated for me highly “intelligent,” or at least extremely challenging, responses that humans would have to toil away at for a very long time to match. Would the answers need to be checked before applying them meaningfully? In most cases, yes. The implication of this work, however, is that we can stop a lot of the name-calling and finger-pointing directed at LLMs and just get on with it. We can accept that LLMs are already more capable than we are at certain very useful tasks. We can incorporate their efforts to make our little blue marble better, from preserving truth to fighting climate change.

As I demonstrated, there was a lot of academic soul-searching and hand-wringing as LLMs had their cotillion in 2023. I saw this first-hand as I dove into conferences across the country and the world and talked with very different personalities about AI as background research for this paper.


My Travels to three AI-focused conferences as background for this paper in November 2023


I believe I was also able to demonstrate that the faster we get over our bruised egos that something artificial could be better than us in certain ways today and in a lot of ways tomorrow, the faster we can harness these new capabilities, and the better off we will be.


References:

Borji, A. (2023, April 5). A Categorical Archive of ChatGPT Failures. arXiv, 2302.03494v8. https://arxiv.org/abs/2302.03494

Chemero, A. (2023, November). LLMs Differ from Human Cognition Because They Are Not Embodied. Nature Human Behaviour, Vol. 7. https://www.nature.com/articles/s41562-023-01723-5

Frank, M. (2023, November). Openly Accessible LLMs Can Help Us to Understand Human Cognition. Nature Human Behaviour, Vol. 7. https://doi.org/10.1038/s41562-023-01732-4

Gelman, A. (2023, May 24). Computational Linguist Bob Carpenter Says LLMs Are Intelligent. Here’s Why. Statistical Modeling, Causal Inference, and Social Science at Columbia University. https://statmodeling.stat.columbia.edu/2023/05/24/bob-carpenter-says-llms-are-intelligent/

Jones, J. (2023, June 16). LLMs Aren’t Even as Smart as Dogs, Says Meta’s AI Chief Scientist. ZDNET. https://www.zdnet.com/article/llms-arent-even-as-smart-as-dogs-says-metas-ai-chief-scientist/

Koenig, A. (2023, November 20). AI Not Intelligent Like Its Maker: Paper by UC’s Anthony Chemero Explains AI Thinking as Opposed to Human Thinking. UC News. https://www.uc.edu/news/articles/2023/11/ai-not-intelligent-like-humans.html

Kosinski, M. (2023, November 11). Theory of Mind Might Have Spontaneously Emerged in Large Language Models (v5). arXiv, 2302.02083. https://arxiv.org/abs/2302.02083

Mitchell, M. (2023, July 13). How Do We Know How Smart AI Systems Are? Science, Vol. 381, No. 6654. https://www.science.org/doi/10.1126/science.adj5957

Mittelstadt, B., Wachter, S., & Russell, C. (2023, November). To Protect Science, We Must Use LLMs as Zero-Shot Translators. Nature Human Behaviour, Vol. 7. https://www.nature.com/articles/s41562-023-01744-0

Nield, D. (2023, April 30). How ChatGPT and Other LLMs Work – and Where They Could Go Next. Wired. https://www.wired.com/story/how-chatgpt-works-large-language-model/

Ornes, S. (2023, March 16). The Unpredictable Abilities Emerging From Large AI Models. Quanta Magazine. https://www.quantamagazine.org/the-unpredictable-abilities-emerging-from-large-ai-models-20230316/


Ken Pulverman is a Silicon Valley software executive with more than 20 years of leadership experience in large and small software companies, including multiple C-suite roles in both Product and Marketing at growth companies, resulting in three successful exits and an IPO. In addition to the software industry, his work has spanned the Manufacturing, Logistics, Banking, Insurance, Pharma, and Travel & Transportation industries, as well as Education. He has helped grow, build, and improve product offerings at large organizations like Oracle, Siebel Systems, DHL Airways, and Stanford University as well as successful startups like Model N, Vendavo, & Parsable.

