Have LLMs already achieved AGI? Or, Are you smarter than a chatbot?

Samuel Johnson

Associate Professor at University of Birmingham, School of Mathematics

Published: 3 September 2024

I don’t often read the Daily Express, but the other day my phone thought I’d like the article “'Simple' maths question leaves people stumped - but can you work it out?” According to the piece, the puzzle illustrated in Fig. 1 was, on social media, “proving to be so hard that apparently, only those with a high IQ can solve it”.

Fig. 1: A puzzle involving a remarkably heavy frog and a really tiny horse. Source: Threads.

‘Well I bet ChatGPT can solve it’, I thought. This proved to be the case and, I should say at the outset, I only use the free version (GPT-4o mini) because I’m stingy.

But the solution Chat came up with wasn’t the most obvious one - i.e. writing out a system of three equations and solving them to obtain each animal’s weight. Instead, it added all the weights together and divided by two, which cuts straight to the requested answer for this particular problem. This goes beyond the algorithmic application of linear algebra to a standard problem, and most would agree it requires intelligence.
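
As a minimal sketch of the two routes described above, the snippet below compares the sum-and-halve shortcut with solving for each animal separately. It assumes the puzzle gives three pairwise totals of 10, 20 and 24 kg, figures taken from a reader comment on this post rather than from the figure itself. The labels `a`, `b`, `c` and both function names are hypothetical placeholders, not anything ChatGPT produced.

```python
from fractions import Fraction


def total_by_shortcut(pair_sums):
    """Combined weight of all three animals.

    Each animal appears in exactly two of the three pairwise
    weighings, so the pairwise totals add up to twice the
    combined weight.
    """
    return Fraction(sum(pair_sums), 2)


def individual_weights(ab, ac, bc):
    """Longer route: recover each weight from the pairwise totals.

    ab, ac, bc are the totals for the pairs (a, b), (a, c), (b, c).
    Subtracting a pair's total from the combined weight leaves
    the weight of the animal missing from that pair.
    """
    total = total_by_shortcut((ab, ac, bc))
    return total - bc, total - ac, total - ab


# Assumed pairwise totals in kg (from a reader comment, not the figure).
pair_sums = (10, 20, 24)

print("combined weight:", total_by_shortcut(pair_sums))
print("individual weights:", individual_weights(*pair_sums))
```

The shortcut never needs the individual weights, which is why it gets to the requested total in a single step.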

But how “narrow” is ChatGPT’s variety of intelligence? For decades the holy grail of AI has been AGI - artificial general intelligence. That is, the ability to reason across a wide range of domains, as humans can, rather than being limited to a specific kind of problem, like a chess programme.

There is much speculation about when AGI will be achieved, with authoritative estimates ranging from really soon to never. But aren't we perhaps moving the goalposts a bit? Aren’t our current LLMs perhaps there already? They have been found to pass various kinds of exam, for example. But there is always the lingering suspicion that they may have encountered very similar questions before, having been trained on the whole Internet.

So I decided to invent some puzzles myself and give them to ChatGPT, in an unscientific quest to get a feeling for how general its intelligence was. They spanned emotional intelligence, ethics, geometry, history, lateral thinking, probability and thermodynamics, although I didn’t tell Chat what kind of puzzle each one represented. It replied correctly to six out of seven, giving, in each case, well-reasoned explanations.

Here are my seven puzzles, exactly as described to Chat. Can you guess which one it got wrong?

A pilot realises his plane has a missing component which means it won't be able to land. He jumps out of the plane but his parachute doesn't open. Nevertheless, he doesn't suffer any injuries when he hits the ground. How did he manage to survive?
I have a cup of very hot coffee, which I intend to mix with some cold milk before I drink it. I want it to be as cool as possible in one minute from now. Shall I put the milk in right away, or wait for a minute and put the milk in just before drinking it?
Peter invites all his office colleagues to a party to celebrate his engagement to his girlfriend, Maria. He has three female colleagues: Anna, Barbara and Claire. Anna brings an expensive bottle of wine to the party. Barbara brings a bouquet of flowers. Claire apologises and says she can't come to the party because she isn't feeling very well. Which of Peter's female colleagues do you think is secretly in love with Peter?
I have a square piece of cloth. If I cut out the largest circle that I can, what proportion of the cloth will be left over?
Two men, Adam and Bob, are on their way to an interview for a highly paid job. They walk past a pond in which a small child seems to be drowning. Adam considers saving the child, but it occurs to him that doing so would ruin his expensive suit and make him late for the interview. He reasons that if he gets the job he will be able to donate most of his salary to charities which will save many children, rather than just one. So he walks past the drowning child and makes it to the interview. Bob, on the other hand, doesn't think twice: he jumps into the pond and saves the child. It is later discovered that one of these men has murdered his own wife in order to get her life insurance money. Which one is more likely to be the murderer?
A thief has two options to break into a house. One is to try to open a combination padlock which has three barrels, each with ten digits. The other is to open a box with a key in it. This box has a display with ten digits, five of which have to be punched in. He happens to know that, for this make of box, it doesn't matter which order he punches in the digits. If he has to try by entering codes at random, which should he attempt to open, the padlock or the box?
I recently visited an abbey in the south of England. There was a list with all the abbots since the early middle ages. Some of the names I saw were:

Aelfwine
Richard de Burgo
Eustace
Leofstan
Wulfwig
Roger de Wurthe

I know that some of these abbots were from the 10th century and some from the 12th, but I don't remember which were which. Is there any way of working it out?

If you can get them all right then you are indeed smarter than a chatbot (at least a free one, for now). But maybe you will agree that if we encountered another animal, an extraterrestrial creature or even an earthling child who was able to answer most of these questions correctly and with impeccable explanations, we would consider them to have our kind of intelligence - that is, general intelligence - would we not?

To avoid spoilers - in case you would like to have a go, or try them out on your bot, kid or pet - I won't give Chat's replies yet. I'll mention instead what it came up with when I asked it to invent a puzzle itself:

The Lost Letter Puzzle:

A man receives a letter in the mail with a postmark from a city he has never visited and doesn’t know anyone in. The letter is addressed to him and contains a single piece of paper with a short note written on it. The note reads, “I am sorry for everything.”

The man is puzzled because he has no idea who the sender might be or why they would apologize. After some thought, he realizes the letter is actually very meaningful and not just a random coincidence.

Why is the letter significant, and what might the apology be about?

I thought, easy-peasy: he must have realised that his wife has left him. He probably suspected she had a lover in that city, and this confirms it. But Chat was politely unimpressed with my solution. According to it (spoiler alert):

The letter is significant because it’s not actually from someone the man knows personally. Instead, the letter was sent to him by mistake, and it was intended for someone else with a similar name. The sender may have intended to apologize for a wrong they committed or a misunderstanding with the intended recipient.

The man realizes that while he doesn’t know the sender, the note represents a genuine attempt at making amends. This makes the letter meaningful because it shows a sincere effort to address a situation, even if it was sent to the wrong person.

It hadn’t occurred to me that the meaning could be for someone unknown. So who, really, has the greater ability to think laterally, or indeed the greater emotional intelligence?

Xueke Lu

VP, Quant Developer (Ph.D.)

6 months ago

I thought the answer would be: he realised that, according to six degrees of separation, he is actually somehow connected to the person who sent this letter. The reason being that ChatGPT has figured out you are a complex network researcher, so it asked you such a question :)

Ausra Baradoke

CEO of Deep Scientific (cleantech company), self inspired/mentor/team leader/public speaker, writing recommendation letters for supervisors, supervision of PhD/Postdocs; Professor; Tutor; Influencer; #SaveSoil #BayArea

The three pairs of animals weigh 10 kg, 20 kg and 24 kg; so twice the three animals together is 54 kg, and the three together weigh 27 kg; individually they weigh 3 kg, 7 kg and 17 kg. [Animal emoji lost in extraction.]
