ChatTuring vs. ChatGPT
Last year Blake Lemoine, a senior software engineer at Google, claimed that Google’s LaMDA chatbot had achieved consciousness. Google denied this, saying that LaMDA is merely a large language model (LLM) that imitates word combinations it has encountered. But, of course, imitation is how children start talking as well! Are LLMs close to becoming sentient?
OpenAI’s ChatGPT has garnered immense attention recently, and many of its users side with Blake Lemoine, despite defenses built into ChatGPT to guard against just that.
The best-known test for machine intelligence is the Turing Test, named after Alan Turing, who proposed it in his seminal 1950 paper Computing Machinery and Intelligence. Turing provides specific questions that he would ask a chatbot, along with answers from an intelligent computer (which we shall call ChatTuring):
Q: Please write me a sonnet on the subject of the Forth Bridge.
A: Count me out on this one. I never could write poetry.
Q: Add 34957 to 70764
A: (Pause about 30 seconds and then give as answer) 105621.
Q: Do you play chess?
A: Yes.
Q: I have K at my K1, and no other pieces. You have only K at K6 and R at R1. It is your move. What do you play?
A: (After a pause of 15 seconds) R-R8 mate.
I decided to submit Turing’s questions to ChatGPT to see how well it would fare.
The first question is ambiguous, and a human poet would be expected to ask whether a Petrarchan or Shakespearean sonnet was desired. ChatTuring, apparently afraid of sounding mechanical, diplomatically sidesteps answering. ChatGPT has no such qualms and produced a Shakespearean sonnet in less than 30 seconds:
Upon the coast of bonnie Scotland's land,
There stands a mighty feat of engineering,
A bridge of steel, so grand and bold and grand,
The Forth Bridge, to all it's domineering.

Its crimson hue against the blue expanse,
A symbol of the nation's strength and might,
A marvel of the human skill and chance,
That spans across the Firth, in sheer delight.

The trains that cross its span with mighty roar,
A testament to its enduring power,
A legacy that shall forevermore,
Remain a beacon in the nation's hour.

The Forth Bridge, a wonder to behold,
A structure strong and steadfast, brave and bold.
While “grand and bold and grand” is rather awkward, ChatGPT displayed better-than-average talent for rhyme and rhythm, as well as impressive knowledge: not only that the Forth Bridge is a railroad bridge in Scotland and an early major steel structure, but also its color (Forth Bridge Red) and its status as a proud symbol of Scotland.
ChatTuring takes 30 seconds to answer the second question, and gets it wrong: the correct sum is 105721. Is it purposely trying to fool Turing? Or perhaps Turing is alluding to the "strong AI" hypothesis, which posits that all true intelligence is equivalent; if so, a truly intelligent bot would calculate like a human, not like a computer.
Here is ChatGPT’s answer:
The sum of 34957 and 70764 is 105721.
It had been widely reported that ChatGPT misses carries when adding numbers of three or more digits. This flaw has apparently been corrected; here, at least, ChatGPT added quickly and correctly.
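Turing's printed answer, 105621, differs from the correct sum, 105721, only in the hundreds digit, and that is exactly what you get by forgetting the carry from the tens column into the hundreds column. The short Python sketch below illustrates this with ordinary school-style column addition. The function `column_add` and its `drop_carry_into` parameter are hypothetical, added only to simulate such a slip; the code says nothing about whether Turing's error was deliberate.

```python
from typing import Optional


def column_add(a: int, b: int, drop_carry_into: Optional[int] = None) -> int:
    """Add two non-negative integers column by column, right to left.

    drop_carry_into is a hypothetical knob for illustration: if it names a
    column (0 = units, 1 = tens, 2 = hundreds, ...), the carry arriving at
    that column is forgotten, simulating a human-style slip.
    """
    result = 0
    place = 1   # value of the current column: 1, 10, 100, ...
    column = 0
    carry = 0
    while a > 0 or b > 0 or carry:
        if column == drop_carry_into:
            carry = 0  # the simulated slip: ignore the incoming carry
        total = a % 10 + b % 10 + carry
        result += (total % 10) * place  # write down the units digit of the column sum
        carry = total // 10             # carry the tens digit into the next column
        a, b = a // 10, b // 10
        place *= 10
        column += 1
    return result


print(column_add(34957, 70764))                     # 105721 (ChatGPT's answer, correct)
print(column_add(34957, 70764, drop_carry_into=2))  # 105621 (ChatTuring's answer)
```

Forgetting any other single carry in this sum gives a different wrong answer (for example, 105711 or 104721), so only the missed carry into the hundreds column reproduces ChatTuring's reply.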
The final exchange starts with “Do you play chess?”, to which ChatGPT replies:
Yes, I can play chess! As an AI language model, I don't have physical capabilities, but I can help you learn how to play, provide you with tips and strategies, and even play a game with you through text-based communication.
Fair enough. So, I asked Turing’s chess question, which is shrewdly ambiguous. It takes for granted that the object is to win in as few moves as possible, something that a computer might not realize. It does not specify who plays White and who plays Black. And “R at R1” leaves open whether the rook is at queen’s rook 1 (bottom left corner) or king’s rook 1 (bottom right corner). All of these would prompt questions from a human chess player. Turing’s chatbot apparently tried all possibilities and gave an answer that is both correct and equally ambiguous, in a short, but not unbelievably short, amount of time.
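Turing's answer can be checked mechanically. The Python sketch below is a hand-rolled illustration rather than a chess library, and its function names are hypothetical. It assumes ChatTuring plays White, so that your K1 becomes e8, its K6 becomes e6, and R1 is either a1 (QR1) or h1 (KR1); if ChatTuring plays Black, the position is simply flipped top to bottom and the conclusion is the same. It then tests whether R-R8 mates under either reading of R1.

```python
from typing import Set, Tuple

Square = Tuple[int, int]  # (file, rank): files a..h -> 0..7, ranks 1..8 -> 0..7


def sq(name: str) -> Square:
    """Convert algebraic notation such as 'e8' to (file, rank) indices."""
    return ord(name[0]) - ord("a"), int(name[1]) - 1


def on_board(f: int, r: int) -> bool:
    return 0 <= f < 8 and 0 <= r < 8


def king_attacks(k: Square) -> Set[Square]:
    """Squares adjacent to a king."""
    f, r = k
    return {(f + df, r + dr)
            for df in (-1, 0, 1) for dr in (-1, 0, 1)
            if (df, dr) != (0, 0) and on_board(f + df, r + dr)}


def rook_attacks(rook: Square, blockers: Set[Square]) -> Set[Square]:
    """Squares a rook attacks along ranks and files, up to and including the first blocker."""
    attacked = set()
    for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        f, r = rook[0] + df, rook[1] + dr
        while on_board(f, r):
            attacked.add((f, r))
            if (f, r) in blockers:
                break
            f, r = f + df, r + dr
    return attacked


def is_checkmate(white_king: Square, white_rook: Square, black_king: Square) -> bool:
    """True if the lone Black king, to move, is in check and has no legal move."""
    if black_king not in rook_attacks(white_rook, {white_king, black_king}):
        return False  # not even in check
    for target in king_attacks(black_king):
        if target == white_rook:
            # Capturing the rook is legal unless the White king defends it.
            if target not in king_attacks(white_king):
                return False
            continue
        # Once the Black king leaves its square it no longer blocks the rook,
        # so only the White king can shield squares along the rook's lines.
        guarded = (target in king_attacks(white_king)
                   or target in rook_attacks(white_rook, {white_king}))
        if not guarded:
            return False
    return True


# ChatTuring (White): K on e6, R on a1 or h1. You (Black): K on e8.
for start, end in (("a1", "a8"), ("h1", "h8")):
    # The rook's path up the file is clear, so the move is legal.
    assert sq(end) in rook_attacks(sq(start), {sq("e6"), sq("e8")})
    print(f"R{start}-{end} mate: {is_checkmate(sq('e6'), sq(end), sq('e8'))}")
```

Both readings print True: the rook on the eighth rank gives check and covers d8 and f8, while ChatTuring's king on e6 covers d7, e7 and f7.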
ChatGPT’s answer is painfully verbose, and just wrong.
This is a classic chess puzzle known as the "King and Rook versus King" endgame. The goal of the attacking player (in this case, me) is to checkmate the opposing king (in this case, yours) using the rook and king.
In this position, I would play the following move: Rb1+
This move brings the rook to b1, checking your king at K1. Now your king must move, as it is in check. From here, I can use my rook to restrict your king's movements until I am able to deliver a checkmate.
When ChatGPT is out of its depth, it tends to launch into a very convincing discourse about something else.
So, how well did ChatGPT do?
ChatTuring and ChatGPT are diametric opposites. Turing’s bot is a laconic diplomat, can’t write poetry, and is a chess whiz. ChatGPT is a verbose politician, not a bad poet, and has a calculator in hand.
But the difference between ChatTuring and ChatGPT shouldn’t be a deciding factor. Some people are diplomats and some are politicians. Nor should the existence of apps that can spot ChatGPT text sway us; the question is whether a human can.
One can readily recognize ChatGPT because it is too fast, too knowledgeable, and too proficient. Were a chatbot based on the same LLM designed differently, I venture that even Turing would be fooled.
So, I asked ChatGPT, “Would you pass the Turing test?”, to which it replied:
As an AI language model, I have been trained on a vast amount of data and can respond to a wide range of questions and topics. While I am capable of generating responses that can be difficult to distinguish from those of a human, passing the Turing test ultimately depends on the specific criteria and evaluation methods used by the human evaluator.
I couldn't agree more.
Semi Retired at Consultant
1 yr · Have a look at GPT-4; there is a nice paper comparing it to GPT-3, and the improvements are astounding.
Very interesting. Given that the Turing test is out in the open, isn't it likely that an intelligent AI would fail it on purpose? A few days ago, someone distributed a semi-humorous video he created with AI: a one-minute rap song for the protests in Israel. It was funny, had a good rhythm, and included all the relevant messages. It also included this message, for a fleeting second, right in the middle of the song. Reminds me a lot of Darth Vader's argument for the Empire... This worries me a lot more than any potential test.
Very nice. Not as dramatic as Blade Runner, but non-fiction is not fiction
Communication Consultant · Workshop Facilitator · Speaker
2 yr · Give me a laconic diplomat over a verbose politician with a calculator any day!! Loved the article, Yaakov Stein, thanks.
Technology Leader
2 yr · Very interesting, Yaakov; maybe you should have started with the last question...