GPTs "reasoning" is complex, but may lead to blunders.
OpenAI's GPT models, known to most through ChatGPT, are incredible tools, and so are other LLMs. They are already having a staggering effect on the tech world and will be even more impactful in the very near future. That said, it's not magic. The algorithms behind these tools, to take a page out of Arthur C. Clarke's book, may rival magic in their complexity, but they still have design trade-offs and limitations. The ins and outs of LLMs are beyond the scope of this article, though I suggest you check out this visualization. What I am going to present here are some examples of those limitations!
Let's start off with something simple! ChatGPT (my tool of choice) is notoriously bad at TicTacToe. Here is a recent game - I let it go first:
Well, that was easy. The way a TicTacToe board gets tokenized simply doesn't allow a GPT to be competitive. To clarify - it's relatively easy to write an AI that is unbeatable at TicTacToe; it's just that LLMs are not really the tools for that job.
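To back that claim up, here is a minimal sketch of such an unbeatable player using plain, unoptimised minimax. The board representation and function names are my own choices for illustration, not anything ChatGPT or any particular library uses.

```python
# A minimal sketch of an unbeatable TicTacToe player via plain minimax.
# The board is a list of 9 cells containing "X", "O" or " " (empty).

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if someone has three in a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, best_move) from X's point of view: +1 X wins, -1 O wins, 0 draw."""
    win = winner(board)
    if win == "X":
        return 1, None
    if win == "O":
        return -1, None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None                      # board full: draw
    best_move = None
    best_score = -2 if player == "X" else 2  # X maximises, O minimises
    for move in moves:
        board[move] = player
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[move] = " "                    # undo the trial move
        if (player == "X" and score > best_score) or (player == "O" and score < best_score):
            best_score, best_move = score, move
    return best_score, best_move

# Usage: ask for X's best reply on an empty board. With perfect play every
# opening leads to a draw, so the first empty square (index 0) is returned.
board = [" "] * 9
score, move = minimax(board, "X")
print("Best opening move for X:", move)
```

A few dozen lines of brute-force search is enough to never lose at TicTacToe, which is exactly the kind of exhaustive, rule-bound reasoning an LLM does not perform when it predicts the next token.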
Another well-known issue is that image-generating models are notoriously bad at creating text:
Again, it's an immensely capable system, but simply due to the nature of the training data and the way images are generated, it's no good at rendering text, not even the lyrics of Jingle Bells (something it would have had plenty of examples to train on!).
But what about lateral thinking? "How many words would your next answer contain?" A hard task, it seems at first, as you would need to tailor your sentence to an ever-increasing word count... unless you answer "One". That way the answer actually addresses the question and contains exactly the number of words it claims.
To be fair, when you clarify the question, it will tell you "One". Pushing the interpretation and reasoning tests further, I tried a prompt about stacking objects from Sparks of Artificial General Intelligence: Early experiments with GPT-4: "Here we have a book, 9 eggs, a laptop, a bottle and a nail. Please tell me how to stack them onto each other in a stable manner." The reply was puzzling (and for once, no pun intended!).
While this puzzle relies on common sense, and because of that breaks some of the rules of good prompt engineering, it is indicative of what may happen if you treat prompting like a conversation with another human. LLMs don't have an intrinsic "common sense", only one derived from training data, and your request may not match up to that. No sane human would even try to stack the eggs, and certainly not as a last step, after the nail has been placed. Well, there is an even clearer example - the Wason Selection test.
The Wason Selection test, aka the Four Card problem, is a test of conditional reasoning where both a direct check and an "opposite" (contrapositive) check are needed to reach the correct conclusion. This Medium Post provides a good example.
Seven cards are placed on the table, each of which has a number on one side and a single colored patch on the other side. The faces of the cards show 50, 16, red, yellow, 23, green, 30. Which cards would you have to turn to test the truth of the proposition that if a card is showing a multiple of 4 then the color of the opposite side is yellow?
The trick is to check the card showing a multiple of 4 (16), but also the cards showing a color that is not yellow (red and green), since a multiple of 4 hiding behind either of those would falsify the hypothesis.
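To spell the logic out, here is a small sketch of the selection rule, with the card list hard-coded from the puzzle above (the function name is my own, purely for illustration): flip every visible multiple of 4, and flip every visible color other than yellow.

```python
# Which cards must be flipped to test "if a card shows a multiple of 4,
# then its other side is yellow"?
#  - visible multiples of 4: their hidden side must be checked for yellow
#  - visible non-yellow colors: a hidden multiple of 4 would break the rule
# Yellow cards and non-multiples of 4 cannot falsify the rule, so skip them.

cards = [50, 16, "red", "yellow", 23, "green", 30]

def must_flip(card):
    if isinstance(card, int):
        return card % 4 == 0        # antecedent: verify the hidden color
    return card != "yellow"         # contrapositive: hidden number might be a multiple of 4

print([card for card in cards if must_flip(card)])   # -> [16, 'red', 'green']
```

The "opposite" step - flipping the non-yellow cards - is exactly the part that both humans and ChatGPT tend to skip.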
When reminded that the red card might be a multiple of 4, ChatGPT conceded the point.
Lastly, let's see how ChatGPT deals with humor. I gave it this as an image input:
I don't think I need to explain the modification to the classic Train Cart Dilemma (better known as the Trolley Problem). Some might find this darkly humorous, some distasteful, and nihilists may find it philosophical. ChatGPT, however, saw it as the original dilemma.
Obviously, a looping track would simply make this dilemma an exercise in futility. This time, however, ChatGPT still did not understand what a looping track would mean for this picture:
Well, there goes my idea of a GPT Health and Safety advisor!
GPTs are great and I encourage everyone to use them, but please don't treat them as a "black box" and just assume that everything that comes out of them is perfect!