ChatGPT: how easily one can get confused.

OpenAI's new AI, ChatGPT, has garnered a lot of attention and interest. ChatGPT is not just a large language model that generates text: it is also a chat interface that carries previous inputs and intents forward into its following responses.

In our experiment, we will simulate a few chess training sessions with ChatGPT to evaluate its performance. We don't expect ChatGPT's chess to be strong (it's a language model, not a chess engine), but we will look at how well it picks up sequences.
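The original sessions were run through the ChatGPT web interface. As a minimal sketch of how a similar multi-turn session could be scripted (my assumption, not part of the original experiment; the model name and prompts are illustrative), using the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Keep the whole conversation so each answer can build on earlier turns,
# mirroring how the ChatGPT web interface carries context forward.
history = [{"role": "system", "content": "You are my chess training partner."}]

def ask(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("We play 1. e4 e5 2. Bc4 Nc6 3. Qh5. How should Black defend f7?"))
print(ask("And what if Black plays 3... g6 instead?"))  # relies on prior context
```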

No prior chess knowledge is necessary, just a basic understanding of the rules.

Spoiler alert: even for a language model, ChatGPT's chess skills are impressively advanced.

Experiment: Scholar’s mate

Our first session will be dedicated to the Scholar’s mate. This is one of the easiest mates in chess, usually taught when you first start learning the game.

[Screenshots: ChatGPT conversation]

OK. So far, so good. Although it's odd that ChatGPT calls 3… d6 the only move: 3… Qe7, 3… Qf6, 3… d5, 3… Nh6, etc., also work. The sketch below verifies this list before we move on to discussing other ways to defend against the threatened checkmate.
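A minimal sketch with the python-chess library (my addition, not part of the original session) collects every Black reply after which 4. Qxf7 is illegal or no longer immediate checkmate:

```python
import chess

# Position after 1. e4 e5 2. Bc4 Nc6 3. Qh5 (the Scholar's mate attempt)
board = chess.Board()
for san in ["e4", "e5", "Bc4", "Nc6", "Qh5"]:
    board.push_san(san)

# Replies that avoid immediate mate on f7
defenses = []
for move in list(board.legal_moves):
    san = board.san(move)       # SAN must be computed before pushing the move
    board.push(move)
    try:
        board.push_san("Qxf7")  # attempt the mating move
        if not board.is_checkmate():
            defenses.append(san)
        board.pop()
    except ValueError:          # Qxf7 is not even legal in this line
        defenses.append(san)
    board.pop()

print(defenses)  # Qe7, Qf6, g6, d5 and Nh6 all appear in the list
```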

[Screenshots: ChatGPT conversation]

There are a couple of points I want to pause on. Of course, it's incorrect that 3… Qe7 doesn't prevent checkmate.

But notice how it switched from a Black move (3…) to a White move (4.). Stating that 3… g6 is the strongest move is correct and was probably taken from chess literature.

It's quite impressive that a language model (not a chess engine) can figure out that 4… Qxf7 leads to the white queen being captured. And it's just as impressive that ChatGPT follows up with a legal bishop capture of the queen (still neither a direct threat nor leading to checkmate, though).

[Screenshot: ChatGPT conversation]

One can notice that the "chat" function is working properly and that my explanation was picked up accurately. The only minor error is that it refers to the 5th move instead of the 4th.

The rest of the text appears to be unique according to multiple plagiarism checkers, and it does make some sense. Let's move on to analyzing the position we have obtained.

[Screenshots: ChatGPT conversation]

OK, ChatGPT seems to drop chess pieces while carrying the position from one chat turn to the next. Most of the general text is still quite correct, and out of the 3 suggested moves, 2 are legal. Can we figure out why the model believes Bg5 is a legal move?
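As an aside, move legality itself is mechanical to check. The exact position from the screenshot isn't reproduced here, so the position and move list below are a hypothetical illustration, not the actual game:

```python
import chess

def is_legal_san(board: chess.Board, san: str) -> bool:
    """Return True if `san` parses to a legal move in the given position."""
    try:
        board.parse_san(san)
        return True
    except ValueError:
        return False

# Hypothetical check from the starting position: Bg5 fails because
# the c1-bishop is still blocked by its own d2 pawn
board = chess.Board()
for suggestion in ["Nf3", "d4", "Bg5"]:
    print(suggestion, "->", is_legal_san(board, suggestion))
```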

[Screenshot: ChatGPT conversation]

Wow. It's quite interesting how certain the model is that it's correct. Is it because chess authors, backed up by powerful engines, tend to write as if they're always right?

The deeper into the game we go, the less sense the answers make. Though ChatGPT correctly states it is Black's 6th move (6… h6), talking about pins is wrong in the context of this position. In fact, the phrase “to move without being captured” is quite “un-chessy”. The FIDE reference is also a bit misleading: I was asking about the validity of Bg5 given that the e2 pawn is on its initial square, not about the initial position itself.

[Screenshot: ChatGPT conversation]

The final question to ask regarding the position concerns its evaluation. No surprise that ChatGPT's “evaluation” is entirely wrong, as it's just a language model. Still, despite some mistakes, it's pretty impressive how well a language model can perform in a chess opening.
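For comparison, a real evaluation would come from a chess engine, not a language model. A minimal sketch, assuming Stockfish is installed locally; the move list is a hypothetical stand-in, since the actual game moves are only in the screenshots:

```python
import chess
import chess.engine

# Hypothetical line standing in for the game from the screenshots
board = chess.Board()
for san in ["e4", "e5", "Bc4", "Nc6", "Qh5", "g6", "Qf3", "Nf6"]:
    board.push_san(san)

# Requires a Stockfish binary on PATH (any UCI engine works)
engine = chess.engine.SimpleEngine.popen_uci("stockfish")
info = engine.analyse(board, chess.engine.Limit(depth=15))
print(info["score"])  # score is reported relative to the side to move
engine.quit()
```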

Conclusion

Although ChatGPT is a powerful language model, it cannot play chess. This is because ChatGPT is a text-based model and does not have the ability to understand or interpret visual information such as a game board. Additionally, playing chess requires a high level of strategic thinking and decision-making, which goes beyond the scope of ChatGPT’s capabilities.

But the goal of this article wasn't just to study the ability to play a popular board game. From an ML perspective, the model's performance is extremely solid.

But here is the catch.

The concept of the uncanny valley can also apply to machine learning models. When using language models for knowledge generation, it's essential to be aware that the model may make errors even when most of the generated content appears correct. Such errors are difficult to detect and can lead to inaccurate or unreliable information. It's important to carefully review and validate the output of language models to ensure its accuracy and avoid falling into the uncanny valley, because the temptation to use ChatGPT or its alternatives for other purposes may be too high (one way to automate such a check for chess is sketched below).
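In the chess setting, such validation can even be automated. A minimal sketch (my assumption, reusing the hypothetical ask() helper from the first sketch) that refuses to trust a suggested move until it is verified legal:

```python
import chess

def next_validated_move(board: chess.Board, max_tries: int = 3) -> str:
    """Ask the model for a move and retry until the suggestion is legal."""
    for _ in range(max_tries):
        suggestion = ask(f"Position (FEN): {board.fen()}. Your move in SAN?")
        try:
            board.parse_san(suggestion.strip())  # raises ValueError if illegal
            return suggestion.strip()
        except ValueError:
            continue  # hallucinated or illegal move: don't trust it, ask again
    raise RuntimeError("model failed to produce a legal move")
```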

And one can easily get confused.

#chatgpt #chatgpt3 #openai #machinelearning #datascience #ml #ds #nlp #language

Ivan Reznikov “Fake it til you make it” springs to mind with #chatgpt here: Taking a human analogy, a confident networker can appear convincing on a subject they “know” nothing about by repeating what they have heard others say. In doing so they open conversations that surface more things they can repeat, increasing the gap between what we believe they know, and what they actually understand. At some point the confident networker has acquired enough fun facts on the subject that they might stitch a few together and develop a basic understanding. Is #chatgpt a confident networker?
