ChatGPT: how easily one can get confused.

OpenAI's new AI, ChatGPT, has garnered a lot of attention and interest. ChatGPT is not just a large language model that generates text: it is also a chat interface that carries previous inputs and intents forward into its following responses.

In our experiment, we will simulate a few chess training sessions with ChatGPT to evaluate its performance. We don't expect ChatGPT's chess to be strong (it's a language model, not a chess engine), but we will look at how well it picks up sequences.
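The original sessions were run through the ChatGPT web interface. As a minimal sketch of how a similar multi-turn session could be scripted (my assumption, not part of the original experiment; the model name and prompts are illustrative), using the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Keep the whole conversation so each answer can build on earlier turns,
# mirroring how the ChatGPT web interface carries context forward.
history = [{"role": "system", "content": "You are my chess training partner."}]

def ask(prompt: str) -> str:
    history.append({"role": "user", "content": prompt})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works
        messages=history,
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("We play 1. e4 e5 2. Bc4 Nc6 3. Qh5. How should Black defend f7?"))
print(ask("And what if Black plays 3... g6 instead?"))  # relies on prior context
```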

No prior chess knowledge is necessary, just a basic understanding of the rules.

Spoiler alert: even for a language model, ChatGPT's chess skills are impressively advanced.

Experiment: Scholar’s mate

Our first session will be dedicated to the Scholar’s mate. This is one of the easiest mates in chess, usually taught when you first start learning the game.

[Screenshots: ChatGPT conversation]

OK. So far, so good. Although it's odd that ChatGPT calls 3… d6 the only move: 3… Qe7, 3… Qf6, 3… d5, 3… Nh6, etc., also work. The sketch below verifies this list before we move on to discussing other ways to defend against the threatened checkmate.
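A minimal sketch with the python-chess library (my addition, not part of the original session) collects every Black reply after which 4. Qxf7 is illegal or no longer immediate checkmate:

```python
import chess

# Position after 1. e4 e5 2. Bc4 Nc6 3. Qh5 (the Scholar's mate attempt)
board = chess.Board()
for san in ["e4", "e5", "Bc4", "Nc6", "Qh5"]:
    board.push_san(san)

# Replies that avoid immediate mate on f7
defenses = []
for move in list(board.legal_moves):
    san = board.san(move)       # SAN must be computed before pushing the move
    board.push(move)
    try:
        board.push_san("Qxf7")  # attempt the mating move
        if not board.is_checkmate():
            defenses.append(san)
        board.pop()
    except ValueError:          # Qxf7 is not even legal in this line
        defenses.append(san)
    board.pop()

print(defenses)  # Qe7, Qf6, g6, d5 and Nh6 all appear in the list
```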

[Screenshots: ChatGPT conversation]

There are a couple of points I want to pause on. Of course, it's incorrect that 3… Qe7 doesn't prevent checkmate.

But notice how it switched from a Black move (3…) to a White move (4.). Stating that 3… g6 is the strongest move is correct and was probably taken from chess literature.

It's quite impressive that a language model (not a chess engine) can figure out that 4… Qxf7 leads to the white queen being captured. And it's just as impressive that ChatGPT follows up with a legal bishop capture of the queen (still neither a direct threat nor leading to checkmate, though).

[Screenshot: ChatGPT conversation]

One can notice that the "chat" function is working properly and that my explanation was picked up accurately. The only minor error is that it refers to the 5th move instead of the 4th.

The rest of the text appears to be unique according to multiple plagiarism checkers, and it does make some sense. Let's move on to analyzing the position we have obtained.

[Screenshots: ChatGPT conversation]

OK, ChatGPT seems to drop chess pieces while carrying the position from one chat turn to the next. Most of the general text is still quite correct, and out of the 3 suggested moves, 2 are legal. Can we figure out why the model believes Bg5 is a legal move?
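As an aside, move legality itself is mechanical to check. The exact position from the screenshot isn't reproduced here, so the position and move list below are a hypothetical illustration, not the actual game:

```python
import chess

def is_legal_san(board: chess.Board, san: str) -> bool:
    """Return True if `san` parses to a legal move in the given position."""
    try:
        board.parse_san(san)
        return True
    except ValueError:
        return False

# Hypothetical check from the starting position: Bg5 fails because
# the c1-bishop is still blocked by its own d2 pawn
board = chess.Board()
for suggestion in ["Nf3", "d4", "Bg5"]:
    print(suggestion, "->", is_legal_san(board, suggestion))
```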

[Screenshot: ChatGPT conversation]

Wow. It's quite interesting how certain the model is that it's correct. Is it because chess authors, backed up by powerful engines, tend to write as if they're always right?

The deeper into the game we go, the less sense the answers make. Though ChatGPT correctly states it is Black's 6th move (6… h6), talking about pins is wrong in the context of this position. In fact, the phrase “to move without being captured” is quite “un-chessy”. The FIDE reference is also a bit misleading: I was asking about the validity of Bg5 given that the e2 pawn is on its initial square, not about the initial position itself.

[Screenshot: ChatGPT conversation]

The final question to ask regarding the position concerns its evaluation. No surprise that ChatGPT's “evaluation” is entirely wrong, as it's just a language model. Still, despite some mistakes, it's pretty impressive how well a language model can perform in a chess opening.
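For comparison, a real evaluation would come from a chess engine, not a language model. A minimal sketch, assuming Stockfish is installed locally; the move list is a hypothetical stand-in, since the actual game moves are only in the screenshots:

```python
import chess
import chess.engine

# Hypothetical line standing in for the game from the screenshots
board = chess.Board()
for san in ["e4", "e5", "Bc4", "Nc6", "Qh5", "g6", "Qf3", "Nf6"]:
    board.push_san(san)

# Requires a Stockfish binary on PATH (any UCI engine works)
engine = chess.engine.SimpleEngine.popen_uci("stockfish")
info = engine.analyse(board, chess.engine.Limit(depth=15))
print(info["score"])  # score is reported relative to the side to move
engine.quit()
```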

Conclusion

Although ChatGPT is a powerful language model, it cannot play chess. This is because ChatGPT is a text-based model and does not have the ability to understand or interpret visual information such as a game board. Additionally, playing chess requires a high level of strategic thinking and decision-making, which goes beyond the scope of ChatGPT’s capabilities.

But the goal of this article wasn't just to study the ability to play a popular board game. From an ML perspective, the model's performance is extremely solid.

But here is the catch.

The concept of the uncanny valley can also apply to machine learning models. When using language models for knowledge generation, it's essential to be aware that the model may make errors even when most of the generated content appears correct. Such errors are difficult to detect and can lead to inaccurate or unreliable information. It's important to carefully review and validate the output of language models to ensure its accuracy and avoid falling into the uncanny valley, because the temptation to use ChatGPT or its alternatives for other purposes may be too high (one way to automate such a check for chess is sketched below).
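In the chess setting, such validation can even be automated. A minimal sketch (my assumption, reusing the hypothetical ask() helper from the first sketch) that refuses to trust a suggested move until it is verified legal:

```python
import chess

def next_validated_move(board: chess.Board, max_tries: int = 3) -> str:
    """Ask the model for a move and retry until the suggestion is legal."""
    for _ in range(max_tries):
        suggestion = ask(f"Position (FEN): {board.fen()}. Your move in SAN?")
        try:
            board.parse_san(suggestion.strip())  # raises ValueError if illegal
            return suggestion.strip()
        except ValueError:
            continue  # hallucinated or illegal move: don't trust it, ask again
    raise RuntimeError("model failed to produce a legal move")
```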

And one can easily get confused.

#chatgpt #chatgpt3 #openai #machinelearning #datascience #ml #ds #nlp #language

Ivan Reznikov “Fake it til you make it” springs to mind with #chatgpt here: Taking a human analogy, a confident networker can appear convincing on a subject they “know” nothing about by repeating what they have heard others say. In doing so they open conversations that surface more things they can repeat, increasing the gap between what we believe they know, and what they actually understand. At some point the confident networker has acquired enough fun facts on the subject that they might stitch a few together and develop a basic understanding. Is #chatgpt a confident networker?
