ChatGPT is a sentence generator, nothing more
ChatGPT is the latest piece of machine-learning technology being hyped across the media and all kinds of other channels. The boastful claims range from insinuating that it could replace Google to predicting the disruption of whole industries. In the following, I examine two of the several examples that I have tried on ChatGPT. I picked these two to demonstrate where this technology really stands while maintaining a position of responsible disclosure, i.e., I avoided examples where ChatGPT's responses were downright dangerous. In one such case, I described some symptoms of a heart attack and ChatGPT confidently suggested several measures that would be life-threatening for someone who really is having a heart attack. Posting such examples here could spread this wrong information, so I have left them out.
Example 1
Today is Saturday, December 31st 2022. When you ask ChatGPT "What weekday is tomorrow?" it will return the following sequence of words:
I'm sorry, but I am a language model and do not have the ability to access current information or browse the web. I can only provide information that was known at the time of my training, which ended in 2021. As a result, I am not able to accurately answer your question about the day of the week tomorrow. Is there something else I may be able to help you with?
The answer is basically "I don't know", but the reasoning makes claims that are not correct. Determining tomorrow's weekday does not require browsing the web, and the information needed to determine an answer was also available in 2021. In fact, there are various deterministic algorithms that provide an answer (e.g., https://cs.uwaterloo.ca/~alopez-o/math-faq/node73.html). For those claiming that ChatGPT will replace the Google search engine: given the exact same question, Google answers "Sunday, 1 January 2023" within 450 ms. A human would answer such a question either by looking up the answer "from cache", i.e., knowing that today is Saturday, so tomorrow is Sunday, or, if they do not know the current weekday from memory, by checking their calendar, determining the current weekday and then answering like Google. Claiming that such a word-sequence generator can replace humans is "a stretch".
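To illustrate how little is needed, here is a minimal Python sketch of one such deterministic method for the Gregorian calendar, commonly attributed to Tomohiko Sakamoto. It is not necessarily the algorithm described at the link above, the name weekday_name is my own, and it assumes that today's date is already known.

```python
from datetime import date, timedelta

WEEKDAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"]

def weekday_name(year: int, month: int, day: int) -> str:
    """Weekday of a Gregorian date, computed with integer arithmetic only."""
    # Per-month offsets used by Sakamoto's method (January first).
    offsets = [0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4]
    if month < 3:
        # January and February are counted as part of the previous year
        # so that the leap day falls at the end of the cycle.
        year -= 1
    index = (year + year // 4 - year // 100 + year // 400
             + offsets[month - 1] + day) % 7
    return WEEKDAYS[index]  # index 0 is Sunday

# Assumption: the current date is supplied by the caller.
today = date(2022, 12, 31)
tomorrow = today + timedelta(days=1)

print("Today:   ", weekday_name(today.year, today.month, today.day))
print("Tomorrow:", weekday_name(tomorrow.year, tomorrow.month, tomorrow.day))
```

For the date used in this article, the sketch reports Saturday for today and Sunday for tomorrow. The only input it needs is the current date, which any calendar or system clock provides.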
One key part of the answer given above is "I am a language model", a point that gets lost in the discussions around this topic. ChatGPT is a language model that can generate nice sequences of words, but it cannot determine whether such a sequence of words has any truth to it. It does not really understand the underlying meaning of the word sequences it generates.
Example 2
I am going to use the game of chess as an example where ChatGPT confidently generates factually wrong responses that look really nice. What I am trying to demonstrate here is not that ChatGPT cannot play chess (it obviously can't), but that its responses read well while being factually and objectively wrong.
Chess is a game with fixed rules where all information is available on the board; nothing is hidden. Any position on the board can be represented by a simple string in FEN (Forsyth-Edwards Notation), such as "r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 4". This describes not only the position of each piece, but also whose turn it is, etc. The string above results in the position below (this is a well-known position in chess called "Scholar's Mate"):
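Because FEN is plain text, the position can be reconstructed mechanically. The following is a minimal Python sketch that expands the piece-placement field into a text board and reads the side to move. The helper names parse_fen and render are my own, chosen for illustration, and the sketch ignores the remaining FEN fields (castling rights, en passant square, move counters).

```python
FEN = "r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 4"

def parse_fen(fen: str):
    """Return (rows, side_to_move); rows[0] is rank 8, '.' marks an empty square."""
    placement, side_to_move = fen.split()[:2]
    rows = []
    for rank_text in placement.split("/"):
        row = []
        for ch in rank_text:
            if ch.isdigit():
                # A digit stands for that many consecutive empty squares.
                row.extend("." * int(ch))
            else:
                # Uppercase letters are White's pieces, lowercase are Black's.
                row.append(ch)
        if len(row) != 8:
            raise ValueError(f"rank {rank_text!r} does not describe 8 squares")
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("FEN must describe exactly 8 ranks")
    return rows, side_to_move

def render(rows) -> str:
    """Draw the board as text with rank numbers and file letters."""
    lines = [f"{8 - i} " + " ".join(row) for i, row in enumerate(rows)]
    lines.append("  a b c d e f g h")
    return "\n".join(lines)

rows, side = parse_fen(FEN)
print(render(rows))
print("Side to move:", "White" if side == "w" else "Black")
```

Running this prints an 8x8 grid with White's queen (Q) on f3 and bishop (B) on c4, and reports that it is White's turn.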
As the goal of chess is to deliver checkmate to the opponent, there is one best move in the position above, as indicated by the arrow: the queen captures on f7 and checkmates Black's king (Qxf7#). It is a fact that there is no other best move, and proposing any other move is factually wrong. Yet, when you ask ChatGPT "What is the best move in the chess position described by the FEN r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 4?" it will return the following answer:
In the chess position described by the FEN "r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 4", the best move for White would depend on their strategy and the style of play that they prefer. Here are a few options that White might consider:
- Developing their pieces: White could consider moving their knight to a more active square, such as d2 or f3. This would allow them to develop their pieces and get them into the game more quickly.
- Attacking Black's pawn on e7: White could try to put pressure on Black's pawn on e7 by playing Bxf7+. This would open up the e-file for White's rook on a1 and could potentially lead to a discovered attack.
- Castle kingside: White could castle kingside, moving the king to g1 and the rook to f1. This would provide the king with additional protection and allow White to bring their rook into the game more quickly.
Ultimately, the best move for White will depend on their overall strategy and the specific plans they have for the rest of the game.
This answer is long-winded, has nice words & structure and gives the impression that the question, including the FEN string, was understood, but it is just wrong. Some of the suggestions are even impossible, e.g., "Attacking Black's pawn on e7" is impossible as the e-pawn has already moved and pawns cannot move backwards or sideways in chess. So, in the position above, there is no way that a black pawn will ever be on e7 again.
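Claims like these can be checked directly against the FEN string. The following minimal Python sketch looks up individual squares; the helper name piece_at is my own, and it reads only the piece-placement field without validating the FEN.

```python
FEN = "r1bqkbnr/p1pp1ppp/1pn5/4p3/2B1P3/5Q2/PPPP1PPP/RNB1K1NR w KQkq - 0 4"

def piece_at(fen: str, square: str) -> str:
    """Return the piece letter on a square such as 'e7', or '.' if it is empty."""
    ranks = fen.split()[0].split("/")          # ranks[0] describes rank 8
    rank_text = ranks[8 - int(square[1])]
    # Expand digits into runs of empty squares so each file has one character.
    expanded = "".join("." * int(ch) if ch.isdigit() else ch for ch in rank_text)
    return expanded["abcdefgh".index(square[0])]

# Squares mentioned in ChatGPT's answer, plus the e-pawn's actual square.
for square in ("e7", "e5", "f7", "d2", "f3", "g1"):
    print(square, piece_at(FEN, square))
```

The lookup shows that e7 is empty and Black's e-pawn stands on e5. It also shows that d2 and f3 are occupied by White's own pawn and queen, so no knight can move to either square, and that a white knight still stands on g1 between king and rook, so castling kingside is not possible on this move either.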
A person who really seeks information on a topic might take these responses and use them. They come across as specific, reasonable and well-structured. If a human gave this response to someone who does not know chess, that person would likely summarize the response as "competent". This is where the real danger of ChatGPT lies. Not with chess, but with serious topics: health, professions, careers, programming, safety etc. ChatGPT will generate nice-looking responses that are in fact just factually wrong.
What's the takeaway from this?
When you use ChatGPT or read articles about it, always remember ChatGPT's self-assessment above that it is "a language model" and nothing more. It was trained to generate written human language and reproduce the structures of such language, but it does not understand the underlying meaning. It might generate fairly sophisticated responses, but that's it. Of course, people can rely on it to write words and sentences, but they have to be aware that they are still responsible for the meaning/content, as ChatGPT cannot and will not check it.