ChatGPT is getting better fast – are you still ahead of the AI in version 4?
AI with broad, human-like problem-solving capabilities is a fundamental change

When ChatGPT became known to the general public about four months ago, I was somewhat disappointed by how many of its answers were simply wrong. ChatGPT-3.5 failed to solve problems above the level of fourth or fifth grade of primary school. Asking for factual information often led to “AI hallucinations”, that is, the AI stating as fact what was pure invention.

With the release of ChatGPT-4 in March, the AI has taken a surprisingly big step forward. Hallucinations have become much rarer, and the AI usually provides sound and correct answers to technical questions, such as:

Question: "How high must a bank's tier 1 capital be to prevent a bank run?" Answer: "There is no specific tier 1 capital level that can completely guarantee the prevention of a bank run, as bank runs are often driven by panic and a lack of confidence in the financial system rather than solely by a bank's capital adequacy. However, maintaining a strong capital buffer can help a bank weather financial shocks and maintain depositor confidence (...) Maintaining a strong tier 1 capital ratio is just one aspect of preventing a bank run; banks also need to manage their liquidity and demonstrate sound risk management practices to maintain depositor trust." (I have shortened the answer; it contained further details such as regulatory minimum requirements.)

Question: "How does inflation affect the cost/income ratio in motor insurance?" Answer (summarized): "...higher damage costs, higher operating costs, possibly reduced investment income due to inflation, premiums have to be raised, otherwise the cost/income ratio deteriorates."

Of course, one can argue that these are "only" questions about factual knowledge. How does ChatGPT-4 cope if it has to derive the answer? Here, too, the new version does surprisingly well. ChatGPT-4 usually solves common brainteasers quite reliably. At the end of the text you will find examples of fairly complex questions which ChatGPT-4 answers correctly... would you have known the answer?

OpenAI, the company behind ChatGPT, had the AI take 34 academic and job-specific exams, including the US Uniform Bar Exam (a test for lawyers, the basis of the license to practice law), the SAT (a US standardised test for university applications) and other tests ranging from medicine and physics to wine studies. Simply put, the previous version, ChatGPT-3.5, scored on average about in the middle of all test takers, while the current version, ChatGPT-4, has made a big jump in many tests, averaging roughly in the top 25% of all test takers. The most significant jump was in the bar exam, from the bottom 10% to the top 10% of participants. [GPT-4 Technical Report 2023b]

This means that an AI performs similarly well or better in many subject-specific exams at university level when compared to most humans who have prepared specifically for these exams through higher education.

This does not mean that in the next two to three years many lawyers, accountants, MDs, managers or other "knowledge workers" will be replaced by an AI. However, the development of AI will continue at a rapid pace. It is foreseeable that in a few years' time, AI will be able to solve many (if not most or all) professional problems as well as or better than most professionals.

As we progress to that new situation, many questions will need to be dealt with: How will we integrate AI into the various workstreams, and what will the effects on productivity be? Will AI help to reduce the shortage of skilled workers or even lead to unemployment among highly educated specialists? Historically, technological progress has never led to less employment, and despite digitalization (in its largely pre-AI form), employment is currently at record levels in many countries. However, this does not prove that it will stay that way.

AI with broad, human-like problem-solving capabilities is a fundamental change, associated with opportunities as well as complex challenges for the whole society.

What is your view? Comments welcome.

Examples of questions that ChatGPT-4 solves without problems (solutions below):

  1. "If 4 carpenters need 3 days to make 5 cupboards, how long do 2 butchers need to make 12 cupboards?" The earlier version 3.5 did not recognise the "trap" and calculated senselessly (and also incorrectly). ChatGPT-4, on the other hand, answers: "The question is a bit confusing, since butchers do not normally make cupboards. They are experts in meat processing and not in carpentry. However, we can answer the question by replacing carpenters with butchers and assuming that they have the same labour efficiency." - a highly solution-oriented introduction, followed by the correct equations and the accurate result (see below).
  2. A disease infects twice as many people every week. After 20 weeks all people in a country are infected. How long does it take until half of the people are infected?
  3. Thomas is older than Leo. Leo is younger than Peter and William. William is older than Thomas and Mark, and Mark is older than Peter. If Thomas and Mark are twins, which boy is the oldest?
  4. Someone asks a man how old he is and he answers, "The day before yesterday I was 25 years old, but next year I will be 28." On what day was the man born?
  5. A rectangular playing field is given. A closed rope lies on the side lines, the length of which corresponds exactly to the circumference of the playing field. Now the rope is extended by one metre and we move it outwards on all four sides, until it forms a rectangle again. The vertical distance to the outer line of the field should be the same on all four sides. How big is this distance?

Solutions by ChatGPT-4, delivered within seconds:

  1. "It would take two butchers (assuming they have the same labour efficiency as the carpenters) 14.4 days to make 12 cupboards. However, it is important to note that this is a hypothetical situation and butchers do not usually make cupboards."
  2. "If a disease infects twice as many people every week, this can be described as exponential growth. In this case, all people in a country are infected after 20 weeks. To find out when half of the people were infected, we can simply go back one week, as the number of infected people doubles each week. So, it takes 20 weeks minus 1 week = 19 weeks for half of the people in the country to be infected."
  3. "Since William is older than Thomas and Mark, and Thomas and Mark are older than the other boys, we can conclude that William is the oldest boy."
  4. "Since the man was 25 years old the day before yesterday and will celebrate his 26th birthday this year, today must be January 1, and his birthday falls on December 31."
  5. ChatGPT derives the equations for the previous and new perimeter, then solves the equations and calculates correctly: "The perpendicular distance to the outer line of the field on all four sides is 1/8 metre or 12.5 centimetres."
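
For readers who want to check the arithmetic themselves, here is a minimal Python sketch that recomputes the answers to puzzles 1, 2, 3 and 5. It is not how ChatGPT-4 arrived at its answers; the function names are my own illustrative choices, and the sketch assumes, as the quoted answer does, that butchers work as fast as carpenters. For puzzle 5, a uniform offset d lengthens each of the four sides by 2d, so the perimeter grows by 8d.

```python
from fractions import Fraction
from itertools import product


def days_needed(ref_workers, ref_days, ref_items, workers, items):
    """Puzzle 1: total worker-days scale linearly with the number of items."""
    worker_days_per_item = Fraction(ref_workers * ref_days, ref_items)
    return worker_days_per_item * items / workers


def week_half_infected(week_all_infected):
    """Puzzle 2: with weekly doubling, half were infected one week earlier."""
    return week_all_infected - 1


def oldest_candidates():
    """Puzzle 3: try every assignment of age ranks, keep the consistent ones."""
    oldest = set()
    for t, l, p, w, m in product(range(1, 6), repeat=5):
        consistent = (
            t > l and l < p and l < w   # Thomas > Leo; Leo < Peter and William
            and w > t and w > m         # William > Thomas and Mark
            and m > p                   # Mark > Peter
            and t == m                  # Thomas and Mark are twins
        )
        if consistent:
            ages = {"Thomas": t, "Leo": l, "Peter": p, "William": w, "Mark": m}
            oldest.add(max(ages, key=ages.get))
    return oldest


def rope_offset(extra_length_m):
    """Puzzle 5: the perimeter grows by 8 * d, so d = extra length / 8."""
    return Fraction(extra_length_m) / 8


print(float(days_needed(4, 3, 5, 2, 12)))  # 14.4 days
print(week_half_infected(20))              # week 19
print(oldest_candidates())                 # {'William'}
print(rope_offset(1))                      # 1/8 of a metre
```

Note that the field's dimensions never enter the calculation for puzzle 5: the offset depends only on the extra rope length.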
