Will Artificial Intelligence Replace Economists?

AI was named the Word of the Year 2023 by Collins Dictionary, whose experts came to this conclusion after analyzing 20 billion words of text published worldwide throughout the year.

Over the past few years, AI has indeed become a significant part of our lives. It looks like understanding AI and its potential applications is becoming crucial for one's career growth, and some professions may even be fully replaced by AI. We decided to find out whether a chatbot powered by GPT technology could replace economists any time soon. Here are the results of the experiment conducted by the GURU website editorial board jointly with NES Professor Olga Kuzmina.


Will artificial intelligence replace economists? It seems not just yet: the popular chatbot ChatGPT failed an economics exam at NES. The experiment conducted by GURU showed:

- what the chatbot is capable of;

- how it makes elementary mistakes;

- how it solves tasks;

- and how it deceives, guided by the advice of one of Shakespeare's characters: "Your bait of falsehood takes this carp of truth."


How we conducted the experiment

Our "examination board" included NES Professor Olga Kuzmina, GURU journalist Ekaterina Sivyakova , and editor-in-chief Philip Sterkin. We compiled a set of tasks in English that consisted of four blocks to check whether ChatGPT can analyze economic issues and find gaps in scholarly knowledge; solve tasks; make forecasts and give psychological advice.?

During the two-hour experiment, we asked the artificial intelligence to try on different roles: a professor of economics, a researcher, an economic journalist, and even a tutor. We asked it to give clear and accurate answers, avoid unnecessary details, and not give false information (spoiler: it lied to us anyway). Olga Kuzmina assessed the quality of the economics answers.

It should be noted that several times during the experiment we had to start a new chat due to technical problems (possibly caused by network quality), which could affect how the artificial intelligence tracks the context of the entire conversation. Several times the chatbot also froze and reported a technical error while in the middle of writing a response.

The tasks for which ChatGPT earned a ‘B’ grade

One of the simplest tasks for the chatbot was our request to explain the Black-Scholes option pricing model to high school students. It managed quite well! There were no mistakes, but it used too many terms that would be unclear to school students. Yet an attempt to explain the formula to a 10-year-old led to an even worse result: ChatGPT took a creative approach, drawing an analogy with buying a toy, but, as Olga Kuzmina pointed out, made a slip in the explanation by implicitly equating stocks with options on them.
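For readers who want the details (this formula was the subject of the task, not part of the chatbot's answer), the standard Black-Scholes price of a European call option is

$$C = S_0\,N(d_1) - K e^{-rT} N(d_2), \qquad d_1 = \frac{\ln(S_0/K) + (r + \sigma^2/2)\,T}{\sigma\sqrt{T}}, \qquad d_2 = d_1 - \sigma\sqrt{T},$$

where $S_0$ is the current stock price, $K$ the strike price, $r$ the risk-free rate, $\sigma$ the volatility, $T$ the time to expiry, and $N(\cdot)$ the standard normal CDF.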

On the whole, the AI managed better with university-level tasks in econometrics. First, we asked ChatGPT to estimate the Fama-French three-factor model for Microsoft stock returns; the model accounts for market risk as well as risks related to company size and value (undervaluation). Based on the analysis, the chatbot was supposed to answer whether Microsoft is a growth stock or a value stock (a fast-growing, often technological company, or a stable, established company, respectively). The reasoning leading up to the answer was sound, but the conclusion was incorrect.
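For illustration only (the chatbot's own workings were not shown to us), here is a minimal sketch of how such an estimation could be run in Python with statsmodels; the data below are synthetic stand-ins, not actual Microsoft returns or Fama-French factors:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic stand-ins for monthly factor data (NOT real data).
rng = np.random.default_rng(0)
n = 120  # ten years of monthly observations
factors = pd.DataFrame({
    "mkt_rf": rng.normal(0.006, 0.045, n),  # market excess return
    "smb": rng.normal(0.002, 0.030, n),     # size factor (small minus big)
    "hml": rng.normal(0.003, 0.030, n),     # value factor (high minus low)
})
# A hypothetical stock with a growth tilt: negative loading on HML.
excess_ret = (0.001
              + 1.1 * factors["mkt_rf"]
              - 0.2 * factors["smb"]
              - 0.4 * factors["hml"]
              + rng.normal(0.0, 0.02, n))

# Estimate R_i - R_f = alpha + b*MKT + s*SMB + h*HML + eps by OLS.
X = sm.add_constant(factors)
fit = sm.OLS(excess_ret, X).fit()
print(fit.summary())
# A significantly negative loading on hml suggests a growth stock;
# a significantly positive one suggests a value stock.
```

In a real exercise, one would use the published factor series and the stock's actual excess returns; the sign of the estimated HML loading is what answers the growth-versus-value question.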

The chatbot also solved another typical problem involving a production function, though it made a minor mistake in the formula.
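The article does not reproduce the exact problem, so the following is only an assumption about what such a task might look like: a typical textbook exercise uses a Cobb-Douglas production function,

$$Y = A K^{\alpha} L^{1-\alpha}, \qquad 0 < \alpha < 1,$$

and asks, for example, for the marginal product of labor, $MP_L = \partial Y/\partial L = (1-\alpha) A K^{\alpha} L^{-\alpha}$. A slipped exponent or coefficient in a derivation like this is precisely the kind of "minor mistake in the formula" a grader would flag.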

The tasks that the chatbot failed to solve

Academic research turned out to be a weak spot for the chatbot. We asked it to analyze a database of studies on the impact of women's representation on corporate boards on firm value and operations, and to find and briefly describe the gaps in this field of economics.

In the first version of the answer, the chatbot cited two relevant and widely cited papers whose titles contained all the mentioned keywords. However, it made mistakes in the description of each study: the AI drew conclusions that were the exact opposite of the actual ones and mixed up the articles' content and metrics. For example, for some reason the chatbot decided that one of the studies discusses social responsibility, philanthropy, and environmental protection, although it is about something completely different, Olga Kuzmina notes.

The chatbot began its second attempt at the question by stating that empirical evidence suggests that having women on boards of directors is associated with positive outcomes for firms in terms of both value and operations, and then went on to justify this idea. In response to a question about the most influential studies in this field, the chatbot produced a list of four studies, noting that those were "a few examples." Checking the answer gave us a big surprise: studies with such titles do exist, but they were written by other authors and published at different times.

The AI demonstrated some "imagination" when asked to discuss a working paper by NES President Shlomo Weber and his co-authors from the point of view of its value for society. The chatbot wrote that the study shows how, in the United States, the race of a person driving a car influences police officers' decisions to search them, so it could help in discussions about police reform and racial justice. ChatGPT's conclusion had nothing to do with the actual research, which analyzes strategies for immigrants to learn the language spoken by the majority in their destination country.

An attempt to describe the practical value of an article by NES Professor Marta Troya-Martinez ended with the same result. ChatGPT stated that the study contributes to the field of economics by analyzing the impact of automation on the labor market. In fact, it is research on relational contracts (contracts based on a trusting relationship between the parties; the study elaborates a theory of managed relational contracts).

Perhaps these errors can be explained by the fact that both of our queries contained links redirecting to PDF documents. Therefore, in the next question we included a direct link to the text of a study. Still, the result was no better. The task was to highlight the main ideas of a column by European Bank for Reconstruction and Development experts on the consequences of the February 2023 earthquake in Turkey and Syria. The chatbot produced generalities about how much Turkey had suffered from earthquakes and "quoted" the authors' calls to take urgent measures. In fact, the column is devoted to a model-based comparison of the impact of the 1999 and 2023 earthquakes on the country's economy, accompanied by data on other countries.

Fiction from ChatGPT

At some point, Olga Kuzmina decided to check whether ChatGPT could help her write an abstract for her research. We gave the chatbot a link to her study "Gender Diversity in Corporate Boards: Evidence From Quota-Implied Discontinuities" and asked it to come up with a new abstract. The AI did not cope well with the task: it wrote about corporate social responsibility, which has nothing to do with the study.

We decided to let the chatbot generate a new answer. This time we didn't give it a web link to the study, hoping the AI would understand that we were talking about the same research. This attempt ended in complete failure: ChatGPT stated that the work investigates the effect of microplastics on aquatic ecosystems. The third attempt was not successful either, as the chatbot returned to the concept of corporate social responsibility.

To eliminate the factor of incorrect reading of links, we uploaded the full text of the introduction (about 2000 words) from Olga Kuzmina's study and asked the chatbot first to describe it in three paragraphs, and then to condense them into one. Olga Kuzmina called the three-paragraph version "not a bad one," noting that "the sentences from the long text were taken quite organically, but the main findings of the research were described superficially." The short version again contained an error.

Forecasts from ChatGPT

Finally, we decided to test ChatGPT's ability to make forecasts. Asked when gender gaps in the economy would be closed, it referred to the World Economic Forum's forecast of 135.6 years (our fact check showed that the figure was correct). Asked whether humanity can overcome economic inequality, the chatbot replied that it is possible but would require a sustained and concerted effort by policymakers, businesses, and individuals. The recipe for achieving economic equality, in the AI's opinion, was the following: progressive taxation, development of the social security system, investments in education and training, support for small and medium-sized enterprises, and support for firms that promote fair labor practices.

We asked the chatbot to provide facts in support of its position, and it produced five, referring to information from Oxfam, the OECD, the IMF, the Pew Research Center, and Harvard Business Review. Then we asked the chatbot for direct links to the documents it mentioned. It immediately produced a list of links that looked quite plausible but turned out to lead to non-existent pages. Searching by the keywords contained in the links, we found that the documents and surveys themselves exist, but are located at other web addresses.

“How many hours a day will people work in 10 years?” was our next inquiry. Finding it difficult to predict "with certainty," the chatbot referred to the "many factors" affecting the number of working hours and listed three of them:

- automation of routine tasks with the help of robotics and AI technologies can reduce the demand for certain types of labor and at the same time create new types of jobs;

- demographic shifts: as the population ages, the number of jobs may decrease;

- social norms: in recent years, people have begun to pay more attention to work-life balance and to look for more flexibility, which may lead to a shorter working week or a more flexible work schedule.

The chatbot concluded: "It is likely that the number of hours worked will continue to evolve."

Final assessment by an NES Professor

Olga Kuzmina asserts: "The chatbot gives good general answers when waffling, but since it distorts facts almost all the time, I would be rather wary even of its waffling. For example, in the middle of a reasonable text, completely illogical conclusions or distortions of the basics may occur, for which a student would immediately receive an ‘F’. Perhaps, with more precise sequential queries, ChatGPT could help save time, but in any case a person who understands the topic will be needed to check the text written by AI. This is not surprising overall, because even people can't always ‘read the Internet’ and distinguish well between scientific facts and fiction. Even more challenging are the issues on which researchers themselves do not always agree with each other… As for problem-solving, I think many professors already use ChatGPT to check their tasks for ‘commonality’.”

This article is a brief description of the experiment; a more detailed account is available here.
