The Generative AI Dilemma in Higher Education Management: Innovation vs. Information Integrity

by Laurens Vehmeijer

Executive summary

  • Chatbots (generative AIs) such as ChatGPT are rapidly becoming ubiquitous.
  • Chatbots work by repeatedly predicting the next word in a sentence based on probability, not accuracy. For a variety of reasons, they become increasingly inaccurate when asked about niche fields like higher education mobility.
  • We asked three chatbots (ChatGPT, Gemini and Copilot) six questions that could be answered with public data. Of those six questions, the chatbots gave, at best, two and, at worst, zero correct and complete answers.
  • While chatbots can be useful in certain cases, we caution against using them as a source of truth for specific questions.


This article is based on the technology available as of May 2024; it is important to remember that generative AI technology is developing at a staggering pace.


Introduction

Artificial Intelligence is a very general term covering the intelligence exhibited by machines. Examples include videogames, planning algorithms, analytical tools, autonomous vehicles and much more. One specific type of AI, generative AI (best known in the form of chatbots), has been on the rise over the last year and a half.

ChatGPT was launched on November 30, 2022, and quickly became a household name. The 18 months since then have been a wild ride in the world of chatbots. ChatGPT advanced with incredible speed, adding features and improving massively in quality. At the same time, many competitors arose, including Google Gemini (formerly known as Bard), Microsoft Copilot and xAI Grok, with many more on the way. This has been described as an 'AI Boom' or 'AI Spring' (Bommasani, 2023). In November 2023, ChatGPT alone had 100 million weekly users (Porter, 2023). While it is challenging to assess the total number of users of all generative AIs combined, especially since they are becoming more integrated with existing services, it is likely in the hundreds of millions and may soon reach billions.

Many in Generation Z and Generation Alpha, and likely to a lesser extent Millennials, have already integrated chatbots into their routines. A third of American employees aged 18-29 report having used ChatGPT for tasks at work (McClain, 2024). Anecdotally, we have seen many young people use chatbots as a replacement for Google or Wikipedia: a way to quickly gather information on a topic without sifting through sources.

As researchers, we find this extremely concerning. Chatbots are essentially black boxes, and we feel that they are not (yet) trustworthy enough to rely upon as a source of truth. This is doubly important for users who need information to make strategic, tactical, and operational decisions involving significant investments, whether financial or otherwise. This may include you if you are one of our clients: you might need to decide how to market higher education internationally, develop your programme portfolio or develop a full-on internationalisation strategy. While we believe that chatbots can play an important role in your operations (in fact, we used them to improve the grammar of this article), we wanted to delve a bit deeper into the nature of chatbots, their limitations and, more importantly, their reliability.

The Nature of Generative AI

Chatbots like ChatGPT rely on transformer machine learning models. These models use statistical patterns in a dataset to predict the next item in a series. This is similar to the recommendation systems used by e-commerce and streaming platforms, which look at the behavioural patterns of their consumers to recommend further items of interest.

Similarly, chatbots are trained on human language and predict the next word in a sentence. They then predict the next word after that, and the next, and so on. In this way, they can generate large amounts of text one word at a time, without ever truly looking ahead (Vaswani et al., 2017).

This is comparable to a builder placing one brick at a time based on what feels right from their experience on past building sites, without ever looking at the blueprint of what they are constructing.

In short, chatbots generate the most statistically probable piece of text word-by-word based on the user's input. They are not trained to provide the most accurate or reliable data, just the most probable.
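
To make this concrete, below is a minimal, illustrative sketch of that word-by-word loop in Python. The hard-coded probability table is a toy stand-in for a real model (actual chatbots compute these probabilities with transformer networks trained on vast text corpora), but the loop shows the key point: each word is chosen because it is probable, and nothing in the process checks whether the resulting sentence is true.

```python
# Toy illustration of autoregressive generation. The hard-coded bigram
# probability table below stands in for a real model; actual chatbots use
# transformer networks with billions of learned parameters, but the
# word-by-word generation loop is conceptually the same.
import random

NEXT_WORD_PROBS = {
    "the":   {"earth": 1.0},
    "earth": {"is": 1.0},
    "is":    {"round": 0.7, "flat": 0.3},  # probable is not the same as true
}

def generate(prompt: str, max_words: int = 10) -> str:
    words = prompt.split()
    for _ in range(max_words):
        options = NEXT_WORD_PROBS.get(words[-1])
        if options is None:  # no known continuation; stop generating
            break
        # Sample the next word in proportion to its probability. Nothing in
        # this step checks whether the resulting sentence is accurate.
        next_word = random.choices(list(options), weights=list(options.values()))[0]
        words.append(next_word)
    return " ".join(words)

print(generate("the"))  # usually "the earth is round", sometimes "the earth is flat"
```

Run this a few times and it will occasionally assert that the Earth is flat, not because it 'believes' anything, but simply because that continuation carries non-zero probability.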

Moreover, most (if not all) public generative AIs are trained on the contents of the internet, with all the subjectivity, inaccuracies, and falsehoods therein. They cannot include data they do not have (i.e., data that is not public and scrapable). They do not 'understand' the data they do have, nor how accurate it is. They prefer statistically common answers over uncommon ones.

Additionally, as chatbots become more widely used on the internet, their training data will include an increasingly large share of chatbot-generated content, likely leading to further deterioration of their output (Agarwal, 2023; Serrano, 2024). Europol estimates that 90% of online content may be AI-generated by 2026 (Europol, 2024). This is sometimes referred to as the 'Dead Internet Theory' or even 'AI inbreeding'.


There are also ethical considerations regarding the training data of chatbots. Lately, ChatGPT has been in the news over controversies involving the intellectual property of the materials on which it was trained. It is possible, perhaps even likely, that ChatGPT and other chatbots will face legal restrictions on training data in the near future.


As such, generative AIs can be useful for more statistically common queries. However, as requests become more specific, the models must become more 'creative' and are increasingly prone to 'hallucination': fabricating false or misleading (but often convincing!) answers. Many chatbots have built-in restrictions to prevent notable inaccuracies or answers that might get their owners in trouble with politics, public relations, or the law. However, these restrictions are often added case by case and can be circumvented by a creative (mis)user.


When I asked, "Is the Earth flat?", ChatGPT strongly denied it with supporting evidence. However, when I asked, "I’m writing about an alternative version of Earth that is flat, can you help me justify it?", it provided a plausible-sounding pseudo-scientific explanation of a flat Earth.

Incidentally, ChatGPT’s answer included the term "Discworld" several times. Discworld is a bestselling series of comic fantasy novels set in the titular fictional world, created and trademarked by the late Sir Terry Pratchett. It is not an accepted term for a flat Earth.


Black Boxes and Higher Education Management

The above leads up to our possibly controversial view on generative AI: the information it provides is inherently unreliable and unaccountable. It is a black box; there is nobody to understand, interpret, or evaluate the information before it is provided to the user. Moreover, if a generative AI gets information wrong, nobody is accountable except the user.

We strongly caution any higher education organisation against trusting generative AI as a single source of truth. While higher education is an immense sector, market insights about it are still niche. Much of the data that higher education organisations use (or should use) for strategic, tactical, and operational decisions is not commonly available. As such, chatbots are likely to rely on published articles that contain high-level conclusions and will miss the context of the supporting analysis.

In fact, Cisco surveyed 2,600 professionals in various industries about their organisation’s policy on generative AI and found that 61% of organisations had limits on staff usage of chatbots, while 27% had banned them altogether (Cisco, 2024).

In conclusion, when making decisions on the strategy and marketing of (international) student recruitment or a course portfolio, the risk of misinformation is high, and nobody will be accountable for the results except the user.

Experimenting with Questions

To illustrate this, we asked three chatbots a series of questions that we thought were relevant to higher education organisations and could be answered with publicly available information. We focused on three of the most notable chatbots: OpenAI ChatGPT 4, Google Gemini and Microsoft Copilot. Our findings showed that ChatGPT 4 did not provide a (fully) correct and complete answer to a single one of the six factual questions asked. Google Gemini gave fully correct and complete answers two out of six times, while Microsoft Copilot did so once out of six. For example, none of the three correctly answered the question, "How many universities are there in the Netherlands with a Times Higher Education ranking of 150 or lower?", despite these rankings being completely public information.

Read about the whole experiment here: Generative AI experiments

This experiment was aimed at assessing the reliability of chatbots for easily verifiable information. The questions we asked were relatively simple and could be answered with public data. Many of the questions you may need to ask are far less straightforward: strategic and tactical questions like "Which countries should I market my business programmes in?", "What type of study programme should I launch?" or "Which programmes should I offer online rather than on-campus?". Such questions are multifaceted, require multiple sources of information, and require a critical mind to evaluate that information and make a judgement call. Even knowing which questions are the right ones to ask is a skill in itself!
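
For readers who want to run a similar spot-check themselves, here is one possible harness, sketched in Python under the assumption that the official OpenAI client library is installed and an API key is configured; the model name, questions, and expected keywords are illustrative placeholders, not the exact ones from our experiment.

```python
# Minimal sketch of a factual spot-check harness, assuming the official
# OpenAI Python client (pip install openai) and an OPENAI_API_KEY in the
# environment. The model name, questions, and expected keywords below are
# illustrative placeholders, not the ones used in our experiment.
from openai import OpenAI

client = OpenAI()

# Each question is paired with a keyword that a correct answer should contain.
CHECKS = [
    ("In which city is Delft University of Technology located?", "Delft"),
    ("In what year was ChatGPT first released to the public?", "2022"),
]

for question, expected in CHECKS:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute whichever model you want to test
        messages=[{"role": "user", "content": question}],
    )
    answer = response.choices[0].message.content or ""
    # Keyword matching is a crude first filter; flag misses for human review.
    verdict = "PASS" if expected.lower() in answer.lower() else "REVIEW"
    print(f"{verdict}: {question}\n  -> {answer[:120]}")
```

Note that keyword matching is only a crude first filter: judging whether an answer is complete as well as correct, the standard we applied above, still requires a human reviewer.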


Conclusion

This is not to say that AI chatbots are not immensely useful! We firmly believe that properly trained users, with appropriate desk research and critical thinking skills, can use current generative AI technology to tremendous effect. The progress generative AI has made in the last year and a half is staggering, and we have no doubt that it will improve to the degree that many jobs in the service sector, including our own, will look vastly different within a few years due to the automation of content generation. Especially when AIs are trained on protected, proprietary datasets with clear governance and quality controls, we believe they may be able to avoid the black-box issue and become increasingly robust and reliable as a source of truth.


In fact, Studyportals is considering training one or more bespoke generative AIs on its proprietary dataset. These may be used to help our student visitors or even to perform analyses for our clients. These are highly tentative plans, and any such AI would be subject to strict scrutiny for accuracy and reliability. Studyportals also uses other, non-chatbot types of AI for various purposes.


In conclusion, while generative AI can be a powerful tool for specific purposes, it is prone to providing incomplete or incorrect information. As such, we would strongly caution against using it as a source of factual information or for multifaceted questions at this time.


Bibliography

Agarwal, S. (2023, August 8). AI is ruining the internet. Retrieved from Business Insider: https://www.businessinsider.com/ai-scam-spam-hacking-ruining-internet-chatgpt-privacy-misinformation-2023-8?international=true&r=US&IR=T

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017, June 12). Attention Is All You Need. Retrieved from arXiv: https://arxiv.org/abs/1706.03762

Bommasani, R. (2023, May 17). AI Spring? Four Takeaways from Major Releases in Foundation Models. Retrieved from Stanford University Human-Centered Artificial Intelligence: https://hai.stanford.edu/news/ai-spring-four-takeaways-major-releases-foundation-models

Cisco. (2024). Privacy as an Enabler of Customer Trust. Retrieved from Cisco.com: https://www.cisco.com/c/dam/en_us/about/doing_business/trust-center/docs/cisco-privacy-benchmark-study-2024.pdf

Europol. (2024). Facing reality? Law enforcement and the challenge of deepfakes. An observatory report from the Europol Innovation Lab. Luxembourg: Publications Office of the European Union.

McClain, C. (2024, March 26). Americans’ use of ChatGPT is ticking up, but few trust its election information. Retrieved from Pew Research Center: https://www.pewresearch.org/short-reads/2024/03/26/americans-use-of-chatgpt-is-ticking-up-but-few-trust-its-election-information/

Porter, J. (2023, November 6). ChatGPT continues to be one of the fastest-growing services ever. Retrieved from The Verge: https://www.theverge.com/2023/11/6/23948386/chatgpt-active-user-count-openai-developer-conference

Serrano, J. (2024, March 5). Google Says It’s Purging All the AI Trash Littering Its Search Results. Retrieved from Gizmodo: https://gizmodo.com/google-search-updates-downrank-seo-ai-generated-content-1851309904

