Research Update - Is ChatGPT Bullshit?
Ray Fleming
Global Education Industry Leader | Extensive sales & marketing experience | Edtech solution guru | Customer centred thinker | Speaker | Media | PR
Honestly, I never expected it to come to this, but this week's Research Paper of the Week doesn't even ask the question - it asserts that it is. But it's not quite what you're thinking, so read on. And then continue to the rest of the academic research papers that I've tracked down in the last two weeks.
Research Paper of the Week
ChatGPT is bullshit
The paper "ChatGPT is Bullshit", by Michael Townsen Hicks, Joe Slater and James Humphries, explores the phenomenon of inaccuracies in the output of large language models (LLMs) like ChatGPT. They argue against describing these inaccuracies as "hallucinations" or "lies" and instead propose that these outputs should be understood as "bullshit" - in the philosophical sense described by Harry Frankfurt. Frankfurt defines bullshit as speech or writing produced without concern for the truth, rather than with an intention to deceive.
As they say in the paper: "Calling chatbot inaccuracies ‘hallucinations’ feeds into overblown hype about their abilities among technology cheerleaders, and could lead to unnecessary consternation among the general public."
The paper distinguishes between "soft bullshit," which lacks concern for truth, and "hard bullshit," which involves misleading about the intention behind the utterance. The authors conclude that ChatGPT primarily produces soft bullshit, with potential to produce hard bullshit under certain interpretations.
Also quoting from the paper: "We will argue that even if ChatGPT is not, itself, a hard bullshitter, it is nonetheless a bullshit machine."
This research is important because, the authors argue, as long as we say things like 'hallucination' instead of 'bullshit', we risk making poor policy decisions about regulating AI that don't adequately address the real problems.
The rest of the AI in Education Research
The Prompt Report: A Systematic Survey of Prompting Techniques
An absolute storm of authors from academia and companies wrote this paper, including Sander Schulhoff & Michael Ilie at the University of Maryland, College Park, Shyamal Hitesh Anadkat at OpenAI, Jules White at Vanderbilt University, Chenglei S. at Stanford University, Yinheng Li at Microsoft, and Denis Peskoff at Princeton University.
This research paper brings together a neat summary of prompting techniques for generative AI systems, creating a detailed taxonomy of 58 text-only prompting techniques, 40 multimodal techniques, and 33 key vocabulary terms (like 'zero-shot prompting' and 'few-shot prompting'). The authors reviewed the existing literature and categorised the different types of prompts used to guide AI outputs. They also explored the application of these techniques in various contexts, including multilingual and multimodal settings, and highlighted the importance of prompt engineering and evaluation. This is a really good structured resource for educators, researchers and edtech developers who want to enhance the effectiveness of their AI prompts. A prompt can be as simple as "Translate the word cheese to French" or "Recommend a good book for me to read", but can then be extended: "I enjoy Shakespeare and Lee Child, recommend another author". That gives us two zero-shot prompts and a few-shot prompt. There are also 'role-based' prompts, like "Pretend you are a shepherd and write a limerick about llamas." Given the huge author list, it didn't surprise me to find names like Prof Jules White of Vanderbilt University there - he's the creator of some great ChatGPT prompting courses.
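To make those vocabulary terms concrete, here's a minimal sketch (my own illustration, not code from the paper) of three prompt styles expressed as plain chat-message templates. The helper names `zero_shot`, `few_shot` and `role_based` are hypothetical; swap in whatever LLM client you use to actually send the messages.

```python
# Minimal prompt-template helpers illustrating three techniques from the
# paper's taxonomy. Messages follow the common "role"/"content" chat format.

def zero_shot(task):
    """Zero-shot: just ask, with no worked examples."""
    return [{"role": "user", "content": task}]

def few_shot(task, examples):
    """Few-shot: show input/output pairs before posing the real task."""
    messages = []
    for question, answer in examples:
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": task})
    return messages

def role_based(persona, task):
    """Role-based: set a persona in a system message first."""
    return [
        {"role": "system", "content": f"Pretend you are {persona}."},
        {"role": "user", "content": task},
    ]

# The examples from the text, expressed as message lists:
zero_shot("Translate the word cheese to French")
few_shot("Translate: bread",
         examples=[("Translate: cheese", "fromage"),
                   ("Translate: milk", "lait")])
role_based("a shepherd", "Write a limerick about llamas.")
```

The few-shot translation pairs are invented for illustration; the point is only the message shape each technique produces.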
My recommendation on this paper is to put it into your reading list, or print a copy for your bookshelf, and take a look at it every few months, or when you're getting frustrated with the results from a large language model, as it will help you to unlock new ways of using it. You may not understand it all yet, but over time you'll find it more and more useful.
Jill Watson: A Virtual Teaching Assistant powered by ChatGPT
This paper, from Karan Taneja, Pratyusha Maiti, Sandeep Kakar, Pranav Guruprasad and Ashok Goel at Georgia Institute of Technology, looks at their own ChatGPT-powered Virtual Teaching Assistant, called Jill Watson. It helps answer student queries based on course materials like syllabi, notes, and textbooks without needing prior training - instead, it can be provided with curriculum documents to help answer questions from students. The system employs a modular, skill-based design inspired by XiaoIce, enabling integration with new APIs and ensuring safety through various filters. The researchers compared their chatbot to their previous version and to OpenAI's Assistant service, finding that it outperforms both in response quality and safety, reducing hallucinations and toxicity. Real-world classroom examples demonstrate its effectiveness in providing relevant and accurate answers to students' questions.
It's a good template, because the system includes various measures to prevent the generation of inappropriate or incorrect information. For instance, it employs classifiers to ensure questions are relevant, filters to block toxic content, and the AI cites the sources of its information to reduce hallucination problems.
There may be a little 'marking your own homework' here, as the researchers both developed the chatbot and conducted the research, but it's still a handy reference for ideas on building safe and helpful tutors.
Delving into ChatGPT usage in academic writing through excess vocabulary
This research with a clever name ('Delving', indeed), out of Eberhard Karls Universität Tübingen and Northwestern University, examines the impact of large language models (LLMs) like ChatGPT on scientific writing, revealing that at least 10% of 2024 PubMed abstracts were processed with LLMs. The study highlights significant increases in the use of specific style words, indicating widespread adoption of LLMs across various academic fields and countries. Their method was the same as some research earlier in the year - looking over time for increased frequency of key words that ChatGPT loves - like 'delve'! - and finding a leap in usage after the release of ChatGPT. They analysed 14 million abstracts for this, so it's very comprehensive.
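The core of the method can be sketched in a few lines. This is my own toy version, not the authors' code, and the marker-word list is an assumption for illustration:

```python
# Toy sketch of "excess vocabulary" analysis: compare how often marker
# words like "delve" appear in abstracts before and after ChatGPT's
# release, and report the excess rate.

import re

MARKER_WORDS = {"delve", "delves", "delving", "pivotal", "underscore"}  # assumed list

def marker_rate(abstracts):
    """Fraction of abstracts containing at least one marker word."""
    hits = sum(
        1 for text in abstracts
        if MARKER_WORDS & set(re.findall(r"[a-z]+", text.lower()))
    )
    return hits / len(abstracts) if abstracts else 0.0

def excess_usage(pre_release, post_release):
    """Excess = marker rate after release minus the pre-release baseline."""
    return marker_rate(post_release) - marker_rate(pre_release)
```

On 14 million real abstracts you would also need to control for natural vocabulary drift over time, but the 'excess' idea is essentially this difference in rates.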
Ten Myths about Artificial Intelligence in Education
This research article by Louie Giray in the journal 'Higher Learning Research Communications', challenges and dispels prevalent misconceptions about AI's role in education. The paper argues that while AI has significant potential to enhance educational practices, it cannot replace the essential human elements provided by educators and physical classrooms. Key myths addressed include the beliefs that AI will replace teachers, render classrooms obsolete, and is smarter than humans. The paper underscores AI's limitations, such as its inability to exhibit empathy, creativity, and holistic understanding. By debunking these myths, the paper advocates for a balanced and ethical approach to integrating AI in education, ensuring it supports rather than supplants human educators.
A review on the use of large language models as virtual tutors
This paper, from Silvia García Méndez & Francisco de Arriba Pérez, looked at what's been published about the use of LLMs in education to identify the most popular academic applications. They concluded that LLMs are predominantly used as virtual tutors, and for question generation, answer grading, and code explanation. Probably the main use of this paper for you is to find other, more detailed papers on these topics - and it includes a summary of the papers where teachers or students were involved in the research.
Large Language Models as Partners in Student Essay Evaluation
This research, from Toru Ishida, William Cheung and Tongxi Liu at Hong Kong Baptist University, explores the role of Large Language Models (LLMs) in evaluating student essays in workshop courses. By comparing LLM evaluations to those of faculty members, the study examines three scenarios: LLMs without guidance, with pre-specified rubrics, and through pairwise comparison of essays. Their results show that:
1) LLMs can match the assessment capabilities of faculty members
2) variations in LLM assessments should be interpreted as diversity rather than confusion
3) assessments by humans and LLMs can differ and complement each other
The study concludes that LLMs should be considered partners in the evaluation process, capable of providing valuable insights and complementing human assessments.
Grade Like a Human: Rethinking Automated Assessment with Large Language Models
The paper, from researchers at City University of Hong Kong and MBZUAI (Mohamed bin Zayed University of Artificial Intelligence), introduces "Grade-Like-a-Human," a multi-agent system designed to enhance automated grading using Large Language Models (LLMs). Traditional automated grading systems often struggle with accuracy and consistency, particularly for complex questions. This new approach divides the grading process into three stages: rubric generation, grading, and post-grading review. By incorporating student answers into the rubric design and using iterative sampling methods, the system refines grading criteria to be more effective. The study evaluates the system using a newly collected dataset from an undergraduate Operating Systems course and the widely used Mohler dataset. Results indicate significant improvements in grading performance, particularly for complex questions, suggesting that this systematic approach can better emulate human grading processes. A nice feature of this paper is that they share the design of their system, so that others can learn from it, along with the detailed results from each of their methods of prompting the LLM.
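The three-stage flow is easy to picture as a pipeline. The sketch below is my own skeleton based only on the summary above - the function names and prompt wording are invented, and `ask` stands in for whatever LLM call you use:

```python
# Skeleton of a three-stage grading pipeline: rubric generation,
# grading, and post-grading review. `ask` is any callable that takes a
# prompt string and returns the LLM's reply as a string.

def generate_rubric(ask, question, sample_answers):
    """Stage 1: draft grading criteria, informed by real student answers."""
    prompt = (f"Write a grading rubric for: {question}\n"
              "Sample student answers:\n" + "\n".join(sample_answers))
    return ask(prompt)

def grade_answer(ask, question, rubric, answer):
    """Stage 2: grade one answer against the rubric."""
    prompt = (f"Rubric:\n{rubric}\n"
              f"Grade this answer to '{question}':\n{answer}")
    return ask(prompt)

def review_grade(ask, question, answer, grade):
    """Stage 3: a second pass that sanity-checks the assigned grade."""
    prompt = (f"Question: {question}\nAnswer: {answer}\n"
              f"Assigned grade: {grade}\nIs this grade consistent? Explain.")
    return ask(prompt)

def grade_like_a_human(ask, question, sample_answers, answers):
    """Run all three stages over a batch of answers."""
    rubric = generate_rubric(ask, question, sample_answers)
    results = []
    for answer in answers:
        grade = grade_answer(ask, question, rubric, answer)
        results.append((answer, grade, review_grade(ask, question, answer, grade)))
    return rubric, results
```

Passing `ask` in as a parameter keeps each stage independently testable with a stubbed model, which is also handy when comparing different prompting strategies as the authors do.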
Designing Prompt Analytics Dashboards to Analyze Student-ChatGPT Interactions in EFL Writing
As we know, integrating ChatGPT into education offers significant opportunities and challenges, particularly in language learning. This research paper, from a huge team at KAIST, addresses these by developing a Prompt Analytics Dashboard (PAD) designed to help teachers analyse student interactions with ChatGPT in English as a Foreign Language (EFL) writing classes. The study involved a two-phase iterative design process, including surveys, interviews, and prototype testing with six university English teachers, who integrated ChatGPT into semester-long English essay writing classes.
During the first phase of the study, teachers expressed concerns about ChatGPT's potential to undermine critical thinking by encouraging students to use AI-generated content without understanding it. They also highlighted the additional workload involved in monitoring student interactions and ensuring the ethical use of AI. Teachers emphasized the importance of aligning prompts with learning objectives and the need for tools to assess prompt effectiveness.
In particular, all the teachers were seriously concerned about students asking ChatGPT to write paragraphs, or even an entire essay.
The PAD prototype was developed based on these insights, featuring various charts and analytic components to help teachers quickly grasp student progress, identify misuse of ChatGPT, and provide personalized feedback. The dashboard allows teachers to filter prompts by learning objectives, review essay editing histories alongside chat logs, and customize ChatGPT's behavior to better align with educational goals.
This is really insightful research because it shows a way for teachers to become more engaged with the way their students are using ChatGPT, integrating it more carefully into the learning journey rather than taking a hands-off approach. I think we're likely to see more examples like this in the future, so this paper is a good guide for others considering it.
An empirical study to understand how students use ChatGPT for writing essays and how it affects their ownership
The research, conducted by Andrew Jelson & Sang Won Lee at Virginia Tech, describes a planned project to explore how students use ChatGPT for writing essays and its effect on their perceived ownership of their work. Through a user study, the researchers plan to track queries made to ChatGPT, the responses provided, and subsequent essay revisions. The aim is to analyse the patterns in AI usage and its impact on students' writing processes. They explain their methods (in case you want to do something similar), and include their contact details if you just can't wait for their next paper with the results!
Experiences from Integrating Large Language Model Chatbots into the Classroom
This research, from Arto Hellas at Aalto University and Leo Leppänen at University of Helsinki, investigated the use of a GPT-4-based chatbot, in late 2023, in three computer science courses at Aalto University, Finland. The chatbot was included in an LLM-focused course and two other courses unrelated to LLMs. The chatbot was integrated without any content restrictions. The researchers aimed to address three key questions: how the chatbot was used in relation to the course content, the perceived usefulness of the chatbot, and how prior experience with programming and LLMs influenced its use.
Results showed that while nearly all students in the LLM-focused course used the chatbot, only a minority of students in the other courses engaged with it. Even among those who did use it, a few "superusers" accounted for the majority of interactions. The study suggests that fears of students over-relying on such chatbots were unfounded. The findings indicate the potential benefits of developing more targeted and structured chatbot experiences to better support student learning.
Generative AI in the Australian education system: An open data set of stakeholder recommendations and emerging analysis from a public inquiry
This paper from a panel of University of Technology Sydney researchers - Simon Knight, Camille Dickson-Deane, Keith Heggart, Kirsty Kitto, Dilek Cetindamar, Damian Maher, Bhuva Narayan and Forooq Zarrabi - analyses the 95 submissions to a public inquiry by the Australian Federal Government in August 2023 into the use of Generative AI in Education. It discusses what people saw as the potential benefits of GenAI, such as enhanced learning tools and administrative efficiencies, as well as risks like academic integrity issues and data privacy concerns. And there's also quite a lot about how people thought AI needed to be regulated in education, especially around its links to assessment. It links to the open dataset of the submissions to the inquiry - so it's a great way to find out what people were thinking within 9 months of ChatGPT's release, and if you're a researcher, it's a good source of topics that educators thought needed more research (and, from many submissions, more research funding).
Senior Managing Director
2mo: Ray Fleming Fascinating read. Thank you for sharing
Professional education in Marketing, Strategy and Digital Technology
2mo: Thanks, Ray - really interesting. Personally I think people focus too much on ChatGPT. There are many other more practical AI tools which I think will supersede this (or ChatGPT will be embedded in other systems), e.g. Bing Copilot, Google Gemini, Google Circle to Search, AI image generators and so on. So I think it's important to teach students about the vast variety of ways in which AI is being deployed now. Keep up the good work with thought-provoking content!
Head of Digital Education | AI in Education | EdTech |
3mo: Ray Fleming These updates are great. Thanks for all your work in compiling them.
Professor of Computer Science, Georgia Institute of Technology; Executive Director, National AI Institute for Adult Learning and Online Education; Editor Emeritus, AAAI's AI Magazine
3mo: Thank you for including Jill Watson. This paper reports on deployment of Jill in real classes, including preliminary results: https://link.springer.com/chapter/10.1007/978-3-031-63028-6_7
COO @ Learn Prompting | We've helped 3M+ people learn how to use ChatGPT, Generative AI, & Prompting, including teams at OpenAI, Microsoft, & Google!
3mo: Thanks for sharing our paper, Ray Fleming - this was led by our team at Learn Prompting! We're glad it was helpful - our hope is that this becomes a foundational paper that centralizes the definitions in prompting. There's still much more to be discovered within LLMs, so hopefully others can build off our work to find more effective techniques. For now, we've incorporated the 58 most effective prompting techniques into the courses that we teach on the Learn Prompting platform. Our goal is that our users learn the latest techniques and how to write effective prompts, instead of relying on generic lists like "99 prompts to improve Marketing, Sales, etc".