What the generative AI research papers told us last week - 6th November 2023
Ray Fleming
Global AI and Education Industry Leader | Extensive sales & marketing experience | AI solution strategist | Customer centred thinker | Speaker | Media | PR
I keep a running note of interesting research for educators on the topic of generative AI - things like what we're finding out about the use of ChatGPT, Large Language Models etc - and what it means for education.
For the week commencing 6th November, we discovered that:
Gender Bias
The study investigates the transfer of gender bias from Large Language Models (LLMs) to student writing in an educational setting. Conducting a large-scale user study with 231 students writing peer reviews in German, the research examines the influence of AI-generated feedback on the manifestation of gender bias in student writing.
Quote: "Contrary to concerns, the results revealed no significant difference in gender bias between the writings of the AI-assisted groups and those without AI support. These findings are pivotal as they suggest that LLMs can be employed in educational settings to aid writing without necessarily transferring biases to student work"
Paper available on arXiv: Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance
Tutor Feedback tool?
Summary of the Research: This paper presents two longitudinal studies assessing the impact of AI-generated feedback on English as a New Language (ENL) learners' writing. The first study compared the learning outcomes of students receiving feedback from ChatGPT with those receiving human tutor feedback, finding no significant difference in outcomes. The second study explored ENL students' preferences between AI and human feedback, revealing a nearly even split. The research suggests that AI-generated feedback can be incorporated into ENL writing assessment without detriment to learning outcomes, recommending a blended approach to capitalize on the strengths of both AI and human feedback.
Paper from Educational Technology Journal: AI-generated feedback on writing: insights into efficacy and ENL student preference
Personalised feedback in medical learning
Summary of the Research: The study examined the efficacy of ChatGPT in delivering formative feedback within a collaborative learning workshop for health professionals. The AI was integrated into a professional development course to assist in formulating digital health evaluation plans. Feedback from ChatGPT was considered valuable by 84% of participants, enhancing the learning experience and group interaction. Despite some participants preferring human feedback, the study underscores the potential of AI in educational settings, especially where personalized attention is limited.
Paper on arXiv: “Close...but not as good as an educator” - Using ChatGPT to provide formative feedback in large-class collaborative learning
High Stakes answers
Your Mum was right all along - ask nicely if you want things! And, in the case of ChatGPT, tell it your boss/Mum/sister is relying on you for the right answer!
Summary of the Research: This paper explores the potential of Large Language Models (LLMs) to comprehend and be augmented by emotional stimuli. Through a series of automatic and human-involved experiments across 45 tasks, the study assesses the performance of various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4. The concept of "EmotionPrompt," which integrates emotional cues into standard prompts, is introduced and shown to significantly improve LLM performance. For instance, the inclusion of emotional stimuli led to an 8.00% relative performance improvement in Instruction Induction and a 115% increase in BIG-Bench tasks. The human study further confirmed a 10.9% average enhancement in generative tasks, validating the efficacy of emotional prompts in improving the quality of LLM outputs.
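The EmotionPrompt technique itself is simple enough to sketch: an emotional stimulus sentence is appended to an otherwise ordinary task prompt before it is sent to the model. Here is a minimal illustration in Python - the stimulus wordings below are paraphrased examples in the spirit of the paper, not its exact stimulus set:

```python
# Minimal sketch of the EmotionPrompt idea: append an emotional stimulus
# sentence to a standard task prompt. The stimuli here are illustrative
# paraphrases, not the paper's exact list.

EMOTIONAL_STIMULI = {
    "importance": "This is very important to my career.",
    "confidence": "Believe in your abilities and strive for excellence.",
    "stakes": "My boss is relying on me for the right answer.",
}

def emotion_prompt(task: str, stimulus_key: str = "importance") -> str:
    """Return the task prompt with an emotional stimulus appended."""
    return f"{task} {EMOTIONAL_STIMULI[stimulus_key]}"

base = "Summarise the key findings of this study in two sentences."
print(emotion_prompt(base, "stakes"))
```

The resulting string would then be passed to whichever LLM you are using in place of the plain prompt; the paper's comparison is simply between the model's outputs with and without the appended sentence.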
"Everyone Knows Claude Doesnt Show Up on AI Detectors"
Not a paper, but an article from an Academic
The article discusses an experiment conducted to test AI detectors' ability to identify content generated by AI writing tools. The author used different AI writers, including ChatGPT, Bard, Bing, and Claude, to write essays which were then checked for plagiarism and AI content using Turnitin. The tests revealed that while other AIs were detected, Claude's submissions consistently bypassed the AI detectors.
Article on Michelle's substack: Everyone Knows Claude Doesnt Show Up on AI Detectors