What the generative AI research papers told us last week - 6th November 2023

I keep a running note of interesting research for educators on the topic of generative AI - things like what we're finding out about the use of ChatGPT, Large Language Models etc - and what it means for education.

For the week commencing 6th November, we discovered that:

  1. The gender bias that might exist in large language models doesn't appear to influence students' writing
  2. AI writing feedback for ENL (English as a New Language) learners appears to match human tutor performance
  3. 84% of health professionals valued course feedback delivered by a ChatGPT bot
  4. Your Mum was right - ask nicely to get what you want. Works with ChatGPT too!
  5. Just one more issue with AI Detectors - they don't appear to be able to spot text generated by Claude.

Gender Bias

The study investigates the transfer of gender bias from Large Language Models (LLMs) to student writing in an educational setting. Conducting a large-scale user study with 231 students writing peer reviews in German, the research examines the influence of AI-generated feedback on the manifestation of gender bias in student writing.

Quote: "Contrary to concerns, the results revealed no significant difference in gender bias between the writings of the AI-assisted groups and those without AI support. These findings are pivotal as they suggest that LLMs can be employed in educational settings to aid writing without necessarily transferring biases to student work"

Paper available on arXiv: Unraveling Downstream Gender Bias from Large Language Models: A Study on AI Educational Writing Assistance

Tutor Feedback tool?

Summary of the Research: This paper presents two longitudinal studies assessing the impact of AI-generated feedback on English as a New Language (ENL) learners' writing. The first study compared the learning outcomes of students receiving feedback from ChatGPT with those receiving human tutor feedback, finding no significant difference in outcomes. The second study explored ENL students' preferences between AI and human feedback, revealing a nearly even split. The research suggests that AI-generated feedback can be incorporated into ENL writing assessment without detriment to learning outcomes, recommending a blended approach to capitalize on the strengths of both AI and human feedback.

Paper from Educational Technology Journal: AI-generated feedback on writing: insights into efficacy and ENL student preference

Personalised feedback in medical learning

Summary of the Research: The study examined the efficacy of ChatGPT in delivering formative feedback within a collaborative learning workshop for health professionals. The AI was integrated into a professional development course to assist in formulating digital health evaluation plans. Feedback from ChatGPT was considered valuable by 84% of participants, enhancing the learning experience and group interaction. Despite some participants preferring human feedback, the study underscores the potential of AI in educational settings, especially where personalized attention is limited.

Paper on arXiv: “Close...but not as good as an educator” - Using ChatGPT to provide formative feedback in large-class collaborative learning
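For anyone curious what this might look like in practice, here's a minimal sketch of asking a ChatGPT-style model for formative feedback. It assumes the OpenAI Python client; the model name, system prompt and rubric are illustrative choices of mine, not the study's actual configuration.

```python
# Minimal sketch of requesting formative (not graded) feedback from a
# ChatGPT-style model, assuming the OpenAI Python client (pip install openai).
# The model name, prompt wording and rubric are illustrative assumptions,
# not the setup used in the paper.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def formative_feedback(draft_plan: str) -> str:
    """Ask the model for supportive, non-graded feedback on a draft evaluation plan."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a supportive tutor giving formative feedback on a draft "
                    "digital health evaluation plan. Note two strengths, two areas to "
                    "improve, and one question for the group to discuss. Do not grade."
                ),
            },
            {"role": "user", "content": draft_plan},
        ],
    )
    return response.choices[0].message.content


print(formative_feedback("We will survey clinicians three months after the triage app rollout..."))
```

In the study the bot sat inside a collaborative workshop, so a real deployment would also need to handle group context and turn-taking - the sketch only shows the feedback call itself.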

High Stakes answers

Your Mum was right all along - ask nicely if you want things! And, in the case of ChatGPT, tell it your boss/Mum/sister is relying on you for the right answer!

Summary of the Research: This paper explores the potential of Large Language Models (LLMs) to comprehend and be augmented by emotional stimuli. Through a series of automatic and human-involved experiments across 45 tasks, the study assesses the performance of various LLMs, including Flan-T5-Large, Vicuna, Llama 2, BLOOM, ChatGPT, and GPT-4. The concept of "EmotionPrompt," which integrates emotional cues into standard prompts, is introduced and shown to significantly improve LLM performance. For instance, the inclusion of emotional stimuli led to an 8.00% relative performance improvement in Instruction Induction and a 115% increase in BIG-Bench tasks. The human study further confirmed a 10.9% average enhancement in generative tasks, validating the efficacy of emotional prompts in improving the quality of LLM outputs.

Paper on arXiv: Large Language Models Understand and Can Be Enhanced by Emotional Stimuli
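The core trick is easy to sketch: leave the task prompt unchanged and append one emotional stimulus sentence to it. The toy Python below shows the idea; the stimulus phrases are paraphrased from the paper's examples, and the helper function is my own illustration rather than the authors' code.

```python
# Toy sketch of the "EmotionPrompt" idea: append an emotional stimulus to an
# otherwise unchanged task prompt. Stimulus phrases are paraphrased from the
# paper's examples; the paper's full evaluation harness is omitted.
EMOTIONAL_STIMULI = [
    "This is very important to my career.",
    "You'd better be sure.",
    "Believe in your abilities and strive for excellence.",
]


def emotion_prompt(task_prompt: str, stimulus_index: int = 0) -> str:
    """Return the original prompt with one emotional stimulus appended."""
    return f"{task_prompt} {EMOTIONAL_STIMULI[stimulus_index]}"


base = "Decide whether the following movie review is positive or negative: ..."
print(emotion_prompt(base))
# -> "Decide whether the following movie review is positive or negative: ... This is very important to my career."
```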

"Everyone Knows Claude Doesnt Show Up on AI Detectors"

Not a paper, but an article from an academic

The article discusses an experiment conducted to test AI detectors' ability to identify content generated by AI writing tools. The author used different AI writers, including ChatGPT, Bard, Bing, and Claude, to write essays which were then checked for plagiarism and AI content using Turnitin. The tests revealed that while other AIs were detected, Claude's submissions consistently bypassed the AI detectors.

Article on Michelle's Substack: Everyone Knows Claude Doesn't Show Up on AI Detectors

