AI in Education New Research - 19th April

There's still a lot of good research coming out - but my pile of papers I've chosen to ignore is growing too. I'm going to write a piece about what makes a research paper bad or lower value (beyond "you know it when you see it") which I'll share too, in case it helps others.

Featured Research

Large language models are able to downplay their cognitive abilities to fit the persona they simulate

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0298522

The study (from Jiří Milička and others at Charles University and Humboldt-Universität zu Berlin) examined how well large language models can mimic children's language and thinking. The researchers had these models simulate children aged one to six years, testing their language skills and ability to understand others' thoughts through specific tasks. The researchers found that, much like a child grows in thinking and speaking complexity, the LLMs also showed more sophisticated language use and thought process simulation as the age of the child they were mimicking increased.

This has interesting uses for education, where teachers and educational designers could simulate interacting with a child of a specific age to understand stages of development, or for cross-training (imagine a teacher who's only taught Year 6 having to build a lesson activity for a Year 3 class).
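If you want to try this yourself, here's a minimal sketch of the idea - not the authors' code. It assumes the OpenAI Python SDK and an API key in your environment, and the model name and prompt wording are illustrative choices, not taken from the paper.

```python
# A minimal sketch (not the study's code) of prompting an LLM to role-play
# a child of a given age, in the spirit of the persona-simulation paper.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment;
# the model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def simulate_child(age_years: int, question: str) -> str:
    """Ask the model to answer as a child of the given age."""
    system_prompt = (
        f"You are a {age_years}-year-old child. Answer using the vocabulary, "
        f"sentence length, and reasoning typical of that age. Stay in character."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumed model; any capable chat model would do
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
        temperature=0.7,
    )
    return response.choices[0].message.content

# Example: compare how a simulated three-year-old and six-year-old answer the same question.
for age in (3, 6):
    print(f"Age {age}: {simulate_child(age, 'Why does it get dark at night?')}")
```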

Best of the Rest

Developing evaluative judgement for a time of generative artificial intelligence

https://www.tandfonline.com/doi/full/10.1080/02602938.2024.2335321

Another research paper from Phillip Dawson and others at the Centre for Research in Assessment and Digital Learning (CRADLE) at Deakin University in Australia. They argue that it's important to teach students how to evaluate AI's work critically, especially now we see AI output regularly. This helps students distinguish good from poor AI content, think deeply, use AI ethically, and prepare for future jobs where AI is common (possibly every job?). The paper also looks at how to develop and assess these skills in students. They argue that we need to ensure that students don't uncritically accept information generated by AI. And as we saw earlier in the year, you could definitely say the same about their teachers too.

I like this advice - it's the first paper I've seen with specific recommendations, rather than generalised 'the sky is falling' warnings.

Prompting Large Language Models for Zero-shot Essay Scoring via Multi-trait Specialization

https://arxiv.org/abs/2404.04941

The paper, from Sanwoo Lee, Yunfang Wu and others at Peking University, discusses improving automated essay scoring (AES) by using a method called Multi-Trait Specialization (MTS). MTS uses a zero-shot AI approach, which doesn't need previously graded essays for training. It splits writing proficiency into separate traits and evaluates each one individually, combining these scores for the final assessment. In experiments this method performed better than existing zero-shot techniques, and came close to the performance of traditional supervised AI models (which need extensive training data). They also found that a small LLM (Mistral-7B-Instruct) outperformed ChatGPT, but it turns out they only tested against GPT-3.5, not GPT-4, so I'd expect ChatGPT's performance to improve as soon as they re-test with the latest version.

What this means is that we're finding ways to get LLMs to do essay scoring without having to train them in advance on lots of previous papers!
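To make the multi-trait idea concrete, here's a rough illustration of how a trait-by-trait zero-shot scorer could be put together. This is an assumption-laden sketch, not the paper's prompts or code: the trait list, rubric wording, and model name are placeholders, and it uses the OpenAI Python SDK purely for convenience.

```python
# A rough sketch of multi-trait, zero-shot essay scoring: score each trait
# with its own prompt, then combine the trait scores into an overall mark.
# The traits, prompt wording, and model name are illustrative assumptions,
# not taken from the MTS paper.
import re
from openai import OpenAI

client = OpenAI()

TRAITS = ["organisation", "argument quality", "vocabulary", "grammar"]  # illustrative traits

def score_trait(essay: str, trait: str) -> float:
    """Ask the model for a 1-10 score on a single trait, zero-shot (no graded examples)."""
    prompt = (
        f"You are an experienced essay examiner. Rate the following essay on "
        f"'{trait}' only, on a scale of 1 to 10. Reply with just the number.\n\n{essay}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    match = re.search(r"\d+(\.\d+)?", response.choices[0].message.content)
    return float(match.group()) if match else 0.0

def score_essay(essay: str) -> float:
    """Combine per-trait scores into an overall score (equal weights, for simplicity)."""
    trait_scores = {trait: score_trait(essay, trait) for trait in TRAITS}
    return sum(trait_scores.values()) / len(trait_scores)

# Example usage:
# print(score_essay(open("essay.txt").read()))
```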

Working Alongside, Not Against, AI Writing Tools in the Composition Classroom: a Dialectical Retrospective

https://uen.pressbooks.pub/teachingandgenerativeai/chapter/working-alongside-not-against-ai-writing-tools-in-the-composition-classroom-a-dialectical-retrospective/

Okay, so now's the time I need to confess to something. I often find the academic writing style difficult to decode - in many cases, it seems to be written to show off the advanced language skills of the authors, rather than to communicate with the widest possible audience. And research papers are also really good at hiding the insightful and useful information very deep in the paper. Often the abstract talks about what the research was intended to do, not what it found. And you have to go 80% of the way through the paper before you discover what the research found out. It's almost as if the point of doing research is the doing itself, not the discovery. Anyway, now I've got that off my chest, the confession:        
I didn't know what "dialectical retrospective" meant...
I had to look it up, and it basically means considering both sides of what's happened. So in this case, it means let's look at both sides of integrating generative AI into students' writing.        

The paper, from Dan Frank and Jennifer K. Johnson at UC Santa Barbara, emphasises the use of AI as an aid, not a substitute, for student writing, and gives classroom examples of where students use these tools. They show the value of teaching students how to use AI transparently to support their learning, and to think critically about its output. They show that students benefit from AI's ability to provide customised learning and feedback.

What's great about this paper is that they give really concrete examples of how they use generative AI with their students, and share the activities and their outcomes.

This is one of a series of articles in a new book, "Teaching and Generative AI", which is available online and is open access. I first came across one paper, and then discovered the whole book. I've not yet had time to digest it all, but I encourage you to take a look at the full list of articles.

GPT versus Resident Physicians — A Benchmark Based on Official Board Scores

https://ai.nejm.org/doi/pdf/10.1056/AIdbp2300192

These researchers, from Tel Aviv and Washington, looked at how GPT-3.5 and GPT-4 performed in Israeli medical board exams compared to human physicians. They found that GPT-4 reached the pass score for 4 out of 5 specialities. It outperformed the majority of resident physicians in psychiatry (hitting the 75th percentile), and performed comparably to human physicians in general surgery and internal medicine. They did find it lagged behind in paediatrics and obstetrics/gynaecology - partly because those exams contain a lot of image-based questions, which weren't included in this study.

They also found GPT-4 significantly outperformed GPT-3.5 (no surprise, but an important reminder to ask which version was used when you're looking at projects or research results).


Okay, so following on from the news that AI is apparently good at everything, let's bring it back down to earth with:

Evaluating General Vision-Language Models for Clinical Medicine

https://www.medrxiv.org/content/10.1101/2024.04.12.24305744v1

Although we know that AIs are very proficient at text-based medical problems, often outperforming doctors, this research, from Stanford University, the University of Rome and Harvard Medical School, found that the latest multimodal GPT-4V system is still much worse than humans and traditional AI models at medical images, like endoscopy, x-rays and skin lesion images. And it's even worse on images of darker skin tones. It managed only high single-digit percentage accuracies across most categories, yet was alarmingly moderately successful (59% accuracy!) in answering image-based medical licensing exam questions.

So it might soon be able to pass the exam, but it's nowhere near competent in the real world… yet.

