AI in Education Research Update 31st May
Ray Fleming
Global AI and Education Industry Leader | Extensive sales & marketing experience | AI solution strategist | Customer centred thinker | Speaker | Media | PR
Can AI Provide Useful Holistic Essay Scoring?
Instead of writing a summary, I'm going to quote from the excellent article about this paper in The Hechinger Report. Before I do, it's important to note that all of this research refers to GPT-3.5, not the newer, and much more capable, GPT-4. I'd expect the results to be even stronger with GPT-4.
Tamara Tate, a researcher at the University of California, Irvine, and an associate director of the university's Digital Learning Lab, is studying how teachers might use ChatGPT to improve writing instruction.
"Most recently, Tate and her seven-member research team, which includes writing expert Steve Graham at Arizona State University, compared how ChatGPT stacked up against humans in scoring 1,800 history and English essays written by middle and high school students. Tate said ChatGPT was “roughly speaking, probably as good as an average busy teacher” and “certainly as good as an overburdened below-average teacher.” But, she said, ChatGPT isn’t yet accurate enough to be used on a high-stakes test or on an essay that would affect a final grade in a class. Most remarkably, the researchers obtained these fairly decent essay scores from ChatGPT without training it first with sample essays. That means it is possible for any teacher to use it to grade any essay instantly with minimal expense and effort. Writing instruction could ultimately suffer, Tate warned, if teachers delegate too much grading to ChatGPT. Seeing students’ incremental progress and common mistakes remain important for deciding what to teach next, she said. For example, seeing loads of run-on sentences in your students’ papers might prompt a lesson on how to break them up. But if you don’t see them, you might not think to teach it. In the study, Tate and her research team calculated that ChatGPT’s essay scores were in “fair” to “moderate” agreement with those of well-trained human evaluators. In one batch of 943 essays, ChatGPT was within a point of the human grader 89 percent of the time. On a six-point grading scale that researchers used in the study, ChatGPT often gave an essay a 2 when an expert human evaluator thought it was really a 1. But this level of agreement – within one point – dropped to 83 percent of the time in another batch of 344 English papers and slid even farther to 76 percent of the time in a third batch of 493 history essays.? That means there were more instances where ChatGPT gave an essay a 4, for example, when a teacher marked it a 6. And that’s why Tate says these ChatGPT grades should only be used for low-stakes purposes in a classroom, such as a preliminary grade on a first draft.
Tate set up ChatGPT for a tough challenge, competing against teachers and experts with PhDs who had received three hours of training in how to properly evaluate essays. “Teachers generally receive very little training in secondary school writing and they’re not going to be this accurate,” said Tate. “This is a gold-standard human evaluator we have here.”
The raters had been paid to score these 1,800 essays as part of three earlier studies on student writing. Researchers fed these same student essays – ungraded – into ChatGPT and asked ChatGPT to score them cold. ChatGPT hadn't been given any graded examples to calibrate its scores. All the researchers did was copy and paste an excerpt of the same scoring guidelines that the humans used, called a grading rubric, into ChatGPT and tell it to “pretend” it was a teacher and score the essays on a scale of 1 to 6.
Earlier versions of automated essay graders have had higher rates of accuracy. But they were expensive and time-consuming to create because scientists had to train the computer with hundreds of human-graded essays for each essay question. That’s economically feasible only in limited situations, such as for a standardized test, where thousands of students answer the same essay question."
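An aside from me: if you want to try the setup the article describes, it's easy to reproduce. Here's a minimal Python sketch, assuming the OpenAI Python SDK and an API key in your environment; the rubric excerpt, prompt wording, and function name are my own illustrative placeholders, not the study's actual materials.

```python
# A sketch of the zero-shot setup described above: paste a rubric excerpt into
# the prompt, tell the model to "pretend" it's a teacher, and ask for a score
# from 1 to 6. Rubric text and prompt wording here are illustrative only.
import re

from openai import OpenAI

client = OpenAI()  # assumes the OPENAI_API_KEY environment variable is set

# Placeholder rubric excerpt -- the study pasted in the same rubric its
# human raters used, which isn't reproduced here.
RUBRIC = (
    "6: Insightful thesis, well-organised argument, strong evidence.\n"
    "...\n"
    "1: No discernible thesis or supporting evidence."
)

def score_essay(essay_text: str) -> int | None:
    """Ask the model for a single holistic score from 1 to 6."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used GPT-3.5
        temperature=0,          # keep repeat scorings as consistent as possible
        messages=[
            {"role": "system",
             "content": "Pretend you are a teacher scoring student essays."},
            {"role": "user",
             "content": (f"Using this rubric:\n{RUBRIC}\n\n"
                         "Score the following essay on a scale of 1 to 6. "
                         f"Reply with the number only.\n\n{essay_text}")},
        ],
    )
    reply = response.choices[0].message.content or ""
    match = re.search(r"[1-6]", reply)
    return int(match.group()) if match else None
```

Setting the temperature to 0 makes repeat runs more consistent, but as the study makes clear, the result should only ever be treated as a low-stakes, first-draft score.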
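And a note on the headline numbers: "within a point of the human grader" is an adjacent-agreement rate, which is straightforward to compute once you have both sets of scores. A quick sketch, with sample scores made up for illustration:

```python
# "Within a point of the human grader" is an adjacent-agreement rate:
# the fraction of essays where the two scores differ by at most one point.
def adjacent_agreement(human_scores: list[int], ai_scores: list[int]) -> float:
    """Share of score pairs that land within one point of each other."""
    assert len(human_scores) == len(ai_scores), "need paired scores"
    hits = sum(1 for h, a in zip(human_scores, ai_scores) if abs(h - a) <= 1)
    return hits / len(human_scores)

# Made-up scores for illustration: (1, 2) and (4, 4) count as agreement,
# (6, 4) does not, so the rate here is 3/4 = 0.75.
print(adjacent_agreement([1, 3, 6, 4], [2, 3, 4, 4]))  # 0.75
```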
Best of the Rest
The Future of Feedback: Integrating Peer and Generative AI Reviews to Support Student Work
This paper, from researchers Akash Kumar Saini, William Cope, and Mary Kalantzis at the University of Illinois, explores how combining peer feedback with AI-generated reviews can improve student learning. The study looks at graduate students' opinions on the quality, usefulness, and actionability of feedback from both peers and AI. Results show that students generally rated peer feedback slightly higher than AI feedback across all dimensions. However, each feedback type has unique strengths and weaknesses.
People receiving feedback tended to judge the AI feedback in line with their overall views on AI: feel positive about AI? You'll feel positive about the feedback it gives. Feel negative? You'll feel the same about its feedback. Overall, the best results come from combining both types - peer and AI feedback.
Is ChatGPT Transforming Academics' Writing Style?
This paper, from a pair of researchers in Italy, doesn't actually look at overall writing, but specifically at whether ChatGPT has been used to produce or improve the abstracts of papers. They conclude that usage differs across research areas, with Computer Science researchers the most likely to use it. They estimate that 35% of paper abstracts have been written with some help from ChatGPT. To be honest, this isn't a surprise to me, and I'd love that number to go higher if the prompts researchers used included the words "please make clear in the abstract what the valuable findings from the research are and why it matters".
"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment
A chorus of researchers - Charlotte Kobiella, Yarhy Flores, and Franz Xaver W. at the Center for Digital Technology and Management (CDTM), Fiona Draxler at Universität Mannheim, and Albrecht Schmidt at Ludwig-Maximilians-Universität München - worked on this paper!
This research on the impact of ChatGPT on young professionals can provide valuable insights for teachers, particularly regarding how AI might shape their students' perceptions of their future careers and work habits. As the title suggests, the researchers found that using ChatGPT changed how these young professionals perceive their own productivity and sense of accomplishment.
What can you do? Beyond helping students understand the technology, you can also help students find a balance between leveraging AI for efficiency and contributing their own insights and creativity. And, given the fears of "robots coming to take our jobs", highlight the importance of their unique human perspective in the final output!
Best Practices for Using AI When Writing Scientific Manuscripts
This is from November last year, and is a really handy guide for researchers on ideas for, and limitations of, using ChatGPT in scientific writing. While AI can help with initial drafts, making analogies, improving structure, and aiding non-native English speakers, it also has significant limitations, including the inability to understand new information, generate deep insights, or provide critical analysis. Over-reliance on AI could result in superficial and unoriginal research. The authors emphasise the need for caution, recommending that AI be used as a supplementary tool rather than a replacement for human creativity and critical thinking in scientific research.