AI for Teacher Development: a study that combined video and audio to assess teachers

I came across a new study by Jinglei Yu and colleagues from Beijing Normal University called A Student-Teacher Multimodal Interaction Analysis System for Classroom Observation - my thanks to the authors for sharing a copy of the paper with me. In this study, the researchers carry out a preliminary evaluation of a system that assesses classroom recordings of teachers using both video and audio information. The authors note that:

Although several studies have been conducted to automate the coding and analyzing process, they are either based on audio information or video information collected from the classroom, which fails to jointly utilize multimodal information like domain experts. We thus propose a student-teacher multimodal interaction analysis system that conducts the analysis using both video and audio information and accordingly generates the informative reports based on the analysis results.

This is fascinating, as previous studies I've seen have tended to focus mainly on transcripts of audio, not even the audio itself, which of course loses meaning such as timing, pauses and tone.

This paper is very new - it went online on the 30th of June, 2023. The authors reflect that in typical studies of classroom observation, each video needs to be hand-coded by researchers to identify teacher actions and student-teacher interactions. They therefore design and test a system focused on student-teacher analysis (S-T analysis). They say:

In S-T analysis, the classroom activities are divided into teacher’s and students’ behaviors. The analysis result contains ratio of teacher’s behavior (Rt), ratio of student-teacher interactions (Ch) and the predicted teaching mode. The visualization of the teaching mode is provided in the summary report. The system has been preliminary evaluated on 21 diverse classroom video recordings from different schools, covering multiple subjects ranging from Chinese, Math, English to Chemistry and Biology.
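To make the two ratios concrete, here is a minimal sketch of how Rt and Ch are typically computed in classic S-T analysis, assuming the lesson has already been coded into a sequence of teacher ('T') and student ('S') behaviour samples. This is my own illustration of the general method, not code from the paper, and the authors' exact definitions may differ.

```python
def st_analysis(behaviours):
    """Compute the two S-T analysis ratios from a coded behaviour sequence.

    `behaviours` is a list of 'T' (teacher behaviour) or 'S' (student
    behaviour) labels sampled at a fixed interval across the lesson.
    Illustrative simplification only - not the authors' implementation.
    """
    n = len(behaviours)
    if n == 0:
        raise ValueError("empty behaviour sequence")

    # Rt: proportion of samples that are teacher behaviour.
    rt = behaviours.count("T") / n

    # Ch: proportion of adjacent sample pairs where the actor changes
    # (T <-> S), one common way of expressing the rate of interaction.
    changes = sum(1 for a, b in zip(behaviours, behaviours[1:]) if a != b)
    ch = changes / (n - 1) if n > 1 else 0.0

    return rt, ch


# Example: a short, mostly teacher-led segment of a lesson.
rt, ch = st_analysis(list("TTTTSTTTSSTT"))
print(f"Rt = {rt:.2f}, Ch = {ch:.2f}")  # Rt = 0.75, Ch = 0.36
```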

They produce a system architecture that captures the activity in the classroom, recognising voices and separating them out so that speech can be attributed to the teacher or a student. They also capture video and detect facial expressions, body pose and hand gestures, then pull all of the data together, analysing it for emotions and student-teacher interactions, before creating a summary report.

Image taken from Yu et al. (2023)
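To give a feel for how such a pipeline hangs together, here is a minimal Python skeleton. It is entirely my own sketch: every function and field name is hypothetical, and in a real system the stubs would wrap actual models (speaker diarization, speech recognition, OpenPose, a facial expression classifier) rather than reading toy values.

```python
from dataclasses import dataclass


@dataclass
class WindowFeatures:
    """Features for one short time window, fused across audio and video."""
    speaker: str          # "teacher" or "student", from speaker diarization
    transcript: str       # speech recognition output for the window
    teacher_action: str   # e.g. "nodding", "pointing", from pose analysis
    teacher_emotion: str  # facial expression class, e.g. "happiness"


# Stub per-modality analysers. In a real pipeline these would wrap actual
# models; here they just read pre-computed values from a toy input dict.
def diarize(window):
    return window.get("speaker", "teacher")

def transcribe(window):
    return window.get("speech", "")

def classify_action(window):
    return window.get("action", "none")

def classify_emotion(window):
    return window.get("emotion", "neutral")


def analyse_lesson(windows):
    """Fuse the per-modality results window by window before reporting."""
    return [
        WindowFeatures(
            speaker=diarize(w),
            transcript=transcribe(w),
            teacher_action=classify_action(w),
            teacher_emotion=classify_emotion(w),
        )
        for w in windows
    ]


# Toy input standing in for two analysed windows of a recording.
demo = [
    {"speaker": "teacher", "speech": "Good, correct!", "action": "nodding",
     "emotion": "happiness"},
    {"speaker": "student", "speech": "Is it twelve?"},
]
for features in analyse_lesson(demo):
    print(features)
```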

For action recognition they are able to identify six actions: nodding the head, shaking the head, tilting the head, clapping hands, thumbs up and pointing. The system locates 25 keypoints on the human body using OpenPose models. It also detects the teacher's face and classifies expressions into anger, disgust, happiness, sadness, surprise and neutral. Finally, it takes the audio streams, recognises the text content with speech recognition and separates the voices. Keywords that may map to actions are identified, including "good", "correct", "excellent" and "right".
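The keyword side is the easiest to illustrate. Below is a minimal sketch of spotting those feedback words in a recognised utterance; the keyword list comes from the examples above, but the matching logic is my own assumption rather than the paper's implementation.

```python
# Feedback keywords mentioned in the paper; in the real system these would be
# matched against the speech-recognition output for each utterance.
FEEDBACK_KEYWORDS = {"good", "correct", "excellent", "right"}

def spot_feedback(utterance: str) -> list[str]:
    """Return the feedback keywords found in a recognised utterance."""
    words = {w.strip(".,!?").lower() for w in utterance.split()}
    return sorted(words & FEEDBACK_KEYWORDS)

print(spot_feedback("Good, that's exactly right!"))  # ['good', 'right']
```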

They explore the ratio of teacher vs. student speaking and also the ratio of interactions, producing a summary report that shows which behaviours were used and when, and what the primary teaching mode was:

Image taken from Yu et al. (2023)
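In classic S-T analysis, the predicted teaching mode is usually read off simple thresholds on Rt and Ch. The sketch below uses thresholds commonly quoted in the S-T analysis literature; I am assuming them here, and the paper's actual cut-offs may differ.

```python
def teaching_mode(rt: float, ch: float) -> str:
    """Classify the teaching mode from the two S-T ratios.

    The thresholds are the ones commonly quoted for S-T analysis;
    they are an assumption here, not necessarily the paper's values.
    """
    if rt >= 0.7:
        return "lecture"    # lesson dominated by teacher behaviour
    if rt <= 0.3:
        return "practice"   # lesson dominated by student activity
    if ch >= 0.4:
        return "dialogue"   # frequent teacher-student exchanges
    return "mixed"

print(teaching_mode(0.62, 0.45))  # dialogue
```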

This is, of course, some very simple analysis at this stage. The authors have used it as a proof of concept, with the aim of building in more detailed analysis of what's happening and perhaps using multimodal information to coach teachers in the future.

For their first attempt, they use human coders to verify how well the model is doing on the various ratios of speaking, listening and interacting, and this first attempt isn't great: only 48% accurate on the overall teaching mode, but up to 71% accurate on the ratio of teacher-student interaction. More work to be done.
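For anyone wondering how such an accuracy figure might be computed, one plausible reading is simple agreement between the system's output and the human coder's label across the 21 recordings, as in the sketch below. This is an assumption on my part; the paper may score the ratios differently, for example with a tolerance band rather than exact label agreement.

```python
def agreement(predicted, human):
    """Fraction of recordings where the system's label matches the coder's."""
    assert predicted and len(predicted) == len(human)
    return sum(p == h for p, h in zip(predicted, human)) / len(predicted)

# Toy example over five hypothetical recordings (not data from the paper).
system_labels = ["lecture", "mixed", "dialogue", "lecture", "practice"]
human_labels  = ["lecture", "dialogue", "dialogue", "lecture", "mixed"]
print(f"{agreement(system_labels, human_labels):.0%}")  # 60%
```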

I think this is all hugely exciting, as it shows that we may in future start building AI models that use visual and audio elements to understand what's happening in a classroom, so that they can respond and suggest ideas to teachers. As ever, there are questions about the privacy of the video data, permissions, how accurate the system will be and whether the advice will be good.

Full paper citation:

Yu, J., Li, Z., Liu, Z., Tian, M., Lu, Y. (2023). A Student-Teacher Multimodal Interaction Analysis System for Classroom Observation. In: Wang, N., Rebolledo-Mendez, G., Dimitrova, V., Matsuda, N., Santos, O.C. (eds) Artificial Intelligence in Education. Posters and Late Breaking Results, Workshops and Tutorials, Industry and Innovation Tracks, Practitioners, Doctoral Consortium and Blue Sky. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_29

Martin Hlosta

Researcher / Data Scientist focused on Educational Technology

1 year ago

Hi David, thanks for bringing this paper to my attention - after reading it, shouldn't the accuracy of the ratio be 59.7 (Rt) vs the 71.1 that you mention (Ch)? This part was a bit confusing to me, so I am not 100% sure either.
