AI & ML in Assessment: New Opportunities and Old News

Artificial intelligence (AI), machine learning (ML), and natural language processing (NLP) are buzzwords in many industries, and they have taken the assessment world by storm since the release of ChatGPT on November 30, 2022. A quick glance at the program for any assessment or psychometrics conference confirms this. However, the use of large language model platforms like ChatGPT is just the tip of the iceberg when it comes to applying these technologies to assessment, as well as adjacent topics like eLearning, training, and employee selection.

Here’s an overview of the other ways that AI/ML is impacting our field, and in some cases has already been doing so for a century!

Content

Automated item generation: Writing items is the most time-intensive stage of the assessment development cycle, and AI can definitely help. Early approaches used dynamic templates, but LLMs now provide far more power, especially with fine-tuning.
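To make the template idea concrete, here is a minimal sketch of template-based item generation. The template, slot values, and function names (`generate_items`, `TEMPLATE`) are all hypothetical illustrations, not any particular vendor's implementation; the answer-key formula is hardcoded to this one item family.

```python
import random

# Hypothetical item template: each {slot} is filled from a list of
# permissible values, producing a whole family of parallel items.
TEMPLATE = "A train travels {distance} km in {hours} hours. What is its average speed in km/h?"

def generate_items(template, slots, n=3, seed=0):
    """Return n (stem, key) pairs by sampling a value for each slot."""
    rng = random.Random(seed)  # seeded for reproducible item pools
    items = []
    for _ in range(n):
        values = {name: rng.choice(options) for name, options in slots.items()}
        stem = template.format(**values)
        # The correct answer is computed from the slot values, so every
        # generated variant comes with its own key (specific to this template).
        key = values["distance"] / values["hours"]
        items.append((stem, key))
    return items

for stem, key in generate_items(TEMPLATE, {"distance": [120, 180, 240], "hours": [2, 3, 4]}):
    print(stem, "->", key)
```

LLM-based generation replaces the rigid template with a prompt, but the same workflow applies: generate many candidates, attach keys, then send everything through human review.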

Item review: After items are written, we need to check the keyed answers as well as formatting issues, such as one response option being noticeably longer than the rest. AI can help with this.

Flagging enemy items: Historically, if you wanted to find out whether two items covered the same topic, or whether one gave away the answer to another, the only option was to read through the entire bank. Now, you can use NLP with text similarity metrics.
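A minimal sketch of that idea, using a simple bag-of-words cosine similarity (the function names and the 0.5 threshold are illustrative assumptions; production systems typically use TF-IDF weighting or embedding models rather than raw word counts):

```python
import math
import re
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two item stems."""
    tokenize = lambda t: Counter(re.findall(r"[a-z']+", t.lower()))
    a, b = tokenize(text_a), tokenize(text_b)
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def flag_enemies(bank, threshold=0.5):
    """Return (id1, id2, similarity) for every item pair above the threshold."""
    flagged = []
    ids = list(bank)
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            sim = cosine_similarity(bank[ids[i]], bank[ids[j]])
            if sim >= threshold:
                flagged.append((ids[i], ids[j], round(sim, 2)))
    return flagged
```

Running this over a bank where two stems both ask for the capital of France would flag that pair while leaving an unrelated algebra item alone.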

Item categorization: AI can also suggest categorizations of items into educational standards or other blueprints.

Psychometrics

Factor analysis: This approach to unsupervised machine learning was developed in 1904 and used in early research to investigate the latent structure of intelligence and personality.

Item response theory: IRT uses an unsupervised machine learning approach to make sense of assessment data in a way that puts items and examinees onto the same latent scale. IRT results are then used to answer major questions, such as how to equate tests across years or link multiple school grade levels together.
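For readers new to IRT, here is the core of the two-parameter logistic (2PL) model plus a deliberately simple grid-search estimate of examinee ability (real software uses Newton-Raphson or EAP estimation; `estimate_theta` and its grid bounds are illustrative choices):

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b.
    When theta == b, the probability is exactly 0.5."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate_theta(responses, items):
    """Crude maximum-likelihood ability estimate via grid search over
    theta in [-4, 4], given 0/1 responses and (a, b) item parameters."""
    best_theta, best_ll = 0.0, float("-inf")
    for step in range(-40, 41):
        theta = step / 10.0
        ll = 0.0  # log-likelihood of this theta given the response pattern
        for r, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            ll += math.log(p) if r else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta
```

An examinee who answers an easy and a medium item correctly but misses a hard one lands at a moderately positive theta, on the same scale as the item difficulties themselves, which is exactly what makes equating and vertical linking possible.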

Cognitive diagnostic models: CDMs also make sense of assessment data, but answer a different set of questions, focused on how items load onto specific skills or competencies and how we can use that to get a skill profile of an examinee.

Process data: This seeks to make sense of all the non-response data we have from an assessment (time spent, changing answers, what was dragged first on a drag and drop item…) and use it to obtain more accurate scores and feedback.

Delivery and scoring

Adaptive testing: Item-level adaptive testing and multistage testing (like the new SAT) use IRT to personalize a test for each student, and have been shown to have many benefits, from 50% shorter tests to increased security and engagement.
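The heart of item-level adaptive testing is selecting the next item that is most informative at the examinee's current ability estimate. A minimal sketch under the 2PL model (the function names and the tiny three-item pool are hypothetical; operational CAT engines add exposure control and content balancing on top of this):

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta; maximized
    when the item difficulty b matches theta (p = 0.5)."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def next_item(theta, pool, administered):
    """Pick the not-yet-administered item with maximum information
    at the current ability estimate. pool maps item id -> (a, b)."""
    candidates = [(iid, ab) for iid, ab in pool.items() if iid not in administered]
    return max(candidates, key=lambda kv: item_information(theta, *kv[1]))[0]
```

An average-ability examinee gets the medium item first; a high performer is routed straight to the hard one, which is why adaptive tests can be dramatically shorter without losing precision.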

Automated essay scoring: You can train custom ML models on essays with human marks to accurately produce second or third marks. LLMs can enhance this.

Automated test assembly and LOFT: Test assembly can be time-consuming when balancing content blueprints, item difficulty, and other aspects like Bloom’s Taxonomy. Custom AI algorithms can turn days of manual work into seconds.
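A toy greedy version of automated assembly, shown only to illustrate the idea (the field-standard approach is mixed-integer programming; the dict layout and `assemble_form` name are assumptions of this sketch):

```python
def assemble_form(bank, blueprint):
    """Greedy sketch: for each content domain in the blueprint, take the
    required number of items closest to mid-difficulty (b near 0).
    Real ATA solves all constraints jointly with mixed-integer programming."""
    form = []
    for domain, count in blueprint.items():
        candidates = [item for item in bank if item["domain"] == domain]
        candidates.sort(key=lambda item: abs(item["b"]))  # prefer mid-difficulty
        form.extend(candidates[:count])
    return form
```

Given a blueprint like two algebra items and one geometry item, this pulls a balanced form out of the bank instantly; a LOFT engine does essentially this per examinee, with randomization added.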

Remote proctoring: Tests that are large-scale and lower-stakes can be proctored with AI, which scans video streams for evidence of cheating, such as two faces on the screen or a rectangle that is 2x4 inches (a phone).

Chatbots for delivery: In some situations, tests can be given conversationally, especially if they are adaptive.

Using scores

Computational psychometrics: This blends adaptive learning with adaptive testing to create next-gen learning systems.

Predicting key outcomes: ML plays an important role in predicting human results, like university performance (admissions) and job performance (hiring). This is often part of validating a test, if the test is designed specifically for such a purpose.

Predicting anything else: You can use data to predict other things, like personality from LinkedIn profiles (CrystalKnows) or psychological traits from Facebook likes (Cambridge Analytica).

Recommender systems: Just like we use your ratings of TV shows to recommend the next ones for you to watch, we can use test scores to recommend everything from careers to the next course for you in a MOOC.
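A minimal sketch of that recommender idea, using nearest-neighbor (user-based collaborative filtering) over test-score profiles. All names here (`recommend_course`, the peer data layout) are hypothetical, and real systems use matrix factorization or learned embeddings rather than a single nearest peer:

```python
import math

def similarity(u, v):
    """Cosine similarity between two score dicts over their shared tests."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = math.sqrt(sum(u[t] ** 2 for t in shared))
    norm_v = math.sqrt(sum(v[t] ** 2 for t in shared))
    return dot / (norm_u * norm_v)

def recommend_course(learner, peers, taken):
    """Suggest the first course taken by the most similar peer that the
    learner has not already completed."""
    best_peer = max(peers, key=lambda p: similarity(learner, p["scores"]))
    for course in best_peer["courses"]:
        if course not in taken:
            return course
    return None
```

Swap "courses" for careers, remediation modules, or the next unit in a MOOC and the same logic applies.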

Gamification: As part of learning systems, MOOCs, and other platforms, we can introduce gamification by leveraging both assessment and learning results, to incentivize engagement.

What’s next?

We are at an interesting point in our field. While some of the technology discussed here has been around for more than a century, other aspects are brand new. We are just starting to think of ways to apply them. For example, new research is exploring the use of LLMs to “pilot” new items by having them answered by an LLM trained to approximate the distribution of students. This might also be used to unearth potential bias.

Clearly, the changes will be transformative, and the majority of them beneficial. What are the biggest pain points in your assessment organization, and how might AI/ML/NLP help alleviate them?

Matthias von Davier

J. Donald Monan, S.J., University Professor in Education Executive Director, International Study Center @ Boston College

7 months ago

In a nutshell: Scholars may use confusing and sometimes conflicting terminology but educational and psychological testing always used the quantitative methods of their time to advance what we do. Fewer buzzwords are needed, more curiosity is important, and learn your math to support your understanding of these fascinating tools.


More articles by Nathan Thompson, PhD
