How the British Council uses AI testing in language assessments
British Council English Assessment
The right test at the right time
In recent years, AI testing has gone from a futuristic concept to everyday reality, used for everything from item and question generation to automated scoring and skills acquisition.
Here at the British Council, we use it to make our AI exams more engaging and accessible for test-takers, and to help boost their performance.
For example, our Primary English Test features automated scoring and adaptive questions, which means question difficulty changes in real time according to each learner’s level. This keeps learners engaged with level-appropriate challenges without overwhelming them.
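To make the idea concrete, here is a minimal, hypothetical sketch of one common way adaptive difficulty can work: an Elo-style ability update. It illustrates the general technique only, not the Primary English Test’s actual algorithm.

```python
import math

def update_ability(ability: float, item_difficulty: float,
                   correct: bool, k: float = 0.3) -> float:
    """Nudge the learner's estimated ability after each answer."""
    # Probability of a correct answer under a simple logistic model.
    p_correct = 1.0 / (1.0 + math.exp(item_difficulty - ability))
    # The estimate rises after a correct answer and falls after a miss.
    return ability + k * ((1.0 if correct else 0.0) - p_correct)

def next_item_difficulty(ability: float) -> float:
    """Pick the next question at the learner's estimated level (~50% odds)."""
    return ability

# Example: two correct answers raise the estimate, then a miss lowers it.
ability = 0.0
for correct in (True, True, False):
    difficulty = next_item_difficulty(ability)
    ability = update_ability(ability, difficulty, correct)
    print(f"estimated ability: {ability:+.2f}")
```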
However, ethical concerns remain, like cultural and algorithmic bias, data privacy and the potential for cheating.
To dive deeper into these issues, we caught up with our resident AI expert, Senior Researcher and Data Scientist Mariano Felice. As well as researching and developing AI assessments at the British Council, he investigates automated language teaching and assessment at the University of Cambridge.
Here’s what he told us.
AI testing at the British Council
How does AI enable the creation of more authentic, engaging and interactive test materials?
Technologies like speech recognition and text-to-speech enable spoken conversations with a machine, so assessments feel more natural. Using Large Language Models and chatbots, we can even generate artificial voices with different accents, so students can practise communicating with people from different linguistic backgrounds. It’s an excellent way to practise listening and communication skills.
For item writers, generative AI can speed up content creation, whether it’s text, images or video.
What are the benefits of AI scoring?
Firstly, it’s automated and fast, so we can assess a learner’s performance and provide results and feedback very quickly. We can also assess many test takers simultaneously in different locations, making high-quality language assessments more accessible. Finally, results are more consistent and accurate, since the same criteria are applied to every response, reducing the impact of individual human rater bias.
Which skills is AI best at scoring?
Writing, hands down, because we have a longer history of working with text, and it’s much easier to process than audio. Tokenisers, taggers and parsers have all been around for decades, and they work incredibly well, sometimes even better than humans.
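To make those building blocks concrete, here is what a tokeniser, tagger and parser look like in the open-source spaCy library. This is one widely used toolkit, chosen purely for illustration, not necessarily what the British Council uses.

```python
import spacy

# Assumes the small English model is installed:
#   python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")
doc = nlp("The students wrote their essays quickly.")

for token in doc:
    # token.text -> tokeniser, token.pos_ -> tagger, token.dep_ -> parser
    print(f"{token.text:<10} {token.pos_:<6} {token.dep_}")
```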
Audio poses more technological challenges: speech recognition requires more complex models and a lot of computational power. Thankfully, the latest deep learning models have changed this landscape, and advances keep coming. Hopefully, that means more and better speech assessment systems in the coming years.
Tackling the challenges of AI testing
How do you prevent gaming of AI tests?
There are a few things we can do. First, we can use multiple features to avoid having just one or two dominant ones that could be exploited, like essay length or infrequent vocabulary; in other words, writing a lot or memorising a few fancy words. Second, we can adjust the weights of features to change their impact on the final score. Finally, we can implement anomaly detection mechanisms that spot unusual patterns in the input, like repeated words or extreme feature values, and flag them for human review.
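Here is a minimal Python sketch of those three ideas working together. The features, weights and thresholds are made up for illustration; a real scoring engine would be far more sophisticated.

```python
from collections import Counter

def extract_features(essay: str) -> dict:
    """Compute several features so no single one dominates the score."""
    tokens = essay.lower().split()
    counts = Counter(tokens)
    return {
        "length": len(tokens),
        "type_token_ratio": len(counts) / max(len(tokens), 1),
        "max_repetition": max(counts.values(), default=0),
    }

# Hypothetical weights: essay length alone contributes little to the score.
WEIGHTS = {"length": 0.01, "type_token_ratio": 2.0, "max_repetition": -0.05}

def score(features: dict) -> float:
    return sum(WEIGHTS[name] * value for name, value in features.items())

def is_anomalous(features: dict) -> bool:
    """Flag unusual patterns (e.g. heavy repetition) for human review."""
    return features["max_repetition"] > 10 or features["type_token_ratio"] < 0.3

gamed = "excellent " * 50  # one memorised word, repeated to pad length
feats = extract_features(gamed)
print(f"score: {score(feats):.2f}, needs human review: {is_anomalous(feats)}")
```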
How do you deal with the ethical issues of AI assessments?
We use the Ethics by Design approach, which involves anticipating problems before they arise. The key pillars are transparency; privacy and data governance; fairness; individual, social and environmental wellbeing; respect for human agency; and accountability and oversight.
Two areas where problems often occur are fairness and human oversight. Fairness means minimising potential biases against specific populations, for example, students receiving lower scores because of their background or native language. Our training data must therefore include representative samples from a wide range of L1s, and we should avoid model features that favour particular groups. Exhaustive testing is also essential to identify and remove any potentially unfair behaviour.
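One simple fairness check along these lines is to compare machine scores with human scores across L1 groups in pilot data. The sketch below uses made-up numbers purely for illustration.

```python
from statistics import mean

# Made-up pilot data: (L1 group, human score, machine score).
results = [
    ("Spanish", 7.0, 7.1), ("Spanish", 6.5, 6.4),
    ("Thai",    7.0, 6.2), ("Thai",    6.5, 5.9),
]

gaps_by_l1 = {}
for l1, human, machine in results:
    gaps_by_l1.setdefault(l1, []).append(machine - human)

for l1, gaps in gaps_by_l1.items():
    # A large systematic gap for one group signals potential bias
    # that must be investigated before the model goes live.
    print(f"{l1}: mean machine-minus-human gap = {mean(gaps):+.2f}")
```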
Regarding human oversight – or lack of it – AI systems aren’t perfect, so they should always be monitored. There must always be a human in the loop who’s responsible for operating the system and can override its decisions if necessary. Inaccurate decisions could have devastating consequences for people's lives, so we must avoid unsupervised automated decisions in high-stakes scenarios.
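In practice, a human-in-the-loop policy can be as simple as routing low-confidence or flagged results to a human rater. The following sketch is hypothetical; the fields and threshold are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class AutoScore:
    value: float       # the model's score
    confidence: float  # the model's own confidence estimate, 0 to 1
    flagged: bool      # set by anomaly detection

def route(result: AutoScore, min_confidence: float = 0.9) -> str:
    """Release a score automatically only when the model is confident."""
    if result.flagged or result.confidence < min_confidence:
        return "human review"  # a person makes the final decision
    return "auto release"

print(route(AutoScore(value=6.5, confidence=0.95, flagged=False)))  # auto release
print(route(AutoScore(value=8.0, confidence=0.60, flagged=False)))  # human review
```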
The key is to always bear in mind that we use technology to support our learning goals, not the other way around. Our approach is human-centred and learner-first, not tech-first.
Learn more about our Ethics by Design approach in this video.
How we develop our AI exams
What research went into developing the Primary English Test and other AI tests?
The Primary English Test was developed by a group of experts in the design of tests for young learners, together with the team that translated those specifications into an AI solution.
Some of the key AI capabilities used in the test include automated scoring and specialised automated speech recognition that can understand children’s speech. To do this, we created models (artificial ‘brains’ that learn how to solve tasks by looking at many different examples) using over 2,000 samples from students globally, one of our biggest data-collection exercises to date. A lot of thinking and experimenting goes into building these models before they’re ready to use.
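To give a flavour of that experimentation stage, here is a minimal, hypothetical sketch of training and cross-validating a feature-based scorer with scikit-learn on synthetic data. The British Council’s actual features, data and models are not public.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in for ~2,000 scored samples, each with three made-up features
# (e.g. fluency, vocabulary range, grammatical accuracy).
X = rng.random((2000, 3))
y = X @ np.array([2.0, 3.0, 1.5]) + rng.normal(0, 0.2, 2000)

# Cross-validation is part of the "thinking and experimenting" stage:
# it estimates how well the scorer generalises to unseen test-takers.
model = Ridge(alpha=1.0)
print(f"mean R^2 across folds: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```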
How and when do you involve educators in the process?
We always try to involve all stakeholders as early as possible during development. We listen to what teachers, educators and students want, whether it’s a product to practise speaking in a particular scenario or specific feedback on performance. It’s all essential to ensure our products align with educational needs.
Also, all the tasks in our AI-based products undergo extensive piloting to ensure they’re fit for purpose and work as intended. Only after validation do we start building a product. Once that product is finished, we pilot it again and monitor its use, collecting feedback to make any necessary changes. Users’ voices always play a big part in shaping our solutions.
The future looks bright for AI testing
Whether you’re an enthusiastic early adopter or a committed technophobe, AI is proving a valuable learning and assessment tool, and we’re committed to its thoughtful use.
Looking to the future, more research is needed in a wider range of geographies and levels to make results more widely generalisable. We also need more information on the challenges of AI, its usefulness for developing receptive skills and how specific tools impact learning long term.
As for our ongoing work, we’re focusing on improving test quality and personalisation based on data-driven insights and new capabilities. We’re developing feature-based models on collected data for greater accuracy and explainability, tailoring our AI tests to new geographies, and investigating Large Language Models to assess higher-order skills like problem-solving and interactional competence. Finally, we’re confident we’ll see AI exams that can accurately assess all four skills in the future.
As Mariano says, ‘In the last 10 years, we've seen more progress than in the previous 60… If we maintain this pace, I’m sure AI will be able to do many things in the future that it can’t now.’
Watch this space…
Are you using AI testing yet? Comment to let us know.