ChatGPT is old news: How do we assess in the age of AI writing co-pilots?
In the seven months since ChatGPT was released, the world of generative AI has progressed at an almost impossible pace. Trying to keep up with weekly updates on its impact on education has been like drinking from a firehose. The next big thing on the generative AI front that demands our urgent attention is AI writing co-pilots, which will be directly embedded into productivity suites like Microsoft Office. These “co-pilots” are AI-powered assistants designed to help you generate content.
The companies behind the productivity tools that most of us and our students use, Microsoft and Google, are getting their versions of AI co-pilots ready for imminent release. Microsoft 365 Copilot and Google Duet AI will embed generative AI directly into software like Microsoft Word and Google Docs. This means that we will all have access to text generation AI right from within the spaces where we write. Notion, a popular app used by many students, already has an AI-powered writing assistant.
Ethan Mollick, a pragmatic and thoughtful educator at the Wharton School in the US who works extensively with AI, describes this as a “crisis of meaning”:
I don’t see a way that we avoid dealing with this coming storm. So many important processes assume that the amount and quality of our written output is a useful measure of thoughtfulness, effort, and time. If we don’t start thinking about this shift in meaning now… we are likely to face a series of crises of meaning, as centuries-old approaches to filtering and signalling lose all of their value.
So, what happens to assessments now?
Jason M. Lodge from the University of Queensland and colleagues, and Michael Webb from Jisc's National Centre for AI in the UK, have written about the main options regarding assessments. Webb writes that we can either avoid AI, try to outrun it, or adapt to it. All of these responses are legitimate, and all have their short- and long-term benefits and drawbacks. Avoiding it includes reverting to oral or otherwise invigilated exams, which have stronger assessment security but higher costs. And, as assessment experts Elizabeth Johnson, Helen Partridge, and Phillip Dawson from Deakin University say, exams can have questionable authenticity. That said, there is certainly a place for secured assessments in order to assure learning outcomes are met – more on this towards the end of this piece.
Outrunning it involves trying to design assessments that AI has more difficulty completing – but the risks are that our redesigns will only be temporarily effective as the pace of AI development accelerates, and that they will make assessment more inequitable for many of our students. Modifying the stimulus (e.g. using images in questions) has been suggested, but GPT-4's ability to parse images will soon be released to the public. Modifying the content of assessment is also a popular suggestion, such as connecting with personal events, writing reflections, or linking to class material – but recent work has shown that GPT writes higher-quality reflections than humans, and a larger context window allows students to send it as much relevant class material as needed. Modifying the outputs that students produce is another common response, such as changing assessments to in-class presentations or multimodal outputs – but rapid advances in voice cloning AIs and video and image generation AIs make outrunning AI almost futile. Besides, students could just use AI to generate an in-class presentation (and have, to excellent effect). This approach also has the potential to exacerbate achievement gaps between those students skilled in the use of AI and those who are not.
Adapting to it means we need to rethink how we assess – as Lodge writes, this is a more effective, longer-term solution, but also much harder. The imminent arrival of AI writing co-pilots makes this even more important – students will have AI asking to be invited into their writing process right from the start, through an innocuous prompt to us humans like “Help me write” (Google) or “Describe what you'd like to write” (Microsoft). With this capability in place within mainstream productivity tools, AI assistance becomes inescapable – it will be specious and even meaningless for us to “ban” its use.
“Assess the process”, “use authentic assessment” – but how exactly?
Changing assessments to assess the process and not the product has been a growing refrain over the last six months in response to generative AI. Lodge puts this elegantly in a recent post:
While generative AI can increasingly reproduce or even surpass human performance in the production of certain artefacts, it cannot replicate the human learning journey, with all its accompanying challenges, discoveries, and moments of insight. It can simulate this journey but not replicate it. The ability to trace this journey, through the assessment of learning processes, ensures the ongoing relevance and integrity of assessment in a way that a focus on outputs cannot. (emphasis added)
Lodge mentions “reflection activities, e-portfolios, nested tasks, peer collaborations, and other approaches” as effective process-based assessments, but acknowledges that these “often do not scale easily or leave too much room for threats to academic integrity”. This is especially the case in the coming era of AI writing co-pilots.
And even before generative AI burst onto the scene, there were calls to recalibrate higher education towards more authentic assessment. Through a systematic review, Verónica Villarroel and colleagues determined there were three main dimensions to authentic assessment: realism, cognitive challenge, and evaluative judgement.
Over the years, we've collected a few examples of authentic assessment on Teaching@Sydney, and the University of Queensland has a handy searchable assessment database with examples of authentic assessments.
But again, in a time when generative AI is so in-your-face and part of the way we [will] work, what is ‘authentic’, and what is ‘process’ – and how do we actually assess? Are our assessment approaches built to be effective or to be efficient?
Rediscovering what it means to be human (and assessing this)
Going back to first principles, the Higher Education Standards Framework 2021 legislation reminds us that assessments need to assure that learning outcomes have been met, and that the way we assess needs to be consistent with these learning outcomes. Helen Gniel, leading TEQSA in this space, recently reminded us of this. Specifically, Part A section 1.4 clauses 3 and 4 say:
3. Methods of assessment are consistent with the learning outcomes being assessed, are capable of confirming that all specified learning outcomes are achieved and that grades awarded reflect the level of student attainment.
4. On completion of a course of study, students have demonstrated the learning outcomes specified for the course of study, whether assessed at unit level, course level, or in combination.
There have been calls to revisit learning outcomes in the age of AI – the Australian Academic Integrity Network has posted on the TEQSA website that “unit and course learning outcomes, assessment tasks and marking criteria may require review to incorporate the ethical use of generative AI”. Given the close and dynamic interplay between learning outcomes and assessments, the increasing presence of AI in common productivity tools suggests we may need to rethink learning outcomes as well.
What do we want our students to know and be able to do when they leave our units and our courses? At Sydney, we want our graduates to have the skills and knowledge to adapt and thrive in a changing world, improving the wellbeing of our communities and society. But what does this actually mean in practice, in the context of an AI-infused world? Cecilia K. Y. Chan and colleagues from the University of Hong Kong have asked a similar question, from the perspective of educators. They suggest that there are key human aspects that AI can never replace, such as cultural sensitivity, resilience, relationships, curiosity, critical thinking, teamwork, innovation, ethics, civic engagement, and leadership. If these sound familiar, it's because they're also embedded in Sydney's graduate qualities for students.
So perhaps a contemporary set of learning outcomes (and assessments) needs to address these human elements. Students' ability to judge the quality of their own work and that of their co-pilot may be the most important quality we need to develop. Sure, generative AI can create text and other outputs that mimic these human qualities, but that only strengthens our prerogative to adapt assessments appropriately to an AI world.
Designing assessments in an era where generative AI is inescapable
Consider this diagram, which we recently presented in the closing keynote of a UK conference on ‘Education fit for the future’ and discussed in useful conversations via Twitter afterwards. Much of the early dialogue around generative AI has been about its (mis)use in the bottom right quadrant – high AI contribution and low human contribution. As generative AI becomes more inescapable, we need to consider where along the top half of the diagram each of our assessments sits – and they will necessarily sit at different places, depending on year level, learning outcomes, and other factors.
Leon Furze helpfully characterises this as an ‘AI assessment scale’, where educators need to consider whether and how AI helps students in assessments. We've adapted his scale in the diagram below, to emphasise that AI can also help educators in the design and delivery of assessments.
What are some approaches to assessments that might work in the age of AI writing co-pilots, considering all we’ve discussed about process, authenticity, and learning outcomes? How might students work productively and responsibly in the top-right quadrant?
Co-creating an output with AI
There are many ways that industry and community groups are already, and could potentially, leverage generative AI, including content generation, customer insights, software development, media generation, document summarisation, knowledge organisation, and more. What are the steps that we and our students can take through assessments in this context? Here is a strawman for consideration:
Evaluating co-creation with AI
An integral part of the suggested model above is to be able to evaluate the process of making something with AI. Some of the following ideas may be useful criteria to include in the marking rubric, alongside others. Ensure that the criteria you include in your rubric align with your learning outcomes and the Graduate Qualities.
This is by no means an exhaustive or prescriptive list. You will need to determine the most effective way to assess and evaluate these skills for your own teaching context.
In contexts where collaboration with AI is acceptable, authentic, and productive, it is conceivable that a student who creates something with AI may learn more, better hone their critical thinking and information and digital literacy skills, and produce a better artefact. To very loosely paraphrase Ethan Mollick, students may be hurting themselves by not using AI if AI-enabled writing is of a higher quality. Certainly, we need to be fiercely conscious of the many issues with collaborating with AI, not least anchoring bias, where the first (in this context, AI-generated) piece of information clouds our judgement.
Assuring learning outcomes
Webb's strategies of avoiding, trying to outrun, or adapting to AI are not necessarily stark choices, and our final position will no doubt be a mixture of all three.
We want to know that our students (and our future bridge builders and dentists) know what they are doing: our purpose and social licence are based on our ability to assure that our graduates have met their learning outcomes. This will no doubt involve some secure summative tasks in degrees and programs which exclude technology, to ensure program-level outcomes are met. Given the inevitable trade-off between authenticity, security, and cost, we need to consider the responsibilities of each unit versus those of the program. With generative AI baked into our productivity tools, this may involve appropriate methods of assurance in these exceptional ‘high stakes’ assessments. As Cath Ellis from UNSW suggests, perhaps we need to stop agonising over securing every single assessment. Assessments where students collaborate with AI may help them learn critical skills and develop deeper disciplinary expertise, whilst assessments that are highly secured may help us (and them) additionally assure that learning outcomes are met.
If we are honest, our present assessment regime reflects our need to deliver efficiency, given the workload implications of large enrolments and tight deadlines. Assessment for and of learning is not the only consideration for a unit coordinator tasked with returning results to a deadline and within a limited budget. Whilst generative AI may offer some efficiencies in designing and delivering assessments, it also requires us to consider where and what to assess.
As the costs of keeping ‘traditional’ assessments secure and reliable soar, is this the time to think about assessing where learning happens – such as in the laboratory, in the design studio, and in the tutorial? At the moment, assessment weightings, and hence student behaviours, favour products (notably assignments and exams), whereas formative learning, development, and evaluative judgement occur in the tasks in our active learning classes.
Adapting to AI will take time and will probably require a mixture of student skill building, staff development, and a paradigm shift in our understanding of what assessment is for. For students, this will involve foundational digital literacy in using AI (such as generic prompt writing), but this will eventually need to become a discipline-specific skill and part of their knowledge creation methodologies. For staff, we need to rapidly and widely build awareness and support the innovations of early adopters.
The ultimate aim of a Sydney education will perhaps not be changed by the generative AI revolution. Indeed, it should become more real and relevant. Our assessments should require, encourage, rank, and reward the ability to use evaluative judgement on the quality, reliability, and relevance of the resources our students use and the outputs they produce.
What now, and what next?
For further reading, consider these resources:
This article was re-published from https://educational-innovation.sydney.edu.au/teaching@sydney/chatgpt-is-old-news-how-do-we-assess-in-the-age-of-ai-writing-co-pilots/