Evolution of Testing – Part 1: K-12


Education testing in K-12 continues to confound students, parents, professional educators, policy pundits, and politicians. Over the years, policy has lurched from normative comparisons to criterion-based ones, and from strict accountability against standards to waivers, relaxed enforcement, and even opt-out allowances. Some objections are more ideological: “multiple choice” assessments, for example, are viewed by many as fatally flawed even though psychometric data supports their reliability. Further, most of today’s assessments continue to measure skills that miss the knowledge and skills needed for the digital age. With rote knowledge now available in milliseconds through search engines, knowing simple facts matters less than knowing how to use knowledge strategically or to apply facts and knowledge to solve problems.

 

Because of these perceptions, calls to end summative assessments are rising. Some question whether assessments have any role in the learning process, in measuring the quality of instruction, or in setting resource allocation priorities.

 

Assessments do play an important role and should be part of the permanent educational landscape. But for that to hold, educators, assessment developers, and policy makers must fundamentally re-examine what is assessed, when and why, and how outcomes are used. The assessment industry seems stuck in a time warp, with only minor changes over several decades, while every other aspect of the world of learning is being disrupted. Active learning by doing, often in groups rather than individually to reflect the broader work environment, is becoming the norm. It is time to disrupt assessment -- the players, the development methods, the measurement goals and methods, the cost to schools, and the way results are used.

 

The future of assessment, in the age when anyone can learn anything from anyone in the world via technology, must be guided by three pillars:

 

I)     Authenticity, to reflect the learning process and context of the Millennial generation and the advent of ubiquitous, instantly retrievable information

 

II)   Systems of Assessments, replacing the current approach of point, unlinked assessments

 

III)  Individualized, Predictive and Prescriptive guidance for learning, instruction, and career growth, based on data signals from multiple sources on a learner’s progress and knowledge

 

The one thing student assessments, designed to measure student learning, should not be is a cudgel for teacher evaluation. Research[1] leaves serious doubt about the utility of this practice, which risks undermining the learning environment.

 

ARCHITECTURE FOR THE FUTURE OF ASSESSMENTS

 

I.   Authenticity is paramount

 

What is assessment authenticity? Authenticity is the measurement of "intellectual accomplishments that are worthwhile, significant, and meaningful, and that the links to essential skills (relevant for tomorrow) is clear and powerful”[2]. As mentioned at the outset of this discussion, technology is changing what students need to learn and how they learn, what is therefore taught and how, and subsequently what is assessed. Further, we are learning that success in college and careers is more than a sum of knowledge as measured in one point-in-time assessment, or even a series of annual assessments; it is the application and use of cognitive knowledge consistently demonstrated in realistic situations. What matters is how knowledge is used in work and life in the real world, and assessments must reflect this.

 

At the risk of over-emphasizing this point, consider a good example of the shift. In the year 1 BG (Before Google), students memorized state capitals and some related information, such as population and geographic location. Today, knowing top-of-mind that Trenton is the New Jersey state capital is rather unimportant. What is far more important is to understand why Trenton was chosen as a capital; how its economy, interwoven with government and manufacturing, rose and affected the entire state; how the tenor of NJ politics changed as Trenton lost its manufacturing base; and the implications for the future of NJ’s, or any other state or country’s, political economy. No assessment should ever ask “what is the NJ state capital” in selected response format. It should ask about the rise and fall of a state capital, how to use that information to predict how economies and politics might change in response to events, and how that information can be used to improve life. Google is actively trying to shift student learning from fact knowledge to problem-solving and collaborative skills. This raises the question “whether the purpose of public schools is to turn out knowledgeable citizens or skilled workers”.[3]

 

The way students today learn, discover, and use knowledge is obviously changing; some might say we've reached an education inflection point. As illustrated above, using information to drive hard thought and practical application is paramount. Students are opting out of assessments that provide little or no actionable guidance for learning, and selected response assessments bear the brunt of this negativity. In this changing context, authenticity in assessments can be achieved when the following elements are employed cohesively:

 

a)   Depth of Knowledge (DOK) measures. What is Depth of Knowledge? DOK describes a range of ways information and knowledge are used, in increasingly value-adding applications that truly demonstrate a learner’s understanding. The illustration below best explains DOK[4].

 

Figure 1. Webb DOK Model

 

 

 

Millennials are experts at researching information (although they may have difficulty distinguishing evidence from opinion, and “fake” news from the truth). Testing rote knowledge annually pales in comparison with authentic assessment that addresses creative and conceptual thinking and the ability to evaluate and synthesize information. Robots will take the jobs that require DOK Level 1 (Recall and Reproduction), and likely a good part of Level 2 (Basic Application of Skills and Concepts). Authenticity will be derived from testing strategic applications of knowledge and the ability to extend knowledge contextually. The key for assessments will be how to measure where students are on this spectrum, and how to train learners to reach Level 3 (Strategic Thinking) or even Level 4 (Extended Thinking). Perhaps the new measure of “proficiency” will not be the current 1-4 scale used in state assessments, but Levels 1-4 from the DOK construct.

 

b)   Constructed response (aka open-ended response) and performance tasks. While selected response will always be part of assessment, other item types are better able to test real-life application of knowledge and thus backstop DOK. Examinees can see the more realistic connection to real life in constructed response and performance task items. Open-ended response is also more widely accepted by the public and provides additional signal data.

 

c)   Games and gamification. To engage Millennials, games (i.e., contests) and gamification (the use of video game and other elements) can be deployed to capture attention and test skills. To Millennials, contests and familiar gamified tests of skills are more relevant and more readily applied to real-life situations. Is a real-life simulation of treating an emergency patient not more realistic than any selected response test?

 

d)   Project and team-based learning. Here again, Millennials understand the need for social cohesion and the real-life requirement to work together, and projects better reflect real-life tasks. Work is increasingly collaborative in nature as digital communication drives global sharing processes. Project-based learning is a key to increasing authenticity.

 

II. Systems of assessments replace point assessments and are buttressed by new signal sources

 

Systems of assessments aim to provide a more holistic picture of the student. In a nutshell, that holistic picture is formed by connecting assessment outcomes along two dimensions: longitudinally over time, and across skills and knowledge. The longitudinal dimension measures the progress made and retained. The cross-skill and knowledge dimension rests on the premise that learning is interrelated; the assessments in the system both evaluate and inform, and vary in item type and frequency, with each seeking to measure a different aspect of the student. The point is that a series of varied assessments taken over time should measure students’ progress more holistically than any single assessment, in at least two regards:

 

  1. Tests taken over time, measuring skills through different instruments, provide more insight into a learner’s capabilities

 

  2. Skills are interrelated, and measuring one gives insight into another. For example, while reading and writing skills are obviously linked, data from systems of assessments will likely reveal that writing logic and mathematics are closely tied as well. Taking multiple snapshots of skills through many different assessment approaches will reveal this hidden information if the right analytical resources are brought to bear (a minimal sketch of such an analysis follows this list)
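
To make the idea concrete, here is a minimal sketch, using invented data and hypothetical column names, of how an analyst might look for both longitudinal growth and hidden cross-skill relationships in results drawn from a system of assessments. It is an illustration of the analytical idea, not a production pipeline.

```python
# A minimal sketch of cross-skill analysis over a system of assessments.
# All data, column names, and values below are hypothetical.
import pandas as pd

# Each row: one student's score on one skill from one assessment event.
records = pd.DataFrame([
    {"student": "s1", "date": "2024-09-15", "skill": "reading",     "score": 0.72},
    {"student": "s1", "date": "2024-09-15", "skill": "writing",     "score": 0.68},
    {"student": "s1", "date": "2024-09-15", "skill": "mathematics", "score": 0.64},
    {"student": "s2", "date": "2024-09-15", "skill": "reading",     "score": 0.55},
    {"student": "s2", "date": "2024-09-15", "skill": "writing",     "score": 0.51},
    {"student": "s2", "date": "2024-09-15", "skill": "mathematics", "score": 0.49},
    # ... many more students, skills, and assessment dates over the year
])

# Pivot to one column per skill, one row per (student, assessment date).
wide = records.pivot_table(index=["student", "date"],
                           columns="skill", values="score")

# Longitudinal view: first and most recent result per student and skill.
growth = wide.groupby("student").agg(["first", "last"])

# Cross-skill view: correlations that may reveal hidden links,
# e.g. between writing logic and mathematics.
print(wide.corr())
print(growth)
```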

 

Systems of assessments could, for example, include selected response items from summative assessments and periodic assessments -- either benchmarks given five to seven times annually or simple, highly frequent formative tests -- and could be varied to include project-based tasks, constructed response, teacher assessments, and the like.

 

David Conley and Linda Darling-Hammond[5] have written extensively and conclusively about the superiority of systems of assessments over simple summative assessments whose primary purpose is to rank.

 

It is hard to argue with the proposition that a well-architected system of assessments will better reflect a student’s capabilities. To date, scholars have focused on traditional cognitive measures and constructs, which are a great starting point. However, a better way of building these systems involves supplementing the traditional cognitive assessment measures with new elements, to create still more robust systems of assessments:

 

a)   Cognitive assessments must shift from single-point summative approaches to systems-of-assessments solutions. Assessments will incorporate summative results from standardized selected response questions and make greater use of formative testing, standardized performance tasks and project tasks, and evaluations from other stakeholders such as parents, teachers, and other students

 

b)   New signals on progress. In addition to the well-known types of assessments discussed above, new modes of assessment that use multiple, continuous measurements from digital sources (e.g. click speed), facial recognition, auditory and oral tones, and writing sample analysis will become more prominent. There is also a view that the Internet of Things (IoT) will bring additional signal sources.

 

c)   Non-cognitive/Social and Emotional Learning (SEL): Assessments that probe social and emotional learning factors and are integrated with cognitive assessment will add a richer sense of students’ ability. Regressing cognitive against non-cognitive data will be highly revealing in the context of social, emotional, and traditional cognitive measures. Non-cognitive assessments that focus on situational judgment and forced-choice answers (protocols which are less easily faked and rich with information), essay writing samples, facial tones, oral presentations, and the like could propel the use of SEL factors to the forefront of educational measurement science and practice.

 

III.    Individualized, Predictive and Prescriptive over time, based on assessment data supplemented by additional signals from the student’s learning process that provide critical insight

 

Individualized instruction has gained great traction over the years, under many different names and approaches. All share the idea that students learn differently, whether in pace or in style of learning, yet most efforts have had only marginal success. The question is how to fully understand each student’s specific differences and teach to them at scale. Differentiated instruction that tracks current and longitudinal progress (longitudinal data is especially important for measuring growth) may finally be at hand, a gift from big data. In addition, students who can see their own data portrayed this way will absorb it, own it, and potentially act on it.

 

Data is going to be the game-changer. We live in a big data, algorithm-driven world, and the use of data to guide learning, college, and career choices will be enhanced by thoughtful and careful application of student data. Just as e-commerce sites use data to make personal shopping suggestions, data will be used to help students better understand their gaps, their strengths, and the implications of their capabilities.

 

a)   Student “datasets” will be developed that combine multiple signal sources – cognitive, non-cognitive, demographics and learning context – to produce big data-driven insight

 

b)   Data will create student portfolios which tie the data to learning, career paths and ideas for being a better citizen

 

c)   Students will take ownership of the data and its implications because they are personal, targeted, uniquely informative and built authentically (a minimal sketch of such a combined dataset follows this list)
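
As an illustration only, the sketch below assumes a tiny, invented student dataset with hypothetical feature names (summative score, formative trend, SEL perseverance rating, collaboration rating, attendance rate) and uses a standard scikit-learn pipeline to show how combined cognitive and non-cognitive signals could drive a simple "on track" prediction. It is a sketch of the idea, not a production model or a validated construct.

```python
# A hedged sketch of a student "dataset" combining cognitive, non-cognitive
# (SEL), and context signals to drive a simple prediction.
# Feature names, data, and the target label are hypothetical.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Columns: summative score, formative trend, perseverance (SEL survey),
# collaboration rating, attendance rate.
X = np.array([
    [0.72, +0.05, 3.8, 4.1, 0.96],
    [0.48, -0.02, 2.1, 3.0, 0.81],
    [0.65, +0.10, 4.2, 3.7, 0.93],
    [0.39, -0.08, 2.5, 2.6, 0.74],
])
# Hypothetical label: 1 = on track for the next learning milestone.
y = np.array([1, 0, 1, 0])

model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)

# A prescriptive use: estimate where a new student stands given all signals.
new_student = np.array([[0.55, -0.01, 2.3, 3.9, 0.88]])
print("P(on track):", model.predict_proba(new_student)[0, 1])
```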

 

The goal is not to reduce students to a dataset, but to recognize how the data could improve outcomes around academic achievement, college and career readiness, and overall civility and resulting life contentment. For those who doubt the links between the non-cognitive and the cognitive, consider this: does a student with perseverance issues do less well on constructed response items than one with a better-trained perseverance capability?

 

Student datasets will, without doubt, be a fact of the student landscape in the future. The NAESP, among others, provides broad frameworks for using data[6]. Yet one survey reported that 67% of teachers are not satisfied with data tools[7]. They claim, “Data is often organized student by student—making it difficult to quickly scan the whole class—or lacks the detail necessary to help teachers address the needs of individual students”, and that “slow and overwhelming quantities of data make administrative tasks laborious and keep teachers from building close relationships with their students”. What is missing are the data management protocols, already available in commercial applications, that allow enterprises to easily aggregate data or parse the data that counts. The science of maximizing available student data and driving instructional change is still in its infancy, although commercial analogs exist.

 

Emerging data tools may simplify the problem. Consider two functions where datasets will enable what lies in students’ futures: differentiated instruction and credentialing.

 

THE BLOCKCHAIN SOLUTION FOR LEARNING AND CREDENTIALING

 

For differentiated instruction, longitudinal datasets will give educational data analysts a lot of fuel for drawing extraordinary conclusions about student preparedness, learning styles, and other insights. The application of properly differentiated instruction may soon be facilitated by Blockchain technology, since it provides rich and verifiable student data. Blockchain technology is a type of distributed ledger that stores a permanent and tamper-proof record of transaction data. Distributed ledgers can be thought of as a type of database, but unlike traditional databases, distributed ledgers are managed through a peer-to-peer (P2P) architecture and do not have a centralized data store.[8]

 

Another dataset application will apply to credentialing, and in general to tracking formal or informal credentials, certifications, or comments. Today, when a student accomplishes a task, earns a certification, or has a note added to their record about personal accomplishments or behavior, the data is ephemeral. What if, though, a permanent ledger could be devised to track these records?

 

The premise is borne out by the work of The Mastery Transcript Consortium. The goal is to disrupt the traditional college admission transcript, moving from simple ABCD grades to other real-world measures. “Under the Mastery Transcript students gain micro-credits (not grades) for a series of skills such as analytic and creative thinking, leadership and teamwork, global perspective, etc. It will allow college admission officers to see the complete picture of a student’s strengths — and without using any grades or numbers.”[9] It will also, of course, provide proof that certain awards were earned or courses were taken. If this is not prima facie evidence of the drive to use real-world measures, what is?

 

The extended concept, beyond the Mastery Transcript, is that an immutable record of educational achievements can follow the student. The Blockchain data would consist of grades, assessment scores, teacher evaluations, social and emotional factors (e.g. leadership, collaboration, integrity…), growth, stacked credentials, and potentially many more variables. The student Blockchain would move from institution to institution as records are added. The data would be used for college entrance, third-party analysis for achievement prescriptions, or employment placement. Analysis of the students’ data will reveal insights heretofore impossible to discover, because the data is united, standardized, and available to parties that can elicit quantitative and qualitative insights.
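
For readers unfamiliar with the mechanics, the following minimal Python sketch shows the append-only, tamper-evident idea behind such a ledger: each credential record is hashed together with the hash of the previous record, so any later alteration is detectable. It illustrates the concept only, assuming invented record fields; it is not a real distributed blockchain network or the Mastery Transcript's actual implementation.

```python
# A minimal, illustrative hash-chained record of student credentials.
import hashlib
import json
import time

class CredentialChain:
    def __init__(self):
        self.blocks = []

    def add(self, record: dict) -> dict:
        # Link each new record to the hash of the previous one.
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        body = {"timestamp": time.time(), "record": record, "prev": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        block = {**body, "hash": digest}
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        # Recompute every hash and check the links; any edit breaks the chain.
        prev = "0" * 64
        for b in self.blocks:
            body = {"timestamp": b["timestamp"], "record": b["record"], "prev": b["prev"]}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True

chain = CredentialChain()
chain.add({"type": "micro-credit", "skill": "analytic thinking", "issuer": "School A"})
chain.add({"type": "assessment", "skill": "collaboration", "score": "Level 3"})
print(chain.verify())  # True unless a past record is altered
```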

 

There is a day coming when assessments are “pull” not “push” (sometimes referred to as personal testing, not personalized testing), where students can engage in assessments that foster creativity and elicit meaningful pursuits and questions to reinforce their learning where they see the gaps. If the student dataset is properly presented to the child, and an underlying zone-of-proximity algorithm helps the student select the next learning and testing objectives, educators will achieve a new level of learning.
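
One plausible shape for such an algorithm, sketched below with invented objective names and an assumed "productive challenge" band of 60-80% predicted mastery, is to offer the student the hardest objective they are still likely to master. Both the band and the selection rule are illustrative assumptions, not a published algorithm.

```python
# A hedged sketch of "pull" assessment: pick the next learning objective
# whose estimated success probability sits in a productive challenge band.
def next_objective(estimates, low=0.6, high=0.8):
    """estimates: {objective_name: predicted probability of mastery}."""
    in_zone = {k: p for k, p in estimates.items() if low <= p <= high}
    if in_zone:
        # Hardest objective the student is still likely to master.
        return min(in_zone, key=in_zone.get)
    # Otherwise fall back to the objective closest to the middle of the zone.
    return min(estimates, key=lambda k: abs(estimates[k] - (low + high) / 2))

# Hypothetical mastery estimates produced from the student's dataset.
estimates = {"fractions: add unlike denominators": 0.85,
             "fractions: word problems": 0.72,
             "ratios: unit rates": 0.55}
print(next_objective(estimates))  # -> "fractions: word problems"
```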

 

I’d like to share a personal story about differentiated instruction to illustrate its power, and how datasets undergirded the experiment. Years ago, McGraw-Hill developed The Power of U, the first automated data-driven formative assessment program, and rolled it out as a proof of concept to 25 sixth graders. These students were given a learning progression, and each time they mastered a learning objective they colored in the node. A game ensued among the students to see who was moving ahead the fastest, and the good-natured competition drove students to want to learn more. This competitive spirit was likely part of the documented learning and retention increases. Underlying it all was an algorithm that understood each student’s learning style and progress and assigned learning modes and instruction based on the data. It worked beyond expectations.

 

Another example of motivated self-assessment is the Writing Practice Program, or WPP (full disclosure: my client, ERB, markets the WPP), which grades prompted student essays in real time. It is typical to see students at all academic levels fix and resubmit their essays 15-30 times until they master excellent writing on that essay. This is a solid example of Bloom’s Two Sigma Problem at work.

 

The point of these illustrations: data-infused, objective feedback[10] from assessments clearly drove improved learning. This exists today and will only grow as analysis techniques and data management tools like Blockchain proliferate in K-12.

 

BRINGING IT TOGETHER AND FACING THE CHALLENGES

 

The three pillars are self-reinforcing: authentic assessments improve, and become still more authentic, when they are based on multiple assessment types and proven by data over time, and they more usefully guide lifelong learning because they put the learner in charge of their own learning.

 

Figure 2. Interrelationships of the Pillars

 

 

 

 

There are key challenges to realizing this vision, but they will be overcome as technology reduces cost, societal norms change, and policy makers step up. The hurdles can be addressed:

 

1.   Expensive constructed response and performance tasks. Automatic item generation, which is rapidly gaining credibility, can be an answer. The work of Mark Gierl,[11] for example, at the University of Alberta is extraordinary; his supercomputers are generating high-quality items at negligible cost right now. In addition, algorithmic essay scoring is rapidly matching human scoring and is already widely deployed in state testing.
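
Template-based item models are one common flavor of automatic item generation: a single "parent" item with constrained variables and error-derived distractors can yield many concrete items. The sketch below, with invented numbers and distractor rules, illustrates the general idea only; it is not Gierl's actual system.

```python
# A minimal sketch of template-based automatic item generation (AIG):
# one "item model" with constrained variables yields many concrete items.
import random

def generate_items(n: int, seed: int = 0):
    random.seed(seed)
    items = []
    for _ in range(n):
        speed = random.randint(40, 75)   # km/h, constrained variable
        hours = random.randint(2, 6)     # hours, constrained variable
        answer = speed * hours
        stem = (f"A train travels at {speed} km/h for {hours} hours. "
                f"How far does it travel?")
        # Distractors derived from common errors: addition, off-by-one hour.
        options = [answer, speed + hours, speed * (hours - 1), answer + speed]
        random.shuffle(options)
        items.append({"stem": stem, "options": options, "key": answer})
    return items

for item in generate_items(3):
    print(item["stem"], item["options"], "key =", item["key"])
```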

 

2.   Privacy breach risk. inBloom was ruined by the mere risk of privacy concerns[12]. However, improving technology solutions may be the key to solving this issue[13]. Policy evolution, such as in FERPA, will almost surely drive a personalized-learning carve-out, and policy makers will create a set of workable ethical guidelines. The appropriate level of data granularity would also need to be explicitly worked out, but there is no reason this cannot be divined. In addition, privacy rights seem to be gradually forsaken both voluntarily and involuntarily, and I can foresee, in the relatively short term, a point at which the benefits are seen to outweigh the risks. Millennials are not as concerned about privacy as prior generations were, and the Rubicon has been crossed. Policy makers can also develop safeguards that parents can buy into to ensure that student data is not abused. There is some risk that the permanency of the Blockchain could bring harsh, lifetime repercussions on students who commit even minor transgressions in their youth. Solving this puzzle will be the ultimate cost-benefit analysis!

 

3.   High costs of gamification and simulations. This is still true now. However, many start-ups have formed, and continue to form, to tackle this problem using artificial intelligence and other code-generating tools that automate development. In addition, continued interest and action from commercial video game and virtual reality enterprises keeps reducing costs. Data collection from these technologies will be integrated into software soon enough.

 

4.   “Personal testing” is a wild conceit; students can’t test themselves. In truth, if assessments are fully engaging, informative, and not the “enemy”, students will be more inclined to test themselves. It will be incumbent on test developers (or others) to build engaging assessments (through games or gamification), coupled with publisher-developed instructional resources such as learning progressions (and other visual aids), and finally artificial intelligence to prescribe strategies based on the data.

 

It is clear most students want to learn and want to be tested. It is the duty of the assessment community to build assessments that are insightful, highly useful to instruction and welcomed by students.

 

Finally, the issue of the long-term viability of summative testing needs to be addressed. There is a view that summative testing is unnecessary and can be replaced by continual measurement tools such as formative testing and the signal inputs previously discussed; at least one study even suggests that college success can be better predicted by GPA and attendance records than by entrance examinations.

 

If continuous or continual tools can be proven to capture a student’s knowledge, and the concomitant reasoning ability, at any point in time, then all summative testing does is measure a student’s ability to demonstrate capability at one point in time. With information access ubiquitous, is it not more important to measure advancement in reasoning skills and knowledge application in real time and periodically, rather than all at once? Which approach better predicts, with authenticity, how a person will succeed in college, in their career, and in life?

 

Summative testing will likely survive as part of a system of assessments, but it will lose much of its importance, relegated to simple “year-end” testing that no longer carries high stakes. That is true disruption, and it is long overdue.

 

Finally, I’d like to thank Dr. David Clune, ERB President and CEO, and Dr. Deven Sharma, Fellow at Connection Science @ MIT Media, for sharing their perspicacity in reviewing, challenging and improving the thoughts in this document.

 

Please feel free to provide comments, suggestions and any additional thoughts to this topic.

 



[1] Schaffhauser, Dian, U Chicago Project Aims to Dispel 'Misguided Notions and Outdated Assumptions' about College Readiness, Campus Technology, 2/14/17

 

[2] Wikipedia

[3] New York Times, How Google Took Over the Classroom, 5/13/17

 

[4] Webb, Norman L. and others. “Web Alignment Tool” 24 July 2005. Wisconsin Center for Educational Research. University of Wisconsin-Madison. 2 Feb. 2006; Maverick Education LLC 2015

 

[5] Conley, D. and Darling-Hammond, L. “Creating Systems of Assessment for Deeper Learning”

 

[6] Using Student Achievement Data to Support Instructional Decision Making, NAESP

[7] Making Data Work, Teachers Know Best

[8] TechTarget

[9] World Leadership School

[10] Assessment feedback approaches need to be refined for students to embrace the results. Feedback is beyond the scope of this paper, but the art form needs to be raised so that results are further viewed as authentic, and activated for good purposes.

 

[11] Gierl, Mark, Advances in Automatic Item Generation with Demonstration, TAO Days —Swiss Conference of Cantonal Ministers of Education Bern, Switzerland—October 1-2, 2013. 

[12] See The Huffington Post 4/24/14 for more on this topic

[13] Litton, James, 6 things schools can do to ensure student data privacy, eSchool News 10/16

 


