Institutional Student Assessment and Evaluation for High School Courses
The Big Question
Should students get 50% for doing nothing?
As strange as that question may seem it is one that many US jurisdictions have been struggling with for some time and I suspect that it is one that other educational jurisdictions may also grapple with.
I would argue, NO, and there is really no need.
Context
Michael is taking a course over four quarters. For the first quarter he is totally disengaged and does not submit any work, earning zero percentage points [0%]. However, he becomes engaged after that and earns 100% in all assignments and assessments in the course. His earned term grades are F, A, A, and A. Out of a possible 400 points Michael earns 300 points and ends the course with 75% of the possible points, which equates to a grade of C.
Is a C [75%] a fair and accurate evaluation of Michael’s achievement, his performance, or both? Is this the grade Michael should get for this class?
This is the question that many education systems and schools have struggled with and continue to struggle with. How can and should the achievement and performance of students be evaluated, communicated, and reported at the institutional level? Which is more important to report on: achievement [what the student has learned and is able to do] or performance [what the student actually did]? Is there any difference between these two metrics, and does the difference even matter?
Does your school system have a philosophical stance on this, one way or the other?
Evaluation in Different Settings
With the advent of competency based class grading, a student’s performance over the course of a class and their achievement as measured by mastery of the class content can yield very different results. The difference between achievement and performance is nowhere more evident than in a performing arts setting. Conversely, the alignment between achievement and performance is nowhere more synchronous than in a visual, creative, or literary arts setting. Hopefully the examples below will clarify the differences between achievement and performance.
Performing Arts Setting
Consider the following: An actor has studied all the lines and movements and has mastered the part; a violinist, flautist, or singer has studied the score and performed the solo tens (if not hundreds) of times, flawlessly from memory; a dancer has mastered the choreography and performed the role multiple times. Now consider that for any of these performance artistes during the live performance there is a cough, a flash of light, the siren of a passing ambulance and suddenly the sequence is lost. There is a moment of confusion, then a brilliant recovery continuing onto a flawless ending.
Yes the performance was flawed in that it was not note, word, or step perfect, but the performer has mastered the craft. The brilliant recovery and flawless ending occurred because the performer had mastered the piece. Achievement was perfection even though the performance was flawed in the eyes of some. There are those in the audience who expect a note perfect performance, and there are those who celebrate brilliant artistry as reflected in the recovery and overall performance. Therein lies the difference between achievement and performance. How should the performer be judged?
Visual, Creative, and Literary Arts Setting
The writer has worked on the novel, the composer has worked on the symphony, or the visual artist has worked on the sculpture or painting for weeks or even months. Re-writing here, chipping there, repainting here, re-working over there and eventually the product matches the vision. It is ready to be revealed to the public. Here the achievement matches the performance provided the artiste is given enough time and resources to realize their conception.
Here achievement and performance are synonymous and it does not matter how the artiste is judged; the resulting judgement would be the same.
Achievement and Performance
Achievement and performance are not always synonymous. When they are not it is important to identify which descriptor is more authentic and appropriate.
Achievement
Assessments measure performance, but the ultimate goal of the educational assessor is really to measure achievement. I do not dispute that well designed assessments, delivered in an ideal setting and evaluated in reliable and valid ways, can come very close to giving great insight into achievement. However, achievement is potential. It is not easy to measure potential, as many things may impact performance, causing underperformance or even over-performance. For that reason it is important to maintain the distinction between Performance and Achievement as well as all things associated with these two labels. I will, for instance, refer to the Performance Gap rather than the Achievement Gap, even though the term Achievement Gap is prevalent in the literature. I make this choice for several reasons, but most of all because the evidence for that gap comes from student performances, not their actual achievement. Persistence of the term Achievement Gap also seems to accept as its premise that some people may be predisposed to higher or lower levels of achievement based on non-cognitive characteristics, even though it has been shown that performance can be subject to many factors which may obscure and obfuscate achievement.
I do not think it coincidental that there is a certain persistence associated with the term Achievement Gap where black and brown people are perceived to be less capable than others, but that is simply my own opinion.
Performance
There are still many people who believe in the primacy of a final examination and there are many situations where people are not necessarily interested in potential [achievement] but are more interested in performance. To be honest, if I am under the knife I am not so much interested in the ability or potential of my surgeon, I am far more committed to how well he executes my surgery. So, in the right settings, I too am far more interested in performance than achievement. However, as an educator engaged in student evaluation with an understanding of the roles of random and systematic error in any performance, I do believe that students should not be overly penalized for errors.
Evaluation
Evaluation by Different Methods
In our scenario Michael has four term grades: F, A, A, A.
What would an appropriate final mark for Michael in this course be and why would it be appropriate?
1. Using percentages as in the example above Michael’s Final Mark would be C. I would argue that this is an accurate measure of his overall performance in the class, though not of his achievement in the content covered in that class. I would argue that he has mastered the content despite his poor beginning and so the C is not truly reflective of his achievement. You should feel free to differ.
2. Some school systems use percentages with a baseline, most often a minimum of 50%. In such a scenario Michael’s reported marks for term 1, as distinct from his earned marks, would be 50% [F], and 100% [A] for each of the remaining three quarters. His reported Final Mark would be 350 out of a possible total of 400 or 87.5%, a high B, possibly a B+ in some systems. I would argue that this is a measure of performance that gives achievement a somewhat more significant weight. But what about the ethics of giving 50% to a performance that does not demonstrate any mastery? Would a teacher be comfortable with this approach?
3. If Michael were enrolled in a competency based program which focused on the highest level of mastery, his final mark might be 100% or A. This metric is focused on his achievement in the class, not his performance during the time he was in class. Would teachers, students, parents, and potential employers all be happy with this metric?
4. If he had been enrolled in a course examined by a Final Examination it is likely his final grade would also be 100% or A though it is possible that his exam performance could be much lower. This metric is focused on a one time examination performance which is accepted as a reflection of his achievement in the class. This has been the historical reality for many in the education world. Are we satisfied with this approach?
5. If the course had used a 4 point quality point approach, the quality points earned would be 0 for the F and 4.0 for each of the As. Michael’s Final Mark would have been a B [12/4 = 3, a B on a 4 point scale]. This score reflects his performance over the duration of the course but is closer to the result of method 2, which gives achievement some weight over performance. There is no award of any points without demonstration of mastery here. Would teachers, students, parents, educators, future employers, and college admissions counselors be satisfied with this metric?
Based on the five methods presented here Michael could earn grades of C [percentage averaging], B+ [percent averaging with a baseline], B [quality point averaging], A [competency based evaluation], and A through F [Final Examination evaluation].
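The three averaging methods above (1, 2, and 5) can be checked with a short calculation. What follows is only a minimal sketch in Python: the function names are hypothetical, and the letter cut-offs of 90/80/70/60 are an assumption taken from the examples in this article rather than from any particular school system's policy.

```python
# Hypothetical helpers for illustration; cut-offs of 90/80/70/60 are assumed.
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def letter(percent):
    """Map a percentage to a letter grade using the assumed cut-offs."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


def percent_average(term_percents, baseline=0):
    """Average the term percentages, raising any mark below the baseline."""
    reported = [max(p, baseline) for p in term_percents]
    return sum(reported) / len(reported)


def quality_point_average(term_percents):
    """Convert each term mark to a letter, then average the quality points."""
    points = [QUALITY_POINTS[letter(p)] for p in term_percents]
    return sum(points) / len(points)


michael = [0, 100, 100, 100]  # earned term marks: F, A, A, A

method_1 = percent_average(michael)               # plain percentage averaging
method_2 = percent_average(michael, baseline=50)  # averaging with a 50% baseline
method_5 = quality_point_average(michael)         # 4 point quality point scale

print(method_1, letter(method_1))  # 75.0 C
print(method_2, letter(method_2))  # 87.5 B
print(method_5)                    # 3.0, a B on the 4 point scale
```

Methods 3 and 4 are left out of the sketch because they depend on a judgement of mastery or on a single examination score rather than on an average of the term marks.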
These five methods represent some of the ways that a final mark may be calculated and should not be regarded as anywhere near an exhaustive list of possibilities. In addition to the term marks some systems might require a comprehensive final examination or project and even a midterm examination. These examinations and/or projects may carry different weights and provide additional data points for student evaluation.
For the example above a mid-term or final could allow Michael an opportunity to demonstrate mastery on the standards covered in term 1 thus ensuring that a quarter of the content is not excluded from the material assessed for mastery.
Which method is best? Which method would parents, students, teachers, administrators, potential employers, college admissions counselors, and others with a legitimate educational interest find to be both fair and accurate?
Choosing an Assessment and Reporting Protocol
While tradition plays a significant role, the choice of assessment and reporting protocols is not a trivial question. Often educators, parents, students, and employers may discover that, though assessment and reporting may have been carried out in a particular way for a long time, on further introspection and reflection the intent of the assessment and the message communicated are not necessarily the same.
Ideally assessment, evaluation, and reporting should use a protocol that is perceived to be fair and transparent by the educators, students, parents, potential employers, and anyone else with a legitimate educational interest.
It is very easy to confuse fair and transparent with familiar. For institutions that have been evenly averaging percentages, the introduction of weights may seem anathema. Why should one assignment get more weight than another? Of course there are many good answers to this question, but if your experience has always been that all assignments are equally weighted then this approach may seem a little like gerrymandering to get a desired result.
The Challenge of Change
Data Points
Historically most student assessment and evaluation was done by final examination. Students would study for some period of time at the end of which they would take a comprehensive exam. More and more, as educators have become aware of the differences between achievement [what students know and can do] and performance [what students demonstrate in a particular setting at a given time] this dependence on a one time demonstration of content mastery has been replaced by strategies more focused on continuous assessment. What was formerly the weight of a final assessment is now spread over multiple data points to minimize localized temporal effects. The grain-size of assessment, whether four, five, six, or even more data points and their weights, whether they are all given equal weighting or some are given greater weights than others, are determined by the philosophy and rationale underlying assessment, evaluation, and reporting in the institution. There is no right and wrong, only choices more or less aligned with the institutional philosophy.
Student Evaluation and Reporting
In most western societies students between ages 14 and 18 engage in college and career readiness preparation activities. Currently in the United States, these students would be in High Schools completing courses leading toward a high school diploma. They would also take some combination of local, national, and possibly even international assessments.
Local assessments include teacher made assessments, school level assessments, district level assessments, and state level assessments. These can take many different forms including pencil and paper, portfolio, performance or other authentic assessment.
National assessments, based on the final examination model, include the Advanced Placement Examinations, the SAT Suite of Assessments (which includes the traditional verbal and quantitative tests as well as the subject tests), the ACT test, Smarter Balanced assessments, and PARCC assessments.
Some of the more common International Assessments offered in the US are the Cambridge A-level Examinations, and the International Baccalaureate Diploma Assessments.
Students coming out of American High Schools will potentially graduate with a number of courses completed in core and elective areas. In addition they may have completed some AP and/or IB courses. On top of this they will have a record of assessment scores which may include PARCC scores, Smarter Balanced scores, PSAT scores, SAT scores, and ACT scores. Some students may have completed specialty programs in the Arts, STEM, Hospitality, Construction, and Cosmetology or any number of career readiness programs that may be offered in their jurisdiction and some students may have certification level qualifications in professional areas.
How can we fairly and accurately assess students and student work in a meaningful way and not overly penalize students for making mistakes?
Allow me to further set the context.
What is the Role of Public Education?
With a requirement of mandatory school attendance for students 16 and under it would be dishonest for me not to admit that there are some children required to attend school who, while physically present, are not necessarily there with the undivided focus of acquiring an education. In truth, in some schools, there are quite a number of students present because attendance is a legal requirement. In some of those schools some students are always trying to find ways to get around that legal attendance requirement.
Yet sometimes a student who starts the year simply fulfilling a legal attendance requirement may become engaged and go on to demonstrate mastery: Michael with his grades of F, A, A, A.
We have got the student in class and engaged. How much should the student be penalized for the initial period when they were not engaged?
What and how should we be assessing?
The process of education begins with getting students to attend school, getting them to stay in class, and actually getting them to learn something. This is part of the justification for those who argue that if students are attending class and participating in classroom activities there is no way that their minimum score should be zero.
If not zero, what should the minimum score be?
The Power of Zero
Suppose, as in the scenario presented, a different student, Michael’s sister Michaela, had earned 100% in each of three quarters but had earned a different percentage grade in the fourth quarter. How would that impact her final mark for that course?
For the sake of convenience we can look at six cases: (1) B or 80%, (2) C or 70%, (3) D or 60%, (4) F or 59%, (5) F or 50%, and (6) F or 0%.
(1) In the first scenario Michaela would earn 380/400 points = 95% or an A
(2) In the second scenario it would be 370/400 points = 92.5% or an A
(3) In the third scenario, 360/400 points = 90% or an A
(4) In the fourth scenario, 359/400 points = 89.75% or a B [more often 89.75% would be rounded to 90% or an A]
(5) In the fifth scenario, 350/400 points = 87.5% or a B
(6) However, in the sixth scenario Michaela would earn 300/400 points = 75% or a C.
Notice that with three grades of 100% it is only the F that could cause a student to drop below an A. Because the range for an F is from 0 up to 59, a student with three As and one F could remain with an A if that student earned 359/400 = 89.75%, which many a teacher would round to 90%, or could drop by as much as two full letter grades to a C. This could create challenges from a reporting perspective for prima facie reliability and consistency.
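The six cases can be reproduced with a few lines of Python. This is a rough sketch only: the helper names are hypothetical, the 90/80/70/60 cut-offs are assumed from the examples above, and no rounding is applied, so 89.75% is reported as a B.

```python
# Hypothetical helpers for illustration; cut-offs of 90/80/70/60 are assumed.
def letter(percent):
    """Map a percentage to a letter grade, with no rounding applied."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


def final_percent(fourth_quarter):
    """Average three quarters of 100% with the given fourth quarter mark."""
    return (100 + 100 + 100 + fourth_quarter) / 4


# Fourth quarter marks for cases (1) through (6).
cases = [80, 70, 60, 59, 50, 0]
results = [(q4, final_percent(q4), letter(final_percent(q4))) for q4 in cases]

for q4, percent, grade in results:
    print(f"Q4 = {q4:>2}%  final = {percent}%  grade = {grade}")
```

The last three cases all carry an F for the fourth quarter, yet the final marks they produce span 89.75%, 87.5%, and 75%.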
The Challenge of Fair and Effective Reporting
Many parents and students are primarily interested in the letter grades that are often reported and pay scant attention to the associated numerical percentage grades. If you do not believe me on this talk to any student or parent of a student who earned 89% rather than 90%. The reaction is very different from that of a student who earned 90% instead of 91%. Of course this is because 89% is a grade B while 90% is a grade A. I often say that the difference between 89 and 90 is much greater than the difference between 90 and 91.
Being on the cusp of a higher grade band is very different from falling within that grade band. This is why I say there is a lot more difference between 89 and 90 than there is between 90 and 91. If I were to hazard a guess I would attribute this to the narrative descriptors and quality points that we associate with the grade bands. The letter grades A through D and F communicate a lot more about student achievement and performance than numerical values, even percentages. In my experience it is a lot more common to hear teachers talking about students as an A or C student rather than a student who scores in the 90s or one who scores in the 70s.
Communicating with Parents
Having letter grades which stand in for numerical percentages can present problems for communication between school systems and parents. In that setting all As are not equal. There can be high As and low As and these behave differently when computing a grade average. The result is much more nuanced and reflects the actual impact of the numerical values that the grades represent. This is somewhat alleviated by the use of pluses and minuses but even then fine detail can be obfuscated.
Successful education is, by nature, a partnership with school and home. That partnership requires clear, transparent, and effective communication. Parents want a grading and reporting system that is prima facie reliable. Reliability speaks to the fact that if they see a particular sequence of grades, say F-A-A-A, then, on the face of it they anticipate the same final result each time.
Consider the parent of a student who entered school and was not engaged but later became fully engaged. Imagine that student earned the same sequence of grades in three different classes. In one class the F was 59%, in the second 50%, and in the third 0%. That student could end the year with final marks of A, B, and C even though the sequence of grades reported on the report card would be the same in each case.
I do not dispute that mathematically minded parents would value the nuanced reporting of a percentage based system where F-A-A-A could be anything from A through C. Most of the parents and many of the educators that I deal with do not reflect that level of mathematical sophistication. Far from them finding the nuanced outcomes of a percentage based system empowering they would be more likely to find those different outcomes confusing.
Even parents and educators who are mathematically sophisticated would find justification of the fairness of a system where F-A-A-A has multiple possible outcomes somewhat of a challenge. In my mind I am replaying conversations with parents around these same issues where after I have explained how the numerical values account for the outcome in great detail the parent rejoins with the question: “But my child got the same grades in X and ended up with an A, how could (s)he get a C in this class? This makes no sense.”
A system where F-A-A-A always has a single outcome is better able, in most cases, to bridge any communication gaps between school and home.
I am not claiming that this is true for all parents or educators but I am saying that for most of the people that I work with a system where there is only one outcome for each unique grade combination would be better able to support clear, effective, and transparent communication between school and home.
Understanding Narrative Descriptors of Grades
What do the grades A-F mean?
I believe that in high school our three anchor grades are A, C, and F and the other grades of B and D fill the gaps between A and C and C and F respectively.
To evaluate the narrative descriptors I would start with C because I believe it is generally accepted that a grade of C indicates a student meets expectations. The grade F indicates that the student significantly fails to meet expectations, and the grade of A means the student significantly exceeds expectations to the point of being outstanding. B falls between A and C indicating that the student exceeds expectations, and D falls between C and F indicating that the student is approaching expectations.
A – Significantly exceeds expectations, outstanding
B – Exceeds expectations
C – Meets expectations
D – Approaches expectations
F – Fails to meet even minimal expectations
I would suggest that these narrative descriptors are pretty universal. Some school systems accept grades A through D as passing grades. Some school systems only accept A through C as passing grades since the argument is that while students with a grade of D are approaching expectations they have not actually met the expectations.
Other school systems eschew the terminology of failure and have dispensed with the F grade in favor of a grade E. The argument is that all students have displayed varying degrees or levels of competence and should be evaluated according to the level of proficiency demonstrated. Some degrees of competence are worthy of credit and others are not. So the descriptor for E would be indicative that the student has not demonstrated even minimal expectations. This is a performance based descriptor as it only refers to the skills the student has demonstrated but makes no inference about the students’ achievement or potential. Unfortunately the word Fail has become connotative beyond simply meaning a failure to demonstrate a specific set of skills.
Summary
Applying the Narrative Descriptors
In Term 1 Michael, with grades F-A-A-A, failed to meet even minimal expectations. In terms 2 through 4 Michael was consistently outstanding, significantly exceeding expectations in each term.
Would it be appropriate to describe Michael’s performance and achievement in the class as (a) Outstanding, (b) Exceeding expectations, (c) Meeting expectations, (d) Approaching expectations, or (e) Failing to meet expectations?
I would argue conservatively that Michael exceeds expectations (b) and should earn a grade of B. In any standards based system of grading I would object to the use of a non-zero minimum score, but I would advocate for the use of a 4-point scale based on the letter grades earned, with the average used to determine the final grade: method (5) referenced above. The four point scale approach is a much better, more defensible approach to grading.
Postscript
Even though each letter grade corresponds to a discrete integral numerical value, when the grade points are averaged the answer comes out as a continuous variable. How should those values then be treated? If we use simple standard rounding to the nearest whole number then a student with grades F-F-D-D would end up with a value of 0.5, which would round to 1 or a D. As a teacher, or as a student, parent, or potential employer, would you be comfortable with a student with grades F-F-D-D passing a class and earning credit?
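The rounding question can be made concrete with a small sketch in Python. The names here are hypothetical, and the rounding rule is an assumption: "simple standard rounding" is taken to mean rounding halves upward, which is written out explicitly because Python's built-in round() rounds halves to the nearest even number and would turn 0.5 into 0.

```python
import math

# Hypothetical illustration of quality point averaging with half-up rounding.
QUALITY_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}
LETTERS = {points: grade for grade, points in QUALITY_POINTS.items()}


def final_grade(term_grades):
    """Average the quality points, then round halves upward to a letter."""
    average = sum(QUALITY_POINTS[g] for g in term_grades) / len(term_grades)
    rounded = math.floor(average + 0.5)  # assumed rule: 0.5 rounds up to 1
    return average, LETTERS[rounded]


print(final_grade(["F", "F", "D", "D"]))  # (0.5, 'D')
print(final_grade(["F", "A", "A", "A"]))  # (3.0, 'B')
```

Under a round-half-to-even rule the same F-F-D-D record would come out as an F, so the choice of rounding rule is itself part of the grading policy.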
Clearly there is room for further discussion.